jm + ap   9

Wifi AP Placement [video]
'AP Placement - A Job For the Work Experience Kid? | Scott Stapleton | WLPC EU Budapest 2016'
ap  wifi  placement  layout  ops  wireless  home  presos 
2 days ago by jm
Don't Settle For Eventual Consistency
Quite an argument. Not sure I agree, but worth a bookmark anyway...
With an AP system, you are giving up consistency, and not really gaining anything in terms of effective availability, the type of availability you really care about.  Some might think you can regain strong consistency in an AP system by using strict quorums (where the number of nodes written + number of nodes read > number of replicas).  Cassandra calls this “tunable consistency”.  However, Kleppmann has shown that even with strict quorums, inconsistencies can result.10  So when choosing (algorithmic) availability over consistency, you are giving up consistency for not much in return, as well as gaining complexity in your clients when they have to deal with inconsistencies.
cap-theorem  databases  storage  cap  consistency  cp  ap  eventual-consistency 
22 days ago by jm
iKydz
'Total Parent Control' for kids internet access at home. Dublin-based product, dedicated wifi AP with lots of child-oriented filtering capabilities
filtering  security  ikydz  kids  children  internet  wifi  ap  hardware  blocking 
10 weeks ago by jm
Please stop calling databases CP or AP
In his excellent blog post [...] Jeff Hodges recommends that you use the CAP theorem to critique systems. A lot of people have taken that advice to heart, describing their systems as “CP” (consistent but not available under network partitions), “AP” (available but not consistent under network partitions), or sometimes “CA” (meaning “I still haven’t read Coda’s post from almost 5 years ago”).

I agree with all of Jeff’s other points, but with regard to the CAP theorem, I must disagree. The CAP theorem is too simplistic and too widely misunderstood to be of much use for characterizing systems. Therefore I ask that we retire all references to the CAP theorem, stop talking about the CAP theorem, and put the poor thing to rest. Instead, we should use more precise terminology to reason about our trade-offs.
cap  databases  storage  distcomp  ca  ap  cp  zookeeper  consistency  reliability  networking 
may 2015 by jm
Great quote from Voldemort author Jay Kreps
"Reading papers: essential. Slavishly implementing ideas you read: not necessarily a good idea. Trust me, I wrote an Amazon Dynamo clone."

Later in the discussion, on complex conflict resolution logic (as used in Dynamo, Voldemort, and Riak):

"I reviewed 200 Voldemort stores, 190 used default lww conflict resolution. 10 had custom logic, all 10 of which had bugs." -- https://twitter.com/jaykreps/statuses/528292617784537088

(although IMO I'd prefer complex resolution to non-availability, when AP is required)
voldemort  jay-kreps  dynamo  cap-theorem  ap  riak  papers  lww  conflict-resolution  distcomp 
november 2014 by jm
Zookeeper: not so great as a highly-available service registry
Turns out ZK isn't a good choice as a service discovery system, if you want to be able to use that service discovery system while partitioned from the rest of the ZK cluster:
I went into one of the instances and quickly did an iptables DROP on all packets coming from the other two instances.  This would simulate an availability zone continuing to function, but that zone losing network connectivity to the other availability zones.  What I saw was that the two other instances noticed the first server “going away”, but they continued to function as they still saw a majority (66%).  More interestingly the first instance noticed the other two servers “going away”, dropping the ensemble availability to 33%.  This caused the first server to stop serving requests to clients (not only writes, but also reads).


So: within that offline AZ, service discovery *reads* (as well as writes) stopped working due to a lack of ZK quorum. This is quite a feasible outage scenario for EC2, by the way, since (at least when I was working there) the network links between AZs, and the links with the external internet, were not 100% overlapping.

In other words, if you want a highly-available service discovery system in the fact of network partitions, you want an AP service discovery system, rather than a CP one -- and ZK is a CP system.

Another risk, noted on the Netflix Eureka mailing list at https://groups.google.com/d/msg/eureka_netflix/LXKWoD14RFY/tA9UnerrBHUJ :

ZooKeeper, while tolerant against single node failures, doesn't react well to long partitioning events. For us, it's vastly more important that we maintain an available registry than a necessarily consistent registry. If us-east-1d sees 23 nodes, and us-east-1c sees 22 nodes for a little bit, that's OK with us.


I guess this means that a long partition can trigger SESSION_EXPIRED state, resulting in ZK client libraries requiring a restart/reconnect to fix. I'm not entirely clear what happens to the ZK cluster itself in this scenario though.

Finally, Pinterest ran into other issues relying on ZK for service discovery and registration, described at http://engineering.pinterest.com/post/77933733851/zookeeper-resilience-at-pinterest ; sounds like this was mainly around load and the "thundering herd" overload problem. Their workaround was to decouple ZK availability from their services' availability, by building a Smartstack-style sidecar daemon on each host which tracked/cached ZK data.
zookeeper  service-discovery  ops  ha  cap  ap  cp  service-registry  availability  ec2  aws  network  partitions  eureka  smartstack  pinterest 
november 2014 by jm
RICON 2014: CRDTs
Carlos Baquero presents several operation, state-based CRDTs for use in AP systems like Voldemort and Riak
ap  cap-theorem  crdts  ricon  carlos-baquero  data-structures  distcomp 
october 2014 by jm
Kelly "kellabyte" Sommers on Redis' "relaxed CP" approach to the CAP theorem

Similar to ACID properties, if you partially provide properties it means the user has to _still_ consider in their application that the property doesn't exist, because sometimes it doesn't. In you're fsync example, if fsync is relaxed and there are no replicas, you cannot consider the database durable, just like you can't consider Redis a CP system. It can't be counted on for guarantees to be delivered. This is why I say these systems are hard for users to reason about. Systems that partially offer guarantees require in-depth knowledge of the nuances to properly use the tool. Systems that explicitly make the trade-offs in the designs are easier to reason about because it is more obvious and _predictable_.
kellabyte  redis  cp  ap  cap-theorem  consistency  outages  reliability  ops  database  storage  distcomp 
december 2013 by jm

Copy this bookmark:



description:


tags: