jm + zookeeper   20

How to do distributed locking
Martin Kleppmann's critique of the "Redlock" distributed locking algorithm from Redis. antirez responds here: http://antirez.com/news/101 ; summary of followups: https://storify.com/martinkl/redlock-discussion
distributed  locking  redis  algorithms  coding  distcomp  redlock  martin-kleppman  zookeeper 
february 2016 by jm
Holistic Configuration Management at Facebook
How FB pushes config changes from Git (where they are code reviewed, version controlled, and history tracked with strong auth) to Zeus (their ZooKeeper fork), and from there to live production servers.
facebook  configuration  zookeeper  git  ops  architecture 
october 2015 by jm
librato/disco-java
Librato's service discovery library using Zookeeper (so strongly consistent, but with the ZK downside that an AZ outage can stall service discovery updates region-wide)
zookeeper  service-discovery  librato  java  open-source  load-balancing 
october 2015 by jm
Please stop calling databases CP or AP
In his excellent blog post [...] Jeff Hodges recommends that you use the CAP theorem to critique systems. A lot of people have taken that advice to heart, describing their systems as “CP” (consistent but not available under network partitions), “AP” (available but not consistent under network partitions), or sometimes “CA” (meaning “I still haven’t read Coda’s post from almost 5 years ago”).

I agree with all of Jeff’s other points, but with regard to the CAP theorem, I must disagree. The CAP theorem is too simplistic and too widely misunderstood to be of much use for characterizing systems. Therefore I ask that we retire all references to the CAP theorem, stop talking about the CAP theorem, and put the poor thing to rest. Instead, we should use more precise terminology to reason about our trade-offs.
cap  databases  storage  distcomp  ca  ap  cp  zookeeper  consistency  reliability  networking 
may 2015 by jm
The Discovery of Apache ZooKeeper's Poison Packet - PagerDuty
Excellent deep dive into a production issue. Root causes: crappy error handling code in Zookeeper; lack of bounds checking in ZK; and a nasty kernel bug.
zookeeper  bugs  error-handling  bounds-checking  oom  poison-packets  pagerduty  packets  tcpdump  xen  aes  linux  kernel 
may 2015 by jm
Pinterest's highly-available configuration service
Stored on S3, update notifications pushed to clients via Zookeeper
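For illustration, the shape of that pattern -- config data lives in S3, ZooKeeper only carries the "something changed" signal -- is roughly this (the znode path and the S3 fetch are made-up placeholders, not Pinterest's actual code):

import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class S3ConfigWatcher implements Watcher {
    // Placeholder znode: its data is bumped whenever a new config lands in S3
    private static final String MARKER_ZNODE = "/config/my-service/version";
    private final ZooKeeper zk;

    public S3ConfigWatcher(String zkConnectString) throws Exception {
        zk = new ZooKeeper(zkConnectString, 30000, this);
        reload();   // initial load, also arms the watch
    }

    private void reload() throws Exception {
        // watch=true re-arms ZooKeeper's one-shot watch on every read
        byte[] version = zk.getData(MARKER_ZNODE, true, null);
        fetchConfigFromS3(new String(version, "UTF-8"));
    }

    public void process(WatchedEvent event) {
        if (event.getType() == Event.EventType.NodeDataChanged) {
            try { reload(); } catch (Exception e) { e.printStackTrace(); }
        }
    }

    private void fetchConfigFromS3(String version) {
        // placeholder: download the versioned config object from S3 and apply it
    }
}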
s3  zookeeper  ha  pinterest  config  storage 
march 2015 by jm
how Curator fixed issues with the Hive ZooKeeper Lock Manager Implementation
Ugh, ZK is a bear to work with.
Apache Curator is open source software which is able to handle all of the above scenarios transparently. Curator is a Netflix ZooKeeper Library and it provides a high-level API, CuratorFramework, that simplifies using ZooKeeper. By using a singleton CuratorFramework instance in the new ZooKeeperHiveLockManager implementation, we not only fixed the ZooKeeper connection issues, but also made the code easy to understand and maintain.  
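For illustration, the pattern described there boils down to something like this (a rough sketch, not the actual ZooKeeperHiveLockManager code; connection string and lock path are made up):

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.locks.InterProcessMutex;
import org.apache.curator.retry.ExponentialBackoffRetry;

import java.util.concurrent.TimeUnit;

public class CuratorLockExample {
    // One CuratorFramework singleton per process, as the post recommends;
    // Curator handles connection management and retries behind it.
    private static final CuratorFramework CLIENT =
            CuratorFrameworkFactory.newClient("zk1:2181,zk2:2181,zk3:2181",
                    new ExponentialBackoffRetry(1000, 3));

    public static void main(String[] args) throws Exception {
        CLIENT.start();
        InterProcessMutex lock = new InterProcessMutex(CLIENT, "/locks/hive/my-table");
        // Block for up to 10 seconds trying to acquire the distributed lock
        if (lock.acquire(10, TimeUnit.SECONDS)) {
            try {
                // ... critical section: only one process holds the lock here ...
            } finally {
                lock.release();
            }
        }
        CLIENT.close();
    }
}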
zookeeper  apis  curator  netflix  distributed-locks  coding  hive 
february 2015 by jm
Why You Shouldn’t Use ZooKeeper for Service Discovery
In CAP terms, ZooKeeper is CP, meaning that it’s consistent in the face of partitions, not available. For many things that ZooKeeper does, this is a necessary trade-off. Since ZooKeeper is first and foremost a coordination service, having an eventually consistent design (being AP) would be a horrible design decision. Its core consensus algorithm, Zab, is therefore all about consistency. For coordination, that’s great. But for service discovery it’s better to have information that may contain falsehoods than to have no information at all. It is much better to know what servers were available for a given service five minutes ago than to have no idea what things looked like due to a transient network partition. The guarantees that ZooKeeper makes for coordination are the wrong ones for service discovery, and it hurts you to have them.


Yes! I've been saying this for months -- good to see others concurring.
architecture  zookeeper  eureka  outages  network-partitions  service-discovery  cap  partitions 
december 2014 by jm
Zookeeper: not so great as a highly-available service registry
Turns out ZK isn't a good choice as a service discovery system, if you want to be able to use that service discovery system while partitioned from the rest of the ZK cluster:
I went into one of the instances and quickly did an iptables DROP on all packets coming from the other two instances.  This would simulate an availability zone continuing to function, but that zone losing network connectivity to the other availability zones.  What I saw was that the two other instances noticed the first server “going away”, but they continued to function as they still saw a majority (66%).  More interestingly the first instance noticed the other two servers “going away”, dropping the ensemble availability to 33%.  This caused the first server to stop serving requests to clients (not only writes, but also reads).


So: within that offline AZ, service discovery *reads* (as well as writes) stopped working due to a lack of ZK quorum. This is quite a feasible outage scenario for EC2, by the way, since (at least when I was working there) the network links between AZs, and the links with the external internet, were not 100% overlapping.

In other words, if you want a highly-available service discovery system in the face of network partitions, you want an AP service discovery system, rather than a CP one -- and ZK is a CP system.

Another risk, noted on the Netflix Eureka mailing list at https://groups.google.com/d/msg/eureka_netflix/LXKWoD14RFY/tA9UnerrBHUJ :

ZooKeeper, while tolerant against single node failures, doesn't react well to long partitioning events. For us, it's vastly more important that we maintain an available registry than a necessarily consistent registry. If us-east-1d sees 23 nodes, and us-east-1c sees 22 nodes for a little bit, that's OK with us.


I guess this means that a long partition can trigger SESSION_EXPIRED state, resulting in ZK client libraries requiring a restart/reconnect to fix. I'm not entirely clear what happens to the ZK cluster itself in this scenario though.
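For reference, the "restart/reconnect" with the plain ZooKeeper client API looks roughly like this -- once the session expires the handle is dead and the only fix is to build a new one (connection string is a placeholder; a real client would also re-register its ephemeral nodes and watches on the fresh session):

import java.io.IOException;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class ReconnectingZkClient implements Watcher {
    private static final String CONNECT = "zk1:2181,zk2:2181,zk3:2181";
    private volatile ZooKeeper zk;

    public ReconnectingZkClient() throws IOException {
        zk = new ZooKeeper(CONNECT, 30000, this);
    }

    public void process(WatchedEvent event) {
        if (event.getState() == Event.KeeperState.Expired) {
            // SESSION_EXPIRED: watches and ephemeral nodes are gone; the old
            // handle is unusable, so throw it away and start a fresh session.
            try {
                zk.close();
                zk = new ZooKeeper(CONNECT, 30000, this);
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
}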

Finally, Pinterest ran into other issues relying on ZK for service discovery and registration, described at http://engineering.pinterest.com/post/77933733851/zookeeper-resilience-at-pinterest ; sounds like this was mainly around load and the "thundering herd" overload problem. Their workaround was to decouple ZK availability from their services' availability, by building a Smartstack-style sidecar daemon on each host which tracked/cached ZK data.
zookeeper  service-discovery  ops  ha  cap  ap  cp  service-registry  availability  ec2  aws  network  partitions  eureka  smartstack  pinterest 
november 2014 by jm
Building a Global, Highly Available Service Discovery Infrastructure with ZooKeeper
This is the written version of a presentation [Camille Fournier] made at the ZooKeeper Users Meetup at Strata/Hadoop World in October, 2012 (slides available here). This writeup expects some knowledge of ZooKeeper.


good advice from one of the ZK committers.
zookeeper  service-discovery  architecture  distcomp  camille-fournier  availability  wan  network 
may 2014 by jm
ZooKeeper Resilience at Pinterest
essentially decoupling the client services from ZK using a local daemon on each client host; very similar to Airbnb's Smartstack. This is a bit of an indictment of ZK's usability though
ops  architecture  clustering  network  partitions  cap  reliability  smartstack  airbnb  pinterest  zookeeper 
march 2014 by jm
Answer to How many topics (queues) can be created in Apache Kafka? - Quora
Good to know:
'As far as I understand (this was true as of 2013, when I last looked into this issue) there's at least one Apache ZooKeeper znode per topic in Kafka. While there is no hard limitation in Kafka itself (Kafka is linearly scalable), it does mean that the maximum number of znodes comfortably supported by ZooKeeper (on the order of about ten thousand) is the upper limit of Kafka's scalability as far as the number of topics goes.'
kafka  queues  zookeeper  znodes  architecture 
march 2014 by jm
Apache Curator
Netflix open-source library to make using ZooKeeper from Java less of a PITA. I really wish I'd used this now, having reimplemented some key parts of it after failures in prod ;)
zookeeper  netflix  apache  curator  java  libraries  open-source 
january 2014 by jm
Replicant: Replicated State Machines Made Easy
The next time you reach for ZooKeeper, ask yourself whether it provides the primitive you really need. If ZooKeeper's filesystem and znode abstractions truly meet your needs, great. But the odds are, you'll be better off writing your application as a replicated state machine.
zookeeper  paxos  replicant  replication  consensus  state-machines  distcomp 
december 2013 by jm
etcd
A highly-available key value store for shared configuration and service discovery. etcd is inspired by zookeeper and doozer, with a focus on:

Simple: curl'able user facing API (HTTP+JSON);
Secure: optional SSL client cert authentication;
Fast: benchmarked 1000s of writes/s per instance;
Reliable: Properly distributed using Raft;

etcd is written in Go and uses the Raft consensus algorithm to manage a highly-available replicated log.

One of the core components of CoreOS -- http://coreos.com/ .
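As an illustration of the "curl'able HTTP+JSON API" point, setting and reading a key looks roughly like this (the /v2/keys path and port 2379 are assumptions -- early etcd releases used /v1/keys on port 4001, so check your version):

import java.io.*;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class EtcdExample {
    static final String BASE = "http://127.0.0.1:2379/v2/keys";

    public static void main(String[] args) throws IOException {
        // PUT a key with a form-encoded value
        HttpURLConnection put = (HttpURLConnection)
                new URL(BASE + "/config/feature-flag").openConnection();
        put.setRequestMethod("PUT");
        put.setDoOutput(true);
        try (OutputStream out = put.getOutputStream()) {
            out.write("value=enabled".getBytes(StandardCharsets.UTF_8));
        }
        System.out.println("PUT status: " + put.getResponseCode());

        // GET it back; the response body is a JSON document describing the node
        HttpURLConnection get = (HttpURLConnection)
                new URL(BASE + "/config/feature-flag").openConnection();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(get.getInputStream(), StandardCharsets.UTF_8))) {
            System.out.println("GET body: " + in.readLine());
        }
    }
}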
configuration  distributed  raft  ha  doozer  zookeeper  go  replication  consensus-algorithm  etcd  coreos 
august 2013 by jm
Netflix Curator
a high-level API that greatly simplifies using ZooKeeper. It adds many features that build on ZooKeeper and handles the complexity of managing connections to the ZooKeeper cluster and retrying operations. Some of the features are:

Automatic connection management: There are potential error cases that require ZooKeeper clients to recreate a connection and/or retry operations. Curator automatically and transparently (mostly) handles these cases.

Cleaner API: simplifies the raw ZooKeeper methods, events, etc.; provides a modern, fluent interface

Recipe implementations (see Recipes): Leader election, Shared lock, Path cache and watcher, Distributed Queue, Distributed Priority Queue
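For illustration, the leader-election recipe from that list looks roughly like this -- a sketch using LeaderLatch with the later Apache Curator package names (the original Netflix releases lived under com.netflix.curator); connection string and latch path are made up:

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.leader.LeaderLatch;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class LeaderElectionExample {
    public static void main(String[] args) throws Exception {
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                "zk1:2181,zk2:2181,zk3:2181", new ExponentialBackoffRetry(1000, 3));
        client.start();

        LeaderLatch latch = new LeaderLatch(client, "/leader/my-service");
        latch.start();
        latch.await();   // blocks until this process is elected leader
        try {
            // ... leader-only work while we hold leadership ...
        } finally {
            latch.close();   // gives up leadership
            client.close();
        }
    }
}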
zookeeper  java  netflix  distcomp  libraries  oss  open-source  distributed 
march 2013 by jm
Monitoring Apache Hadoop, Cassandra and Zookeeper using Graphite and JMXTrans
nice enough, but a lot of moving parts. It would be nice to see a simpler ZK+Graphite setup using the 'mntr' verb
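Something like this is all that simpler setup would need -- send 'mntr' to the ZK client port, parse the tab-separated key/value lines, and forward the numeric ones over Graphite's plaintext protocol (hostnames and the metric prefix here are made up):

import java.io.*;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class ZkMntrToGraphite {
    public static void main(String[] args) throws IOException {
        long now = System.currentTimeMillis() / 1000;
        try (Socket zk = new Socket("zk1.example.com", 2181);
             Socket graphite = new Socket("graphite.example.com", 2003)) {

            // Ask ZooKeeper for its monitoring stats via the 'mntr' four-letter command
            zk.getOutputStream().write("mntr".getBytes(StandardCharsets.US_ASCII));
            zk.getOutputStream().flush();

            BufferedReader in = new BufferedReader(
                    new InputStreamReader(zk.getInputStream(), StandardCharsets.US_ASCII));
            PrintWriter out = new PrintWriter(
                    new OutputStreamWriter(graphite.getOutputStream(), StandardCharsets.US_ASCII));

            String line;
            while ((line = in.readLine()) != null) {
                // each line looks like: zk_avg_latency<TAB>0
                String[] parts = line.split("\t");
                if (parts.length == 2 && parts[1].matches("-?\\d+(\\.\\d+)?")) {
                    // Graphite plaintext protocol: "path value timestamp\n" on port 2003
                    out.printf("zookeeper.zk1.%s %s %d\n", parts[0], parts[1], now);
                }
            }
            out.flush();
        }
    }
}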
graphite  monitoring  ops  zookeeper  cassandra  hadoop  jmx  jmxtrans  graphs 
march 2013 by jm
Building an Impenetrable ZooKeeper (PDF)
great presentation on operational tips for a reliable ZK cluster (via Bill deHora)
via:bill-dehora  zookeeper  ops  sysadmin 
november 2012 by jm
Autometrics: Self-service metrics collection
how LinkedIn built a service-metrics collection and graphing infrastructure using Kafka and Zookeeper, writing to RRD files, handling 8.8k metrics per datacenter per second
kafka  zookeeper  linkedin  sysadmin  service-metrics 
february 2012 by jm
