distcomp   275

The Paxos Algorithm - YouTube
"A Google TechTalk, 2/2/18, presented by Luis Quesada Torres. ABSTRACT: This Tech Talk presents the Paxos algorithm and discusses a fictional distributed storage system (i.e. simplified Megastore) based on Paxos."
paxos  algorithms  distcomp 
8 weeks ago by arsyed
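
For flavour, here is a minimal single-decree Paxos sketch in Python: the proposer's two phases against in-memory acceptors. Illustrative only; a real deployment needs unique ballots per proposer, retries, and actual networking.

```python
# Minimal single-decree Paxos sketch (illustrative, not production code).
# Acceptors promise to ignore proposals below the highest ballot seen;
# a proposer needs a majority of promises, then a majority of accepts.

class Acceptor:
    def __init__(self):
        self.promised = -1       # highest ballot promised
        self.accepted = None     # (ballot, value) of last accepted proposal

    def prepare(self, ballot):
        if ballot > self.promised:
            self.promised = ballot
            return True, self.accepted   # promise, plus any prior accepted value
        return False, None

    def accept(self, ballot, value):
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted = (ballot, value)
            return True
        return False

def propose(acceptors, ballot, value):
    """Try to get `value` chosen; ballots must be unique across proposers."""
    # Phase 1: collect promises from a majority.
    promises = [a.prepare(ballot) for a in acceptors]
    granted = [prior for ok, prior in promises if ok]
    if len(granted) <= len(acceptors) // 2:
        return None  # no majority; retry with a higher ballot
    # If any acceptor already accepted a value, we must propose the
    # previously accepted value with the highest ballot, not our own.
    prior = max((p for p in granted if p), default=None)
    if prior:
        value = prior[1]
    # Phase 2: ask the acceptors to accept.
    acks = sum(a.accept(ballot, value) for a in acceptors)
    return value if acks > len(acceptors) // 2 else None

acceptors = [Acceptor() for _ in range(5)]
print(propose(acceptors, ballot=1, value="x"))  # -> "x"
```
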
Marc Brooker on leases
Good advice from Marc Brooker on using leases as a way to handle leader election in a distributed system: 'Leases are a nice primitive because they are easy to understand, easy (if subtle) to implement correctly, require very little co-ordination, optimistic, and don't require much load on the strongly consistent service.'
leases  primitives  distributed-systems  distcomp  networking  coding  marc-brooker  algorithms 
february 2019 by jm
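
A minimal sketch of the lease mechanism Brooker describes, assuming a strongly consistent store that offers a compare-and-set operation; the `store` API and the key name here are hypothetical stand-ins for something like DynamoDB, etcd, or ZooKeeper.

```python
import time

LEASE_DURATION = 10.0  # seconds

def try_acquire_lease(store, me, now):
    """Become leader if the lease is free or expired; else return the holder."""
    lease = store.get("leader-lease")   # e.g. {"holder": ..., "expires": ...}
    if lease is None or lease["expires"] < now:
        new = {"holder": me, "expires": now + LEASE_DURATION}
        # compare_and_set succeeds only if nobody raced us to the lease.
        if store.compare_and_set("leader-lease", expected=lease, new=new):
            return me
    return lease["holder"] if lease else None

def renew_loop(store, me):
    # The holder renews well before expiry; everyone else polls cheaply,
    # so the strongly consistent store sees very little load.
    while True:
        leader = try_acquire_lease(store, me, time.time())
        if leader == me:
            pass  # we hold the lease: safe to act as leader until it expires
        time.sleep(LEASE_DURATION / 3)
```
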
_Amazon Aurora: On Avoiding Distributed Consensus for I/Os, Commits, and Membership Changes_, SIGMOD '18
One of the more novel differences between Aurora and other relational databases is how it pushes redo processing to a multi-tenant scale-out storage service, purpose-built for Aurora. Doing so reduces networking traffic, avoids checkpoints and crash recovery, enables failovers to replicas without loss of data, and enables fault-tolerant storage that heals without database involvement. Traditional implementations that leverage distributed storage would use distributed consensus algorithms for commits, reads, replication, and membership changes and amplify cost of underlying storage. In this paper, we describe how Aurora avoids distributed consensus under most circumstances by establishing invariants and leveraging local transient state. Doing so improves performance, reduces variability, and lowers costs.
papers  toread  aurora  amazon  aws  pdf  scalability  distcomp  state  sql  mysql  postgresql  distributed-consensus 
january 2019 by jm
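
A rough sketch of the quorum bookkeeping the paper describes: six storage segments, a write durable at 4/6 acks, with durability tracked as local state rather than via a consensus round per I/O. Names and details are simplified from the paper.

```python
WRITE_QUORUM = 4   # of 6 storage segments, per the Aurora design
SEGMENTS = 6

class WriteTracker:
    def __init__(self):
        self.acks = {}        # lsn -> set of segment ids that have acked
        self.durable_lsn = 0  # highest contiguously durable LSN (local state)

    def on_ack(self, lsn, segment_id):
        self.acks.setdefault(lsn, set()).add(segment_id)
        # Advance the durable point past every contiguous quorum-acked LSN;
        # no cross-node coordination needed, just local bookkeeping.
        while len(self.acks.get(self.durable_lsn + 1, ())) >= WRITE_QUORUM:
            self.durable_lsn += 1
        return self.durable_lsn

t = WriteTracker()
for seg in range(4):            # acks trickle in from 4 of the 6 segments
    t.on_ack(lsn=1, segment_id=seg)
print(t.durable_lsn)            # -> 1: the write is durable without consensus
```
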
Netflix/Hystrix
"Hystrix is a latency and fault tolerance library designed to isolate points of access to remote systems, services and 3rd party libraries, stop cascading failure and enable resilience in complex distributed systems where failure is inevitable."
java  libs  distcomp  fault-tolerance  latency  netflix 
august 2018 by arsyed
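
Hystrix itself is a Java library with thread-pool isolation, metrics, and more; here is just the circuit-breaker pattern at its core, sketched in Python: fail fast while a dependency is unhealthy, then probe it again after a cooldown.

```python
import time

class CircuitBreaker:
    def __init__(self, failure_threshold=5, reset_timeout=30.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None   # None means the circuit is closed

    def call(self, fn, fallback):
        if self.opened_at is not None:
            if time.time() - self.opened_at < self.reset_timeout:
                return fallback()        # fail fast: stop cascading failure
            self.opened_at = None        # half-open: let one request probe
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.time()   # trip the breaker
            return fallback()
        self.failures = 0                # healthy again: close the circuit
        return result
```
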
Basho investor to pay up $20m in damages for campaign that put biz on 'greased slide to failure' • The Register
This is disappointing. Basho was very promising.

An investment fund and its manager have been ordered to pay up $20.3m after "misinformation, threats and combative behaviour" helped put NoSQL database biz Basho on a "greased slide to failure".

As reported by The Register, the once-promising biz, which developed the Riak distributed database, faded away last year amid severe criticisms of the way its major investor, Georgetown Capital Partners, operated.

These centred around the control the investment firm and boss Chester Davenport gained over Basho, and how that power was used to block other funders and push out dissenting voices, with the hope of selling the company off fast.
basho  distcomp  riak  vc  software  silicon-valley 
july 2018 by jm
A Decade of Dynamo: Powering the next wave of high-performance, internet-scale applications - All Things Distributed
A deep dive on how we were using our existing databases revealed that they were frequently not used for their relational capabilities. About 70 percent of operations were of the key-value kind, where only a primary key was used and a single row would be returned. About 20 percent would return a set of rows, but still operate on only a single table.

With these requirements in mind, and a willingness to question the status quo, a small group of distributed systems experts came together and designed a horizontally scalable distributed database that would scale out for both reads and writes to meet the long-term needs of our business. This was the genesis of the Amazon Dynamo database.

The success of our early results with the Dynamo database encouraged us to write Amazon's Dynamo whitepaper and share it at the 2007 ACM Symposium on Operating Systems Principles (SOSP conference), so that others in the industry could benefit. The Dynamo paper was well-received and served as a catalyst to create the category of distributed database technologies commonly known today as "NoSQL."


That's not an exaggeration. Nice one Werner et al!
dynamo  history  nosql  storage  databases  distcomp  amazon  papers  acm  data-stores 
october 2017 by jm
Exactly-once Support in Apache Kafka – Jay Kreps
If you’re one of the people who think [exactly-once support is impossible], I’d ask you to take an actual look at what we actually know to be possible and impossible, and what has been built in Kafka, and hopefully come to a more informed opinion. So let’s address this in two parts. First, is exactly-once a theoretical impossibility? Second, how does Kafka support it?
exactly-once-delivery  distributed  kafka  distcomp  jay-kreps  coding  broadcast 
july 2017 by jm
Exactly-once Semantics is Possible: Here's How Apache Kafka Does it
How does this feature work? Under the covers it works in a way similar to TCP; each batch of messages sent to Kafka will contain a sequence number which the broker will use to dedupe any duplicate send. Unlike TCP, though—which provides guarantees only within a transient in-memory connection—this sequence number is persisted to the replicated log, so even if the leader fails, any broker that takes over will also know if a resend is a duplicate. The overhead of this mechanism is quite low: it’s just a few extra numeric fields with each batch of messages. As you will see later in this article, this feature adds negligible performance overhead over the non-idempotent producer.
kafka  sequence-numbers  dedupe  deduplication  unique  architecture  distcomp  streaming  idempotence 
july 2017 by jm
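
A broker-side sketch of the dedupe mechanism described above, heavily simplified: each producer batch carries a (producer id, sequence) pair, and the broker appends only batches whose sequence is exactly last + 1. Real Kafka persists this sequence state in the replicated log itself, which this toy version skips.

```python
class PartitionLog:
    def __init__(self):
        self.records = []
        self.last_seq = {}   # producer_id -> last persisted sequence number

    def append(self, producer_id, sequence, batch):
        last = self.last_seq.get(producer_id, -1)
        if sequence <= last:
            return "duplicate"      # a retry of something already written
        if sequence != last + 1:
            return "out_of_order"   # gap: producer must retry earlier batch
        self.records.extend(batch)
        self.last_seq[producer_id] = sequence
        return "ok"

log = PartitionLog()
log.append("p1", 0, ["a"])   # -> "ok"
log.append("p1", 0, ["a"])   # -> "duplicate" (a retry after a lost ack)
```
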
Cadence: Microservice architecture beyond request/reply – @Scale
Uber’s request/reply handling middleware — based on the SWF API, it seems
swf  apis  microservices  uber  cadence  asynchronous  request-reply  distcomp  queueing  middleware  go 
june 2017 by jm
A Brief History of the UUID · Segment Blog
This is great, by Rick Branson. I didn't realise UUIDs came from Apollo
history  distributed  distcomp  uuids  ids  coding  apollo  unix 
june 2017 by jm
Hadoop Internals
This is the best documentation on the topic I've seen in a while
hadoop  map-reduce  architecture  coding  java  distcomp 
february 2017 by jm
Federated Learning: Strategies for Improving Communication Efficiency
"Federated Learning is a machine learning setting where the goal is to train a high-quality centralized model with training data distributed over a large number of clients each with unreliable and relatively slow network connections. We consider learning algorithms for this setting where on each round, each client independently computes an update to the current model based on its local data, and communicates this update to a central server, where the client-side updates are aggregated to compute a new global model. The typical clients in this setting are mobile phones, and communication efficiency is of utmost importance. In this paper, we propose two ways to reduce the uplink communication costs. The proposed methods are evaluated on the application of training a deep neural network to perform image classification. Our best approach reduces the upload communication required to train a reasonable model by two orders of magnitude."
papers  machine-learning  federated  distcomp  brendan-mcmahan 
january 2017 by arsyed
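
A toy sketch of one federated round as described in the abstract: each client computes an update from its local data, and the server averages the updates into a new global model. This uses a one-parameter linear model with plain SGD; the paper's actual contribution, compressing the uplink updates, is omitted here.

```python
def local_update(weights, data, lr=0.01):
    """One SGD pass on a client's local (x, y) pairs for y ~ w*x."""
    w = weights
    for x, y in data:
        grad = 2 * (w * x - y) * x
        w -= lr * grad
    return w - weights               # communicate the update, not the data

def federated_round(global_w, clients):
    updates = [local_update(global_w, data) for data in clients]
    return global_w + sum(updates) / len(updates)   # server-side averaging

clients = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0)]]  # data stays on-device
w = 0.0
for _ in range(100):
    w = federated_round(w, clients)
print(round(w, 2))   # converges toward 2.0, the true slope
```
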
Service discovery at Stripe
Writeup of their Consul-based service discovery system, a bit similar to smartstack. Good description of the production problems that they saw with Consul too, and also they figured out that strong consistency isn't actually what you want in a service discovery system ;)

HN comments are good too: https://news.ycombinator.com/item?id=12840803
consul  api  microservices  service-discovery  dns  load-balancing  l7  tcp  distcomp  smartstack  stripe  cap-theorem  scalability 
november 2016 by jm
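
A sketch of the availability-over-consistency point: the client caches the last good host list and keeps serving it when the registry is unreachable, since a slightly stale server list still routes requests. The `registry.lookup` call here is a hypothetical stand-in, not Stripe's or Consul's actual interface.

```python
import random

class DiscoveryClient:
    def __init__(self, registry):
        self.registry = registry
        self.cache = {}            # service -> last known-good host list

    def hosts(self, service):
        try:
            fresh = self.registry.lookup(service)   # hypothetical registry API
            if fresh:
                self.cache[service] = fresh
        except ConnectionError:
            pass                   # registry down: serve stale hosts instead
        return self.cache.get(service, [])

    def pick(self, service):
        hosts = self.hosts(service)
        return random.choice(hosts) if hosts else None
```
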
youtube/doorman
Doorman is a solution for Global Distributed Client Side Rate Limiting. Clients that talk to a shared resource (such as a database, a gRPC service, a RESTful API, or whatever) can use Doorman to voluntarily limit their use (usually in requests per second) of the resource. Doorman is written in Go and uses gRPC as its communication protocol. For some high-availability features it needs a distributed lock manager. We currently support etcd, but it should be relatively simple to make it use Zookeeper instead.


From Google -- very interesting to see that they're releasing this as open source, and that it doesn't rely on G-internal services
distributed  distcomp  locking  youtube  golang  doorman  rate-limiting  rate-limits  limits  grpc  etcd 
july 2016 by jm
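
Doorman in miniature, as a sketch: the server leases each client some share of the global capacity, and the client enforces its share locally with a token bucket, so the shared resource never sees more than the global limit. The lease/refresh protocol against the Doorman server is elided.

```python
import time

class ClientRateLimiter:
    def __init__(self, capacity_per_sec):
        self.rate = capacity_per_sec     # this client's leased capacity
        self.tokens = capacity_per_sec
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill tokens in proportion to elapsed time, capped at the rate.
        self.tokens = min(self.rate, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False                     # caller should back off or queue

limiter = ClientRateLimiter(capacity_per_sec=100)   # leased from the server
if limiter.allow():
    pass  # safe to call the shared resource without exceeding the quota
```
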

related tags

99th-percentile  acid  acm  aerospike  algorithm  algorithms  allow_mult  amazon  anti-spam  ap  apache  aphyr  api  apis  apollo  architecture  array  async  asynchronous  atc  atomic  aurora  availability  aws  basho  batch  big-data  blogs  brendan-mcmahan  broadcast  bugs  byzantine-generals  ca  cache  cadence  calm  camille-fournier  canary-requests  cap-theorem  cap  carlos-baquero  cascading  cassandra  cdt  celery  cep  checklists  chris-newcombe  clock-skew  clock  clocks  closures  cloud  cluster-management  cluster  clustering  coding  commutativity  concurrency  configuration  conflict-resolution  consensus-algorithms  consensus  consistency  consul  coreos  counters  cp  crdts  cron  cs  data-analysis  data-stores  data-stream  data-structures  data  database  databases  dataproc  debugging  dedupe  deduplication  deep-learning  devops  dht  differntial-privacy  dist-sys  distributed-consensus  distributed-cron  distributed-systems  distributed-transactions  distributed  distsys  dns  docker  doorman  dynamic  dynamo  dynamodb  ebs  ec2  ecdc  eda  em-algorithm  enstratius  erlang  etcd  events  eventual-consistency  exactly-once-delivery  facebook  failover  failure  fault-tolerance  federated  filesystems  flp  formal-methods  formats  fp  funny  g-counters  gc-scout  gc  glusterfs  go  golang  google  gossip-protocol  graph-processing  graph  grpc  hadoop  handoff-counters  hashicorp  hashing  hedged-requests  history  hlc  horizontal-scaling  hpc  http  id  idempotence  idempotency  ids  immutability  java  jay-kreps  jeff-darcy  jeff-dean  jepsen  john-ousterhout  kafka  kellabyte  l7  last-write-wins  latency  lda  leader-election  leases  ledger-fork  leslie-lamport  libs  limits  linkedin  load-balancing  locking  log  logging  logical-clocks  logs  low-latency  lww  machine-learning  map-reduce  mapreduce  marc-brooker  martin-kleppman  master-election  memcache  messaging  microservices  middleware  mnesia  model-checking  modelling  models  modin  mpl  murphi  mysql  naiad  nbta  netflix  netty  network-partitions  network  networking  nips-2015  nlp  nosql  ntp  numpy  open-source  ops  ordering  outages  packet-loss  pandas  papers  parallel  partition  partitioning  partitions  patterns  paxos  payment  pdf  people  performance  peter-bailis  pickling  ping  postgres  postgresql  postmortems  presentation  presentations  primitives  prod  programming  proofs  protocols  proxy  python  qcon  queueing  queues  quotes  rabbitmq  raft  ramp  rate-limiting  rate-limits  rdd  reading-groups  realtime  redis  redlock  ref  reference  reinforcement-learning  reliability  replicant  replication  request-reply  research  rest  riak  ricon  ripple  s3  saga  scala  scalability  scale  scaling  scheduling  scicomp  semilattice  sequence-numbers  serialization  service-discovery  sharding  silicon-valley  skew  slas  slides  smartstack  soa  software  spark  sparrow  spider.io  split-brain  spores  sql  state-machines  state  static  stellar  storage  storm  streaming  stripe  swarch  swf  synchronization  talks  tcp  tensorflow  testing  text  threadpools  time  timely-dataflow  timeouts  tla+  tla  toread  torus  transactions  truetime  uber  udp  unique  unix  usenix  uuid  uuids  vc  vector-clocks  verification  visualization  voldemort  wan  word-embedding  youtube  zookeeper 
