scalability


The Log: What every software engineer should know about real-time data's unifying abstraction | LinkedIn Engineering
Excellent, and kinda looks like CQRS + Event Sourcing

"Deterministic means that the processing isn't timing dependent and doesn't let any other "out of band" input influence its results. For example a program whose output is influenced by the particular order of execution of threads or by a call to gettimeofday or some other non-repeatable thing is generally best considered as non-deterministic."

"The purpose of the log here is to squeeze all the non-determinism out of the input stream to ensure that each replica processing this input stays in sync."

"One of the beautiful things about this approach is that the time stamps that index the log now act as the clock for the state of the replicas—you can describe each replica by a single number, the timestamp for the maximum log entry it has processed. This timestamp combined with the log uniquely captures the entire state of the replica."
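
The idea above can be sketched in a few lines of Python (a toy illustration, not LinkedIn's implementation): if the state transition is deterministic, a replica's entire state is a pure function of the log prefix it has applied, so the max index processed fully describes it.

```python
# Toy sketch: entries are (key, value) puts; apply() is deterministic,
# so state is fully determined by the log prefix that has been applied.

def apply(state, entry):
    # Deterministic state transition with no out-of-band input.
    key, value = entry
    new_state = dict(state)
    new_state[key] = value
    return new_state

def replay(log, up_to):
    """Rebuild replica state by replaying log entries [0, up_to)."""
    state = {}
    for entry in log[:up_to]:
        state = apply(state, entry)
    return state

log = [("x", 1), ("y", 2), ("x", 3)]
# Two replicas at the same log position necessarily have identical state.
assert replay(log, 3) == replay(log, 3)
```

Here the "timestamp" is just the log index: knowing the log and the number 3 is enough to reconstruct the replica exactly.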

"The "state machine model" usually refers to an active-active model where we keep a log of the incoming requests and each replica processes each request. A slight modification of this, called the "primary-backup model", is to elect one replica as the leader and allow this leader to process requests in the order they arrive and log out the changes to its state from processing the requests."

"the usefulness of the log comes from the simple function that the log provides: producing a persistent, re-playable record of history. Surprisingly, at the core of these problems is the ability to have many machines playback history at their own rate in a deterministic manner."

"Let's say we write a record with log entry X and then need to do a read from the cache. If we want to guarantee we don't see stale data, we just need to ensure we don't read from any cache which has not replicated up to X."
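
A minimal sketch of that guarantee (all names here are hypothetical, not a real caching API): each cache records the highest log entry it has applied, and a read that must see write X simply refuses any cache that hasn't caught up to X.

```python
# Hypothetical sketch: each cache tracks the highest log index it has
# replicated; reads that need write X only go to caches at index >= X.

class Cache:
    def __init__(self):
        self.applied_up_to = 0   # highest log entry replicated so far
        self.data = {}

    def apply(self, index, key, value):
        self.data[key] = value
        self.applied_up_to = index

def read_fresh(caches, key, min_index):
    """Read only from a cache that has replicated up to min_index."""
    for cache in caches:
        if cache.applied_up_to >= min_index:
            return cache.data.get(key)
    raise RuntimeError("no cache has replicated up to log entry %d" % min_index)
```

The point is that freshness becomes a comparison of two integers, with no coordination protocol between the caches themselves.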

"You can think of the log as acting as a kind of messaging system with durability guarantees and strong ordering semantics. In distributed systems, this model of communication sometimes goes by the (somewhat terrible) name of atomic broadcast."

"It's worth emphasizing that the log is still just the infrastructure. That isn't the end of the story of mastering data flow: the rest of the story is around metadata, schemas, compatibility, and all the details of handling data structure and evolution. But until there is a reliable, general way of handling the mechanics of data flow, the semantic details are secondary."

"The idea is that adding a new data system—be it a data source or a data destination—should create integration work only to connect it to a single pipeline instead of each consumer of data."

"The key problem for a data-centric organization is coupling the clean integrated data to the data warehouse. A data warehouse is a piece of batch query infrastructure which is well suited to many kinds of reporting and ad hoc analysis, particularly when the queries involve simple counting, aggregation, and filtering. But having a batch system be the only repository of clean complete data means the data is unavailable for systems requiring a real-time feed—real-time processing, search indexing, monitoring systems, etc."

"At LinkedIn, we have built our event data handling in a log-centric fashion. We are using Kafka as the central, multi-subscriber event log. We have defined several hundred event types, each capturing the unique attributes about a particular type of action. This covers everything from page views, ad impressions, and searches, to service invocations and application exceptions."

"The "event-driven" style provides an approach to simplifying this. The job display page now just shows a job and records the fact that a job was shown along with the relevant attributes of the job, the viewer, and any other useful facts about the display of the job. Each of the other interested systems—the recommendation system, the security system, the job poster analytics system, and the data warehouse—all just subscribe to the feed and do their processing. The display code need not be aware of these other systems, and needn't be changed if a new data consumer is added."
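
The decoupling described above can be shown with a toy feed (an illustration only; Kafka's real API is pull-based, not a callback registry like this): the display code publishes one event and has no knowledge of who consumes it.

```python
# Toy event feed: the producer appends one event; each interested system
# subscribes independently, so adding a consumer changes no producer code.

class EventFeed:
    def __init__(self):
        self.subscribers = []

    def subscribe(self, handler):
        self.subscribers.append(handler)

    def publish(self, event):
        for handler in self.subscribers:
            handler(event)

feed = EventFeed()
seen_by_analytics, seen_by_security = [], []
feed.subscribe(seen_by_analytics.append)   # e.g. job poster analytics
feed.subscribe(seen_by_security.append)    # e.g. abuse detection

# The display code records one fact and is done.
feed.publish({"type": "job_view", "job_id": 42, "viewer": "alice"})
```

Adding the data warehouse as a fourth consumer is one more `subscribe` call; the display page is untouched.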

"Lack of a global order across partitions is a limitation, but we have not found it to be a major one. Indeed, interaction with the log typically comes from hundreds or thousands of distinct processes so it is not meaningful to talk about a total order over their behavior. Instead, the guarantees that we provide are that each partition is order preserving, and Kafka guarantees that appends to a particular partition from a single sender will be delivered in the order they are sent."

"The cumulative effect of these optimizations is that you can usually write and read data at the rate supported by the disk or network, even while maintaining data sets that vastly exceed memory."

"Even in the presence of a healthy batch processing ecosystem, I think the actual applicability of stream processing as an infrastructure style is quite broad. I think it covers the gap in infrastructure between real-time request/response services and offline batch processing. For modern internet companies, I think around 25% of their code falls into this category."

" First, it makes each dataset multi-subscriber and ordered. Recall our "state replication" principle to remember the importance of order. To make this more concrete, consider a stream of updates from a database—if we re-order two updates to the same record in our processing we may produce the wrong final output. This order is more permanent than what is provided by something like TCP as it is not limited to a single point-to-point link and survives beyond process failures and reconnections.

Second, the log provides buffering to the processes. This is very fundamental. If processing proceeds in an unsynchronized fashion it is likely to happen that an upstream data producing job will produce data more quickly than another downstream job can consume it. When this occurs processing must block, buffer or drop data. Dropping data is likely not an option; blocking may cause the entire processing graph to grind to a halt. The log acts as a very, very large buffer that allows a process to be restarted or to fail without slowing down other parts of the processing graph. This isolation is particularly important when extending this data flow to a larger organization, where processing is happening by jobs made by many different teams. We cannot have one faulty job cause back-pressure that stops the entire processing flow.

Both Storm and Samza are built in this fashion and can use Kafka or other similar systems as their log."
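
Both properties fall out of a very simple mechanism, sketched below (a toy model, not Kafka's implementation): the log is an append-only list, and each consumer owns its own offset, so a slow consumer lags without blocking the producer or other consumers, and every consumer sees the same order.

```python
# Toy model of log-as-buffer: producer appends at its own rate; each
# consumer tracks its own offset and reads at its own pace.

class Log:
    def __init__(self):
        self.entries = []        # append-only, order-preserving

    def append(self, entry):
        self.entries.append(entry)

class Consumer:
    def __init__(self, log):
        self.log = log
        self.offset = 0          # position owned by this consumer alone

    def poll(self, max_records=10):
        batch = self.log.entries[self.offset:self.offset + max_records]
        self.offset += len(batch)
        return batch

log = Log()
slow, fast = Consumer(log), Consumer(log)
for i in range(100):             # producer runs far ahead of the slow job
    log.append(i)

fast_batch = fast.poll(100)      # fast consumer is fully caught up
slow_batch = slow.poll(10)       # slow consumer lags; nothing is dropped
```

Back-pressure disappears because the only shared state is the log itself; a consumer that crashes and restarts just resumes from its saved offset.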

"A stream processor can keep its state in a local "table" or "index"—a bdb, leveldb, or even something more unusual such as a Lucene or fastbit index. The contents of this store are fed from its input streams (after first perhaps applying arbitrary transformation). It can journal out a changelog for this local index it keeps to allow it to restore its state in the event of a crash and restart. This provides a generic mechanism for keeping co-partitioned state in arbitrary index types local with the incoming stream data."

"When the process fails, it restores its index from the changelog. The log is the transformation of the local state into a sort of incremental record at a time backup."
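
A minimal sketch of that changelog pattern (hypothetical names; Samza's real stores are pluggable key-value engines, not a dict): every update to the local table is first journaled to the changelog, so after a crash the table can be rebuilt by replaying it.

```python
# Sketch of changelog-backed local state: each table update is journaled
# to a durable changelog; restore() replays it after a crash.

class Processor:
    def __init__(self, changelog):
        self.changelog = changelog   # durable, append-only (key, value) log
        self.table = {}              # fast local index, lost on crash

    def update(self, key, value):
        self.changelog.append((key, value))  # journal first
        self.table[key] = value              # then apply locally

    @classmethod
    def restore(cls, changelog):
        """Rebuild the local table by replaying the changelog."""
        p = cls(changelog)
        for key, value in changelog:
            p.table[key] = value
        return p
```

Because the changelog records one update at a time, it is exactly the "incremental record at a time backup" the quote describes.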

"However, retaining the complete log will use more and more space as time goes by, and the replay will take longer and longer. Hence, in Kafka, we support a different type of retention. Instead of simply throwing away the old log, we remove obsolete records—i.e. records whose primary key has a more recent update. By doing this, we still guarantee that the log contains a complete backup of the source system, but now we can no longer recreate all previous states of the source system, only the more recent ones. We call this feature log compaction."
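
A naive version of compaction can be written in a few lines (a simplified sketch; Kafka's log cleaner works segment by segment in the background rather than over the whole log at once): keep only the latest record for each primary key, preserving the relative order of the survivors.

```python
# Naive log compaction: drop every record whose key has a later update,
# keeping survivors in their original relative order.

def compact(log):
    latest = {}
    for index, (key, _value) in enumerate(log):
        latest[key] = index                  # last position of each key
    survivors = set(latest.values())
    return [entry for i, entry in enumerate(log) if i in survivors]

log = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)]
compacted = compact(log)                     # [("a", 3), ("c", 4), ("b", 5)]
```

Replaying the compacted log reproduces the current state of every key, which is exactly the "complete backup" guarantee, while intermediate historical states are no longer recoverable.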

"The client can get read-your-write semantics from any node by providing the timestamp of a write as part of its query—a serving node receiving such a query will compare the desired timestamp to its own index point and if necessary delay the request until it has indexed up to at least that time to avoid serving stale data."
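
Sketched in Python (names hypothetical, and "timestamp" simplified to a log index; a real node would block rather than synchronously apply entries): the serving node compares the query's timestamp to its own index point and catches up before answering.

```python
# Sketch of read-your-writes on a serving node: if the node's index point
# is behind the query's timestamp, apply pending log entries first.

class ServingNode:
    def __init__(self, log):
        self.log = log           # shared append-only list of (key, value)
        self.index_point = 0     # how far this node has indexed
        self.table = {}

    def _catch_up(self, to_index):
        # Apply pending log entries until we reach the desired position.
        while self.index_point < to_index:
            key, value = self.log[self.index_point]
            self.table[key] = value
            self.index_point += 1

    def query(self, key, min_timestamp):
        # min_timestamp is the log index of the client's own write.
        self._catch_up(min_timestamp)
        return self.table.get(key)

log = [("x", 1), ("x", 2)]
node = ServingNode(log)
value = node.query("x", 2)       # never serves a value older than write 2
```

The client needs no sticky routing: any replica can serve the read, because the timestamp carries the freshness requirement with the query.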

"One of the trickier things a distributed system must do is handle restoring failed nodes or moving partitions from node to node. A typical approach would have the log retain only a fixed window of data and combine this with a snapshot of the data stored in the partition. It is equally possible for the log to retain a complete copy of data and garbage collect the log itself. This moves a significant amount of complexity out of the serving layer, which is system-specific, and into the log, which can be general purpose."

"I find this view of systems as factored into a log and query api to be very revealing, as it lets you separate the query characteristics from the availability and consistency aspects of the system. I actually think this is even a useful way to mentally factor a system that isn't built this way to better understand it."
DistributedSystems  Scalability  Logs  architecture  events  Kinesis  Kafka  Samza  Messaging  EventSourcing  CQRS 
9 days ago by colin.jack
The SEC's decision on Investor's Exchange (IEX) is really about the computer science problems plaguing the stock market — Quartz
There’s just one problem: Latency isn’t a market problem, it’s a technical problem that’s been enshrined in regulation. Consequently, it may not be legal for a public exchange to intentionally disrupt it.
fintech  scalability  networking 
20 days ago by janpeuker
An open platform to connect, manage, and secure microservices
2017  tools  distributed  hold  scalability 
20 days ago by giorgio_v


