jm + dedupe   2

Exactly-once Semantics is Possible: Here's How Apache Kafka Does it
How does this feature work? Under the covers it works in a way similar to TCP; each batch of messages sent to Kafka will contain a sequence number which the broker will use to dedupe any duplicate send. Unlike TCP, though—which provides guarantees only within a transient in-memory connection—this sequence number is persisted to the replicated log, so even if the leader fails, any broker that takes over will also know if a resend is a duplicate. The overhead of this mechanism is quite low: it’s just a few extra numeric fields with each batch of messages. As you will see later in this article, this feature add negligible performance overhead over the non-idempotent producer.
kafka  sequence-numbers  dedupe  deduplication  unique  architecture  distcomp  streaming  idempotence 
23 days ago by jm
Under the Covers of DynamoDB
mostly a DynamoDB puff-piece from last week's Amazon Cloud Connect, but contains some good real-world figures for a 20-billion-GUID deduping table use-case at end. ($4,150 per month, to cut to the chase)
dynamodb  aws  figures  costs  architecture  ec2  dedupe  cloud-connect  slides 
april 2013 by jm

Copy this bookmark: