jm + event-processing 12
Osso
september 2016 by jm
"A modern standard for event-oriented data". Avro schema, events have time and type, schema is external and not part of the Avro stream.
'a modern standard for representing event-oriented data in high-throughput operational systems. It uses existing open standards for schema definition and serialization, but adds semantic meaning and definition to make integration between systems easy, while still being size- and processing-efficient.
An Osso event is largely use case agnostic, and can represent a log message, stack trace, metric sample, user action taken, ad display or click, generic HTTP event, or otherwise. Every event has a set of common fields as well as optional key/value attributes that are typically event type-specific.'
osso
events
schema
data
interchange
formats
cep
event-processing
architecture
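As a rough illustration of the shape described in the Osso note (a set of common fields plus optional, type-specific key/value attributes), here is a minimal Python sketch. The class and the field names (`timestamp_ms`, `event_type`, `attributes`) are assumptions made for illustration; they are not taken from the Osso schema itself.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Event:
    """Hypothetical event record: common fields plus free-form attributes."""
    timestamp_ms: int   # when the event occurred (assumed field name)
    event_type: str     # e.g. "log", "metric", "http" (assumed values)
    attributes: Dict[str, str] = field(default_factory=dict)


def make_metric_sample(timestamp_ms: int, name: str, value: float) -> Event:
    """Build a metric-sample event; type-specific data goes in attributes."""
    return Event(timestamp_ms, "metric", {"name": name, "value": str(value)})
```

The same `Event` type could carry a log message or an HTTP event by changing `event_type` and the attribute keys, which is the "use case agnostic" idea in the quote.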
The world beyond batch: Streaming 101 - O'Reilly Media
streaming
batch
big-data
lambda-architecture
dataflow
event-processing
cep
millwheel
data
data-processing
august 2015 by jm
To summarize, in this post I’ve:
Clarified terminology, specifically narrowing the definition of “streaming” to apply to execution engines only, while using more descriptive terms like unbounded data and approximate/speculative results for distinct concepts often categorized under the “streaming” umbrella.
Assessed the relative capabilities of well-designed batch and streaming systems, positing that streaming is in fact a strict superset of batch, and that notions like the Lambda Architecture, which are predicated on streaming being inferior to batch, are destined for retirement as streaming systems mature.
Proposed two high-level concepts necessary for streaming systems to both catch up to and ultimately surpass batch, those being correctness and tools for reasoning about time, respectively.
Established the important differences between event time and processing time, characterized the difficulties those differences impose when analyzing data in the context of when they occurred, and proposed a shift in approach away from notions of completeness and toward simply adapting to changes in data over time.
Looked at the major data processing approaches in common use today for bounded and unbounded data, via both batch and streaming engines, roughly categorizing the unbounded approaches into: time-agnostic, approximation, windowing by processing time, and windowing by event time.
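The event time versus processing time distinction in the summary above can be sketched in a few lines of Python. This is a minimal sketch, not code from the article: the function name and the `(event_time_secs, payload)` tuple layout are assumptions. It assigns each event to a fixed window by the time it occurred, ignoring the order in which it arrived.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def window_counts_by_event_time(
    events: Iterable[Tuple[int, str]], window_secs: int
) -> Dict[int, int]:
    """Count events per fixed window, keyed by window start (event time).

    Each event is (event_time_secs, payload). Arrival order is ignored, so a
    late-arriving event still lands in the window in which it occurred.
    """
    counts: Dict[int, int] = defaultdict(int)
    for event_time, _payload in events:
        # Round the event time down to the start of its window.
        window_start = event_time - (event_time % window_secs)
        counts[window_start] += 1
    return dict(counts)
```

Windowing by processing time would instead key each event on the clock reading at arrival, so a delayed event would be counted in the wrong window.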
AWS Lambda Event-Driven Architecture With Amazon SNS
aws
ec2
lambda
sns
events
cep
event-processing
coding
cloud
hacks
eric-hammond
april 2015 by jm
Any message posted to an SNS topic can trigger the execution of custom code you have written, but you don’t have to maintain any infrastructure to keep that code available to listen for those events and you don’t have to pay for any infrastructure when the code is not being run. This is, in my opinion, the first time that Amazon can truly say that AWS Lambda is event-driven, as we now have a central, independent, event management system (SNS) where any authorized entity can trigger the event (post a message to a topic) and any authorized AWS Lambda function can listen for the event, and neither has to know about the other.
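For a sense of what the listening side might look like, here is a minimal sketch of a Python Lambda handler subscribed to an SNS topic. It is not taken from the linked post. The handler name and the decision to return the message bodies are assumptions; the `Records` / `Sns` / `Message` layout is the shape SNS uses when it invokes a Lambda function.

```python
from typing import Any, Dict, List


def handler(event: Dict[str, Any], context: Any) -> List[str]:
    """Collect the message body of each SNS record in the invocation."""
    messages = []
    for record in event.get("Records", []):
        # Each record wraps one SNS notification; "Message" is the payload
        # that the publisher posted to the topic.
        messages.append(record["Sns"]["Message"])
    return messages
```

The publisher only needs to know the topic, and the function only sees the delivered records, which matches the decoupling described in the quote.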
Kafka best practices
tl;dr: limit the number of Kafka clusters; use Avro.
architecture
kafka
storage
streaming
event-processing
avro
schema
confluent
best-practices
tips
march 2015 by jm
This is the second part of our guide on streaming data and Apache Kafka. In part one I talked about the uses for real-time data streams and explained our idea of a stream data platform. The remainder of this guide will contain specific advice on how to go about building a stream data platform in your organization.
Announcing Confluent, A Company for Apache Kafka And Realtime Data
november 2014 by jm
Jay Kreps, Neha Narkhede, and Jun Rao are leaving LinkedIn to form a Kafka-oriented realtime event processing company
realtime
event-processing
logs
kafka
streaming
open-source
jay-kreps
jun-rao
confluent
All Data Are Belong to AWS: Streaming upload via Fluentd
august 2014 by jm
Fluentd looks like a decent foundation for tailing/streaming event processing in Ruby, supporting batched output to S3, a bunch of other AWS services, Kafka, and RabbitMQ. Claims to have OK performance, despite its Rubbitude. However, its high-availability story is shite, so it's not to be used where availability is important.
ruby
rabbitmq
kafka
tail
event-streaming
cep
event-processing
s3
aws
sqs
fluentd
Twitter's TSAR
june 2014 by jm
TSAR = "Time Series AggregatoR". Twitter's new event processor-style architecture for internal metrics. It's notable that Twitter and Google are now both apparently moving towards a model in which the same code is designed to run equally well in realtime streaming and batch modes (Summingbird, MillWheel, Flume).
analytics
architecture
twitter
tsar
aggregation
event-processing
metrics
streaming
hadoop
batch
Grape
november 2013 by jm
a realtime processing engine, built on a persistent queue and a set of workers. 'The main goal is data availability and persistency. We created grape for those who cannot afford losing data'. It does this by allowing infinite expansion of the pending queue in Elliptics, their Dynamo-like horizontally-scaled storage backend.
kafka
queue
queueing
storage
realtime
fault-tolerance
grape
cep
event-processing
ZeroMQ: Helping us Block Malicious Domains in Real Time - Umbrella Security Labs
october 2013 by jm
nice writeup of a ZeroMQ/Hadoop event processing pipeline architecture
zeromq
hadoop
event-processing
architecture
dns
backend
reputation
_MillWheel: Fault-Tolerant Stream Processing at Internet Scale_ [paper, pdf]
august 2013 by jm
from VLDB 2013:
millwheel
google
data-processing
cep
low-latency
fault-tolerance
scalability
papers
event-processing
stream-processing
MillWheel is a framework for building low-latency data-processing applications that is widely used at Google. Users specify a directed computation graph and application code for individual nodes, and the system manages persistent state and the continuous flow of records, all within the envelope of the framework’s fault-tolerance guarantees.
This paper describes MillWheel’s programming model as well as its implementation. The case study of a continuous anomaly detector in use at Google serves to motivate how many of MillWheel’s features are used. MillWheel’s programming model provides a notion of logical time, making it simple to write time-based aggregations. MillWheel was designed from the outset with fault tolerance and scalability in mind. In practice, we find that MillWheel’s unique combination of scalability, fault tolerance, and a versatile programming model lends itself to a wide variety of problems at Google.
Data distribution in the cloud with Node.js
october 2012 by jm
Very interesting presentation from ex-IONAian Darach Ennis of Push Technology on eep.js, embedded event processing in JavaScript for Node.js stream processing. Handles tumbling, monotonic, periodic and sliding windows at 8-40 million events per second; no multi-dimensional, infinite or predicate event-processing windows. (via Sergio Bossa)
via:sbtourist
events
event-processing
streaming
data
ex-iona
darach-ennis
push-technology
cep
javascript
node.js
streams
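To make the tumbling and sliding window types mentioned in the eep.js note concrete, here is a minimal count-based sketch in Python. It does not use or mirror the eep.js API; the function names and the choice of a sum as the aggregate are assumptions for illustration.

```python
from collections import deque
from typing import Deque, Iterable, Iterator


def tumbling_sums(values: Iterable[float], size: int) -> Iterator[float]:
    """Emit one sum per non-overlapping block of `size` values."""
    total = 0.0
    count = 0
    for v in values:
        total += v
        count += 1
        if count == size:
            yield total
            total = 0.0  # the window closes and a fresh one starts
            count = 0


def sliding_sums(values: Iterable[float], size: int) -> Iterator[float]:
    """Emit a sum over the last `size` values, once the window has filled."""
    window: Deque[float] = deque()
    total = 0.0
    for v in values:
        window.append(v)
        total += v
        if len(window) > size:
            total -= window.popleft()  # evict the oldest value
        if len(window) == size:
            yield total
```

A tumbling window emits once per block and then resets, while a sliding window emits on every new value after it has filled, so consecutive sliding windows overlap.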
Scott Andreas - Garbage, Garbage Everywhere [slides]
december 2011 by jm
'GC Strategies for Event Processing Systems on the JVM'
gc
java
jvm
event-streams
event-processing
tuning
slides
presentations
scott-andreas
performance