jm + streaming   48

Cloudy Gamer: Playing Overwatch on Azure's new monster GPU instances
pretty amazing. full 60FPS, 2560x1600, everything on Epic quality, streaming from Azure, for $2 per hour
gaming  azure  games  cloud  gpu  overwatch  streaming 
october 2016 by jm
Open Sourcing Twitter Heron
Twitter are open sourcing their Storm replacement, and moving it to an independent open source foundation
open-source  twitter  heron  storm  streaming  architecture  lambda-architecture 
may 2016 by jm
Live Streaming Security Games
Rapid Fire is a special event we started hosting at our own in-person CTFs in 2014. The idea is pretty simple:

Create several CTF challenges that can be solved in a few minutes each.
Set up the challenges on 4 identical computers with some basic tools.
Mirror the player’s screens so the audience can watch their actions.
Whoever solves the most challenges the fastest wins.

This event is interesting for a number of reasons: the players are under intense pressure, as everything they do is being watched by several people; the audience can watch several different approaches to the same problems; and people can follow along fairly easily with what is going on with the challenges.

With e-sports-style video!
gaming  hacking  security  e-sports  streaming  twitch  ctf 
may 2016 by jm
Apple Stole My Music. No, Seriously.
some amazingly terrible product decisions here. Deleting local copies of unreleased WAV files -- on the assumption that the user will simply listen to them streamed down from Apple Music -- that is astonishingly bad, and it's amazing they didn't consider the "freelance composer" use case at all. (via Tony Finch)
apple  music  terrible  wav  sound  copyright  streaming  apple-music  design  product  fail 
may 2016 by jm
How we implemented the video player in Mail.Ru Cloud
We’ve recently added video streaming service to Mail.Ru Cloud. Development started with contemplating the new feature as an all-purpose “Swiss Army knife” that would both play files of any format and work on any device with the Cloud available. Video content uploaded to the Cloud mostly falls into one of the two categories: “movies/series” and “users’ videos”. The latter are the videos that users shoot with their phones and cameras, and these videos are most versatile in terms of formats and codecs. For many reasons, it is often a problem to watch these videos on other end-user devices without prior normalization: a required codec is missing, or the file size is too big to download, or whatever.

Mainly around using HLS (HTTP Live Streaming).
hls  http  streaming  video  audio  players  codecs 
march 2016 by jm
The Totally Managed Analytics Pipeline: Segment, Lambda, and Dynamo
notable mainly for the details of Terraform support for Lambda: that's a significant improvement to Lambda's production-readiness
aws  pipelines  data  streaming  lambda  dynamodb  analytics  terraform  ops 
october 2015 by jm
SQL on Kafka using PipelineDB
this is quite nice. PipelineDB allows direct hookup of a Kafka stream, and will ingest durably and reliably, and provide SQL views computed over a sliding window of the stream.
logging  sql  kafka  pipelinedb  streaming  sliding-window  databases  search  querying 
september 2015 by jm
Streaming will soon pass traditional TV - Tech Insider
the percentage of people who say they stream video from services like Netflix, YouTube, and Hulu each day has increased dramatically over the last five years, from about 30% in 2010 to more than 50% this year. During the same period, the percentage of people who say they watch traditional TV [...] has dropped by about 10%. When the beige line surpasses the purple line [looks like 2016], it will mean that more people are streaming each day than are watching traditional TV. 
streaming  hulu  netflix  tv  television  video  youtube 
september 2015 by jm
Evolution of Babbel’s data pipeline on AWS: from SQS to Kinesis
Good "here's how we found it" blog post:

Our new data pipeline with Kinesis in place allows us to plug new consumers without causing any damage to the current system, so it’s possible to rewrite all Queue Workers one by one and replace them with Kinesis Workers. In general, the transition to Kinesis was smooth and there were not so tricky parts.
Another outcome was significantly reduced costs – handling almost the same amount of data as SQS, Kinesis appeared to be many times cheaper than SQS.
aws  kinesis  kafka  streaming  data-pipelines  streams  sqs  queues  architecture  kcl 
september 2015 by jm
Mining High-Speed Data Streams: The Hoeffding Tree Algorithm
This paper proposes a decision tree learner for data streams, the Hoeffding Tree algorithm, which comes with the guarantee that the learned decision tree is asymptotically nearly identical to that of a non-incremental learner using infinitely many examples. This work constitutes a significant step in developing methodology suitable for modern ‘big data’ challenges and has initiated a lot of follow-up research. The Hoeffding Tree algorithm has been covered in various textbooks and is available in several public domain tools, including the WEKA Data Mining platform.
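
The guarantee comes from the Hoeffding bound: after n observations of a variable with range R, the observed mean is within ε = sqrt(R² ln(1/δ) / (2n)) of the true mean with probability 1 − δ. A rough Python sketch of the split test a leaf might apply, assuming that formula; `should_split` and its arguments are illustrative names, not taken from the paper or from WEKA:

```python
import math

def hoeffding_bound(value_range, delta, n):
    """Epsilon such that the true mean is within epsilon of the observed
    mean with probability 1 - delta, after n observations."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))

def should_split(best_gain, second_gain, value_range, delta, n):
    """Split once the best attribute beats the runner-up by more than epsilon.

    (Hypothetical helper: the gain values would come from whatever split
    criterion the learner uses, e.g. information gain.)
    """
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)
```

The bound shrinks as n grows, so a leaf that cannot separate two close attributes yet simply waits for more examples.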
hoeffding-tree  algorithms  data-structures  streaming  streams  cep  decision-trees  ml  learning  papers 
august 2015 by jm
The world beyond batch: Streaming 101 - O'Reilly Media
To summarize, in this post I’ve:

Clarified terminology, specifically narrowing the definition of “streaming” to apply to execution engines only, while using more descriptive terms like unbounded data and approximate/speculative results for distinct concepts often categorized under the “streaming” umbrella.

Assessed the relative capabilities of well-designed batch and streaming systems, positing that streaming is in fact a strict superset of batch, and that notions like the Lambda Architecture, which are predicated on streaming being inferior to batch, are destined for retirement as streaming systems mature.

Proposed two high-level concepts necessary for streaming systems to both catch up to and ultimately surpass batch, those being correctness and tools for reasoning about time, respectively.

Established the important differences between event time and processing time, characterized the difficulties those differences impose when analyzing data in the context of when they occurred, and proposed a shift in approach away from notions of completeness and toward simply adapting to changes in data over time.

Looked at the major data processing approaches in common use today for bounded and unbounded data, via both batch and streaming engines, roughly categorizing the unbounded approaches into: time-agnostic, approximation, windowing by processing time, and windowing by event time.
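
To make the event-time vs processing-time distinction concrete, here is a small Python sketch of fixed (tumbling) windows keyed on either timestamp. The event tuples and the `tumbling_window_counts` helper are made up for illustration; a real engine would also need to handle late data, which this ignores:

```python
from collections import defaultdict

def tumbling_window_counts(events, window_seconds, use_event_time=True):
    """Count events per fixed window.

    events: iterable of (event_time, processing_time) pairs, in seconds.
    Returns {window_start: count}.
    """
    counts = defaultdict(int)
    for event_time, processing_time in events:
        ts = event_time if use_event_time else processing_time
        window_start = (ts // window_seconds) * window_seconds
        counts[window_start] += 1
    return dict(counts)

# The third event happened at t=58 but was only processed at t=125.
events = [(5, 6), (50, 52), (58, 125), (70, 71)]
by_event = tumbling_window_counts(events, 60)
by_processing = tumbling_window_counts(events, 60, use_event_time=False)
```

Windowing by processing time files the delayed event under the window in which it arrived, not the one in which it happened, which is the skew the post describes.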
streaming  batch  big-data  lambda-architecture  dataflow  event-processing  cep  millwheel  data  data-processing 
august 2015 by jm
Implementing Efficient and Reliable Producers with the Amazon Kinesis Producer Library - AWS Big Data Blog
Good advice on production-quality, decent-scale usage of Kinesis in Java with the official library: batching, retries, partial failures, backoff, and monitoring. (Also, jaysus, the AWS Cloudwatch API is awful, looking at this!)
kpl  aws  kinesis  tips  java  batching  streaming  production  cloudwatch  monitoring  coding 
august 2015 by jm
The Netflix Test Video
Netflix' official test video -- contains various scenarios which exercise frequent tricky edge cases in video compression and playback; A/V sync, shades of black, running water, etc.
networking  netflix  streaming  video  compression  tests 
august 2015 by jm
jgc on Cloudflare's log pipeline
Cloudflare are running a 40-machine, 50TB Kafka cluster, ingesting at 15 Gbps, for log processing. Also: Go producers/consumers, capnproto as wire format, and CitusDB/Postgres to store rolled-up analytics output. Also using Space Saver (top-k) and HLL (counting) estimation algorithms.
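
For reference, a minimal Python sketch of the top-k side, assuming "Space Saver" here means the Space-Saving algorithm (k counters, evict the smallest when a new item arrives). This is not Cloudflare's code, and it uses a linear scan to find the minimum where a production version would use a cleverer structure:

```python
def space_saving(stream, k):
    """Approximate top-k with at most k counters.

    Counts can overestimate but never underestimate an item that is
    currently being tracked.
    """
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k:
            counters[item] = 1
        else:
            # Evict the item with the smallest count; the newcomer
            # inherits that count plus one.
            victim = min(counters, key=counters.get)
            counters[item] = counters.pop(victim) + 1
    return sorted(counters.items(), key=lambda kv: kv[1], reverse=True)
```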
logs  cloudflare  kafka  go  capnproto  architecture  citusdb  postgres  analytics  streaming 
june 2015 by jm
Discretized Streams: Fault Tolerant Stream Computing at Scale
The paper describing the innards of Spark Streaming and its RDD-based recomputation algorithm:
we use a data structure called Resilient Distributed Datasets (RDDs), which keeps data in memory and can recover it without replication by tracking the lineage graph of operations that were used to build it. With RDDs, we show that we can attain sub-second end-to-end latencies. We believe that this is sufficient for many real-world big data applications, where the timescale of the events tracked (e.g., trends in social media) is much higher.
rdd  spark  streaming  fault-tolerance  batch  distcomp  papers  big-data  scalability 
june 2015 by jm
Adrian Colyer reviews the Twitter Heron paper
ouch, really sounds like Storm didn't cut the mustard. 'It’s hard to imagine something more damaging to Apache Storm than this. Having read it through, I’m left with the impression that the paper might as well have been titled “Why Storm Sucks”, which coming from Twitter themselves is quite a statement.'

If I was to summarise the lessons learned, it sounds like: backpressure is required; and multi-tenant architectures suck.

Update: response from Storm dev ptgoetz here:
storm  twitter  heron  big-data  streaming  realtime  backpressure 
june 2015 by jm
Twitter ditches Storm
in favour of a proprietary ground-up rewrite called Heron. Reading between the lines it sounds like Storm had problems with latency, reliability, data loss, and supporting back pressure.
analytics  architecture  twitter  storm  heron  backpressure  streaming  realtime  queueing 
june 2015 by jm
HTTP/2 is here, let's optimize! - Velocity SC 2015 - Google Slides
Changes which server-side developers will need to start considering as HTTP/2 rolls out. Remove domain sharding; stop concatenating resources; stop inlining resources; use server push.
http2  http  protocols  streaming  internet  web  dns  performance 
june 2015 by jm
streamtools: a graphical tool for working with streams of data | nytlabs
Visual programming, Yahoo! Pipes style, back again:
we have created streamtools – a new, open source project by The New York Times R&D Lab which provides a general purpose, graphical tool for dealing with streams of data. It provides a vocabulary of operations that can be connected together to create live data processing systems without the need for programming or complicated infrastructure. These systems are assembled using a visual interface that affords both immediate understanding and live manipulation of the system.

via Aman
via:akohli  streaming  data  nytimes  visual-programming  coding 
may 2015 by jm
RADStack - an open source Lambda Architecture built on Druid, Kafka and Samza
'In this paper we presented the RADStack, a collection of complementary technologies that can be used together to power interactive analytic applications. The key pieces of the stack are Kafka, Samza, Hadoop, and Druid. Druid is designed for exploratory analytics and is optimized for low latency data exploration, aggregation, and ingestion, and is well suited for OLAP workflows. Samza and Hadoop complement Druid and add data processing functionality, and Kafka enables high throughput event delivery.'
druid  samza  kafka  streaming  cep  lambda-architecture  architecture  hadoop  big-data  olap 
april 2015 by jm
Kafka best practices
This is the second part of our guide on streaming data and Apache Kafka. In part one I talked about the uses for real-time data streams and explained our idea of a stream data platform. The remainder of this guide will contain specific advice on how to go about building a stream data platform in your organization.

tl;dr: limit the number of Kafka clusters; use Avro.
architecture  kafka  storage  streaming  event-processing  avro  schema  confluent  best-practices  tips 
march 2015 by jm
DynamoDB Streams
This is pretty awesome. All changes to a DynamoDB table can be streamed to a Kinesis stream, MySQL-replication-style.

The nice bit is that it has a solid way to ensure readers won't get overwhelmed by the stream volume (since ddb tables are IOPS-rate-limited), and Kinesis has a solid way to read missed updates (since it's a Kafka-style windowed persistent stream). With this you have a pretty reliable way to ensure you're not going to suffer data loss.
iops  dynamodb  aws  kinesis  reliability  replication  multi-az  multi-region  failover  streaming  kafka 
november 2014 by jm
Announcing Confluent, A Company for Apache Kafka And Realtime Data
Jay Kreps, Neha Narkhede, and Jun Rao are leaving LinkedIn to form a Kafka-oriented realtime event processing company
realtime  event-processing  logs  kafka  streaming  open-source  jay-kreps  jun-rao  confluent 
november 2014 by jm
Inside Apple’s Live Event Stream Failure, And Why It Happened: It Wasn’t A Capacity Issue
The bottom line with this event is that the encoding, translation, JavaScript code, the video player, the call to S3 single storage location and the millisecond refreshes all didn’t work properly together and was the root cause of Apple’s failed attempt to make the live stream work without any problems. So while it would be easy to say it was a CDN capacity issue, which was my initial thought considering how many events are taking place today and this week, it does not appear that a lack of capacity played any part in the event not working properly. Apple simply didn’t provision and plan for the event properly.
cdn  streaming  apple  fail  scaling  s3  akamai  caching 
september 2014 by jm
A Go implementation of Greenwald-Khanna streaming quantiles: - 'a new online algorithm for computing approximate quantile summaries of very large data sequences with a worst-case space requirement of O(1/ε log(εN))'
quantiles  go  algorithms  greenwald-khanna  percentiles  streaming  cep  space-efficient 
july 2014 by jm
Twitter's TSAR
TSAR = "Time Series AggregatoR". Twitter's new event processor-style architecture for internal metrics. It's notable that now Twitter and Google are both apparently moving towards this idea of a model of code which is designed to run equally in realtime streaming and batch modes (Summingbird, Millwheel, Flume).
analytics  architecture  twitter  tsar  aggregation  event-processing  metrics  streaming  hadoop  batch 
june 2014 by jm
Video Processing at Dropbox
On-the-fly video transcoding during live streaming. They've done a great job of this!
At the beginning of the development of this feature, we entertained the idea to simply pre-transcode all the videos in Dropbox to all possible target devices. Soon enough we realized that this simple approach would be too expensive at our scale, so we decided to build a system that allows us to trigger a transcoding process only upon user request and cache the results for subsequent fetches. This on-demand approach: adapts to heterogeneous devices and network conditions, is relatively cheap (everything is relative at our scale), guarantees low latency startup time.
ffmpeg  dropbox  streaming  video  cdn  ec2  hls  http  mp4  nginx  haproxy  aws  h264 
february 2014 by jm
SAMOA, an open source platform for mining big data streams
Yahoo!'s streaming machine learning platform, built on Storm, implementing:

As a library, SAMOA contains state-of-the-art implementations of algorithms for distributed machine learning on streams. The first alpha release allows classification and clustering. For classification, we implemented a Vertical Hoeffding Tree (VHT), a distributed streaming version of decision trees tailored for sparse data (e.g., text). For clustering, we included a distributed algorithm based on CluStream. The library also includes meta-algorithms such as bagging.
storm  streaming  big-data  realtime  samoa  yahoo  machine-learning  ml  decision-trees  clustering  bagging  classification 
november 2013 by jm
Piracy is a 'minority activity', pirates spend more on content, and piracy rates dropped in the UK during 2012
OfCom has published a report on online piracy, which found that the practice is becoming less common and that pirates tend to spend more on legitimate content than non-pirates.

The research, which was not funded by the entertainment industry, was conducted by Kantar Media among 21,474 participants and took place in 2012 across four separate stages. Over that time, the ratio of legal to illegal content fell -- confirming a suspected trend as legal streaming options became more available.

It also confirmed another suspicion -- that a relatively small number of web users are responsible for most piracy. In OfCom's data, just two percent of users conducted three quarters of all piracy. Ofcom described piracy as "a minority activity".

Of those surveyed, 58 percent accessed music, movie or TV content online, while 17 percent accessed illegal content sources. Those who admitted pirating content spent on average £26 every three months on legitimate content, set against an average spend of £16 among non-pirates.
wired  piracy  studies  ofcom  streaming 
september 2013 by jm
Sketch of the Day – Frugal Streaming
ha, this is very clever! If you have enough volume, this is a nice estimation algorithm to compute stream quantiles in very little RAM
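
A minimal Python sketch of the one-variable version (usually called Frugal-1U), assuming integer-ish data where a step size of 1 makes sense. The function name and the `rng` parameter are mine:

```python
import random

def frugal_quantile(stream, q=0.5, start=0, rng=None):
    """Estimate the q-quantile of a stream using a single stored number.

    The estimate drifts up with probability q when a value is above it,
    and down with probability 1 - q when a value is below it.
    """
    rng = rng or random.Random()
    estimate = start
    for value in stream:
        r = rng.random()
        if value > estimate and r > 1 - q:
            estimate += 1
        elif value < estimate and r > q:
            estimate -= 1
    return estimate
```

The "enough volume" caveat shows up here: the estimate moves by at most 1 per item, so it needs plenty of items to walk from its starting point to the true quantile.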
memory  streaming  stream-processing  clever  algorithms  hacks  streams 
september 2013 by jm
How To Buffer Full YouTube Videos Before Playing
summary - turn off DASH (Dynamic adaptive streaming) using a userscript.
chrome  youtube  google  video  dash  mpeg  streaming 
september 2013 by jm
Streaming MapReduce with Summingbird
Before Summingbird at Twitter, users that wanted to write production streaming aggregations would typically write their logic using a Hadoop DSL like Pig or Scalding. These tools offered nice distributed system abstractions: Pig resembled familiar SQL, while Scalding, like Summingbird, mimics the Scala collections API. By running these jobs on some regular schedule (typically hourly or daily), users could build time series dashboards with very reliable error bounds at the unfortunate cost of high latency.

While using Hadoop for these types of loads is effective, Twitter is about real-time and we needed a general system to deliver data in seconds, not hours. Twitter’s release of Storm made it easy to process data with very low latencies by sacrificing Hadoop’s fault tolerant guarantees. However, we soon realized that running a fully real-time system on Storm was quite difficult for two main reasons:

Recomputation over months of historical logs must be coordinated with Hadoop or streamed through Storm with a custom log loading mechanism;
Storm is focused on message passing and random-write databases are harder to maintain.

The types of aggregations one can perform in Storm are very similar to what’s possible in Hadoop, but the system issues are very different. Summingbird began as an investigation into a hybrid system that could run a streaming aggregation in both Hadoop and Storm, as well as merge automatically without special consideration of the job author. The hybrid model allows most data to be processed by Hadoop and served out of a read-only store. Only data that Hadoop hasn’t yet been able to process (data that falls within the latency window) would be served out of a datastore populated in real-time by Storm. But the error of the real-time layer is bounded, as Hadoop will eventually get around to processing the same data and will smooth out any error introduced. This hybrid model is appealing because you get well understood, transactional behavior from Hadoop, and up to the second additions from Storm. Despite the appeal, the hybrid approach has the following practical problems:

Two sets of aggregation logic have to be kept in sync in two different systems;
Keys and values must be serialized consistently between each system and the client;
The client is responsible for reading from both datastores, performing a final aggregation and serving the combined results.

Summingbird was developed to provide a general solution to these problems.

Very interesting stuff. I'm particularly interested in the design constraints they've chosen to impose to achieve this -- data formats which require associative merging in particular.
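
A toy illustration of why associative merging matters, in Python rather than Summingbird's Scala. Word counts stand in for the aggregate and `Counter` addition for the merge; the `count_words` helper and the sample lines are invented:

```python
from collections import Counter

def count_words(lines):
    """Aggregate one chunk of input into a partial result."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts

def merge(a, b):
    """Combine two partial results. Counter addition is associative."""
    return a + b

batch_part = count_words(["storm heron", "heron kafka"])   # e.g. from Hadoop
realtime_part = count_words(["heron"])                     # e.g. from Storm
late_part = count_words(["kafka storm"])

# Grouping does not matter, so partial results can be merged in any order.
left = merge(merge(batch_part, realtime_part), late_part)
right = merge(batch_part, merge(realtime_part, late_part))
```

Because the merge is associative, the same aggregation logic can run over hourly batches or per-event updates and the partial results still combine to the same answer.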
mapreduce  streaming  big-data  twitter  storm  summingbird  scala  pig  hadoop  aggregation  merging 
september 2013 by jm
How to avoid crappy ISP caches when viewing YouTube video
Must give this a try when I get home -- I frequently have latency problems watching YT on my UPC connection, and I bet they have a crappily-managed, overloaded cache box on their network.
streaming  youtube  caching  isps  caches  firewalls  iptables  hacks  video  networking 
august 2013 by jm
Why YouTube buffers: The secret deals that make -- and break -- online video
Should ISPs be required to ensure they have sufficient upstream bandwidth to video sites like YouTube and Netflix?
"Verizon has chosen to sell its customers a product [Netflix] that they hope those customers don't actually use," Schaeffer said. "And when customers use it and request movies, they have not ensured there is adequate connectivity to get that video content back to their customers."
netflix  youtube  streaming  video  isps  net-neutrality  peering  comcast  bandwidth  upstream 
july 2013 by jm
_Dynamic Histograms: Capturing Evolving Data Sets_ [pdf]

Currently, histograms are static structures: they are created from scratch periodically and their creation is based on looking at the entire data distribution as it exists each time. This creates problems, however, as data stored in DBMSs usually varies with time. If new data arrives at a high rate and old data is likewise deleted, a histogram’s accuracy may deteriorate fast as the histogram becomes older, and the optimizer’s effectiveness may be lost. Hence, how often a histogram is reconstructed becomes very critical, but choosing the right period is a hard problem, as the following trade-off exists: If the period is too long, histograms may become outdated. If the period is too short, updates of the histogram may incur a high overhead.

In this paper, we propose what we believe is the most elegant solution to the problem, i.e., maintaining dynamic histograms within given limits of memory space. Dynamic histograms are continuously updateable, closely tracking changes to the actual data. We consider two of the best static histograms proposed in the literature [9], namely V-Optimal and Compressed, and modify them. The new histograms are naturally called Dynamic V-Optimal (DVO) and Dynamic Compressed (DC). In addition, we modified V-Optimal’s partition constraint to create the Static Average-Deviation Optimal (SADO) and Dynamic Average-Deviation Optimal (DADO) histograms.

(via d2fn)
via:d2fn  histograms  streaming  big-data  data  dvo  dc  sado  dado  dynamic-histograms  papers  toread 
may 2013 by jm
Compression in Kafka: GZIP or Snappy ?
With Ack: in this mode, as far as compression is concerned, the data gets compressed at the producer, decompressed and compressed on the broker before it sends the ack to the producer. The producer throughput with Snappy compression was roughly 22.3MB/s as compared to 8.9MB/s of the GZIP producer. Producer throughput is 150% higher with Snappy as compared to GZIP.

No ack, similar to Kafka 0.7 behavior: In this mode, the data gets compressed at the producer and it doesn’t wait for the ack from the broker. The producer throughput with Snappy compression was roughly 60.8MB/s as compared to 18.5MB/s of the GZIP producer. Producer throughput is 228% higher with Snappy as compared to GZIP. The higher compression savings in this test are due to the fact that the producer does not wait for the leader to re-compress and append the data; it simply compresses messages and fires away. Since Snappy has very high compression speed and low CPU usage, a single producer is able to compress the same amount of messages much faster as compared to GZIP.
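
A quick check of the quoted percentages from the quoted throughputs (Python, nothing assumed beyond the four figures above):

```python
def percent_higher(new, baseline):
    """How much higher `new` is than `baseline`, as a percentage."""
    return (new - baseline) / baseline * 100.0

with_ack = percent_higher(22.3, 8.9)    # Snappy vs GZIP, waiting for acks
no_ack = percent_higher(60.8, 18.5)     # Snappy vs GZIP, fire and forget
```

Both come out between 150-151% and 228-229% respectively, consistent with the post's figures.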
gzip  snappy  compression  kafka  streaming  ops 
april 2013 by jm
good blog post on histogram-estimation stream processing algorithms
After reviewing several dozen papers, a score or so in depth, I identified two data structures that appear to enable us to answer these recency and frequency queries: exponential histograms (from "Maintaining Stream Statistics Over Sliding Windows" by Datar et al.) and waves (from "Distributed Streams Algorithms for Sliding Windows" by Gibbons and Tirthapura). Both of these data structures are used to solve the so-called counting problem, the problem of determining, with a bound on the relative error, the number of 1s in the last N units of time. In other words, the data structures are able to answer the question: how many 1s appeared in the last n units of time within a factor of Error (e.g., 50%). The algorithms are neat, so I'll present them briefly.
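
A sketch of the counting problem in Python, using the simplest form of the exponential histogram idea: at most two buckets of each power-of-two size, which gives the 50% error bound mentioned above. This is my reading of the Datar et al. scheme, not code from the blog post, and the class name is invented:

```python
class SlidingOnesCounter:
    """Approximate count of 1s in the last `window` bits."""

    def __init__(self, window):
        self.window = window
        self.t = 0
        # Newest first; each bucket is (time of its most recent 1, size).
        self.buckets = []

    def add(self, bit):
        self.t += 1
        # Drop the oldest bucket once its newest 1 leaves the window.
        if self.buckets and self.buckets[-1][0] <= self.t - self.window:
            self.buckets.pop()
        if not bit:
            return
        self.buckets.insert(0, (self.t, 1))
        i = 0
        # Three buckets of one size: merge the two oldest into one of
        # double the size, then check the next size up.
        while i + 2 < len(self.buckets) and \
                self.buckets[i][1] == self.buckets[i + 2][1]:
            ts, size = self.buckets[i + 1]
            self.buckets[i + 1:i + 3] = [(ts, size * 2)]
            i += 1

    def estimate(self):
        if not self.buckets:
            return 0
        total = sum(size for _, size in self.buckets)
        # Only part of the oldest bucket may still be in the window,
        # so count half of it.
        return total - self.buckets[-1][1] // 2
```

Memory is logarithmic in the window size, since bucket sizes double and there are at most two of each.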
streams  streaming  stream-processing  histograms  percentiles  estimation  waves  statistics  algorithms 
february 2013 by jm
Distributed Streams Algorithms for Sliding Windows [PDF]
'Massive data sets often arise as physically distributed, parallel data streams, and it is important to estimate various aggregates and statistics on the union of these streams. This paper presents algorithms for estimating aggregate functions over a “sliding window” of the N most recent data items in one or more streams. [...] Our results are obtained using a novel family of synopsis data structures called waves.'
waves  papers  streaming  algorithms  percentiles  histogram  distcomp  distributed  aggregation  statistics  estimation  streams 
february 2013 by jm
HyperLogLog++: Google’s Take On Engineering HLL
Google and AggregateKnowledge's improvements to the HyperLogLog cardinality estimation algorithm
hyperloglog  cardinality  estimation  streaming  stream-processing  cep 
february 2013 by jm
'uses DNS witchcraft to allow you to access US/UK-only audio and video services like, BBC iPlayer, etc. without using a VPN or Web proxy.' According to , it proxies the initial connection setup and geo-auth, then mangles the stream address to stream directly, not via proxy. Sounds pretty useful
proxy  network  vpn  dns  tunnel  content  video  audio  iplayer  bbc  hulu  streaming  geo-restriction 
january 2013 by jm
Data distribution in the cloud with Node.js
Very interesting presentation from ex-IONAian Darach Ennis of Push Technology on eep.js, embedded event processing in Javascript for node.js stream processing. Handles tumbling, monotonic, periodic and sliding windows at 8-40 million events per second; no multi-dimensional, infinite or predicate event-processing windows. (via Sergio Bossa)
via:sbtourist  events  event-processing  streaming  data  ex-iona  darach-ennis  push-technology  cep  javascript  node.js  streams 
october 2012 by jm
_Spotify: Large Scale, Low Latency, P2P Music-on-Demand Streaming_
Gunnar Kreitz' paper on its innards! 'Spotify is a music streaming service offering low-latency access to a library of over 8 million music tracks. Streaming is performed by a combination of client-server access and a peer-to-peer protocol. In this paper, we give an overview of the protocol and peer-to-peer architecture used and provide measurements of service performance and user behavior. The service currently has a user base of over 7 million and has been available in six European countries since October 2008. Data collected indicates that the combination of the client-server and peer-to-peer paradigms can be applied to music streaming with good results. In particular, 8.8% of music data played comes from Spotify’s servers while the median playback latency is only 265 ms (including cached tracks). We also discuss the user access patterns observed and how the peer-to-peer network affects the access patterns as they reach the server.'
spotify  via:waxy  streaming  p2p  music  architecture  papers  networking 
june 2011 by jm
Gunnar Kreitz, _Spotify - Behind The Scenes_
the innards of Spotify's client, server fleet, and P2P layer, from the dev team themselves. good stuff
spotify  streaming  servers  networking  music  mp3  dns  p2p 
may 2011 by jm
Rumor: Google “Disgusted” With Record Labels
'Once again, Warner is the fly in the ointment, the same company that praises Spotify one day, renews their licenses for the rest of the world and then the next day doesn’t want to license them in the US.'
google  music  cloud  licensing  music-industry  record-labels  warner-music  streaming  from delicious
april 2011 by jm
Spotify Second Largest Source Of Revenue In Europe For Labels
wow. the WinAmp guys were right -- 'on a European level, Spotify is the second single largest source of revenue for record labels. This means that 2010 saw dramatic increase in its usage as well as payouts to record labels and artists themselves.' this via an IFPI report
ifpi  music  spotify  streaming  revenue  record-labels  europe  sweden  isps  mp3  from delicious
february 2011 by jm
Grooveshark Mobile for iPhone
nifty, an official app for this music-streaming site -- although for jailbroken iPhones only
jailbreaking  iphone  music  grooveshark  streaming  mp3  apps  from delicious
april 2010 by jm
