Borg, Omega, and Kubernetes
Lessons learned from three container management systems over a decade.
kubernetes  paper 
10 hours ago
MillWheel: Fault-Tolerant Stream Processing at Internet Scale
MillWheel is a framework for building low-latency data-processing applications that is widely used at Google. Users specify a directed computation graph and application code for individual nodes, and the system manages persistent state and the continuous flow of records, all within the envelope of the framework's fault-tolerance guarantees. This paper describes MillWheel's programming model as well as its implementation. The case study of a continuous anomaly detector in use at Google serves to motivate how many of MillWheel's features are used. MillWheel's programming model provides a notion of logical time, making it simple to write time-based aggregations. MillWheel was designed from the outset with fault tolerance and scalability in mind. In practice, we find that MillWheel's unique combination of scalability, fault tolerance, and a versatile programming model lends itself to a wide variety of problems at Google.
big-data  papers 
yesterday
The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing
Unbounded, unordered, global-scale datasets are increasingly common in day-to-day business (e.g. Web logs, mobile usage statistics, and sensor networks). At the same time, consumers of these datasets have evolved sophisticated requirements, such as event-time ordering and windowing by features of the data themselves, in addition to an insatiable hunger for faster answers. Meanwhile, practicality dictates that one can never fully optimize along all dimensions of correctness, latency, and cost for these types of input. As a result, data processing practitioners are left with the quandary of how to reconcile the tensions between these seemingly competing propositions, often resulting in disparate implementations and systems. We propose that a fundamental shift of approach is necessary to deal with these evolved requirements in modern data processing. We as a field must stop trying to groom unbounded datasets into finite pools of information that eventually become complete, and instead live and breathe under the assumption that we will never know if or when we have seen all of our data, only that new data will arrive, old data may be retracted, and the only way to make this problem tractable is via principled abstractions that allow the practitioner the choice of appropriate tradeoffs along the axes of interest: correctness, latency, and cost. In this paper, we present one such approach, the Dataflow Model, along with a detailed examination of the semantics it enables, an overview of the core principles that guided its design, and a validation of the model itself via the real-world experiences that led to its development.
big-data  papers 
yesterday
Objective Bayesian Two Sample Hypothesis Testing for Online Controlled Experiments
As A/B testing gains wider adoption in the industry, more people begin to realize the limitations of the traditional frequentist null hypothesis statistical testing (NHST). The large number of search results for the query “Bayesian A/B testing” shows just how much the interest in the Bayesian perspective is growing. In recent years there are also voices arguing that Bayesian A/B testing should replace frequentist NHST and is strictly superior in all aspects. Our goal here is to clarify the myth by looking at both advantages and issues of Bayesian methods. In particular, we propose an objective Bayesian A/B testing framework for which we hope to bring the best from Bayesian and frequentist methods together. Unlike traditional methods, this method requires the existence of historical A/B test data to objectively learn a prior. We have successfully applied this method to Bing, using thousands of experiments to establish the priors.
ab-testing  bayesian 
2 days ago