jm + algorithms   217

A Mind Is Born
A C=64 demo in 256 bytes! Awesome work. The use of an LFSR (linear-feedback shift register) to generate the melody is particularly clever (via Craig)
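A rough sketch of the idea in Python -- a 16-bit LFSR as a tiny deterministic pseudo-random source, mapped onto a scale. The taps and the note mapping here are illustrative (the classic 0xACE1 example), not the demo's actual code:

```python
def lfsr16(seed=0xACE1):
    """16-bit Fibonacci LFSR (taps 16,14,13,11): maximal-length sequence."""
    state = seed
    while True:
        bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
        state = (state >> 1) | (bit << 15)
        yield state

def melody(length=16, scale=(0, 2, 4, 5, 7, 9, 11, 12), base=60):
    """Map successive LFSR states onto a major scale (MIDI note numbers)."""
    notes = lfsr16()
    return [base + scale[next(notes) % len(scale)] for _ in range(length)]
```

The appeal for a 256-byte demo is that the whole "composer" is a couple of shifts and XORs, yet the output sounds varied rather than random noise.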
art  programming  computers  demos  demoscene  c-64  via:craig  lfsr  algorithms 
4 days ago by jm
'Mathwashing,' Facebook and the zeitgeist of data worship
Fred Benenson: Mathwashing can be thought of as using math terms (algorithm, model, etc.) to paper over a more subjective reality. For example, a lot of people believed Facebook was using an unbiased algorithm to determine its trending topics, even though Facebook had previously admitted that humans were involved in the process.
maths  math  mathwashing  data  big-data  algorithms  machine-learning  bias  facebook  fred-benenson 
5 days ago by jm
Image comparison algorithms
Awesome StackOverflow answer for detecting "similar" images -- promising approach to reimplement ffffound's similarity feature in mltshp, maybe
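One common approach from answers like this is a perceptual "average hash": downscale to an 8x8 grayscale grid, threshold each pixel against the mean, and compare 64-bit hashes by Hamming distance. A sketch assuming the downscaling has already been done (real code would use PIL for that step):

```python
def average_hash(pixels):
    """pixels: 8x8 grid of grayscale values (0-255), e.g. a downscaled image.
    Returns a 64-bit perceptual hash: bit set where pixel >= mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p >= mean else 0)
    return bits

def hamming(a, b):
    """Number of differing bits; small distance = 'similar' images."""
    return bin(a ^ b).count("1")
```

Similar images land within a few bits of each other, so near-duplicate search becomes a Hamming-distance lookup rather than pixel comparison.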
algorithms  hashing  comparison  diff  images  similarity  search  ffffound  mltshp 
13 days ago by jm
HyperBitBit
jomsdev notes:

'Last year, at the AofA’16 conference, Robert Sedgewick proposed a new algorithm for cardinality estimation. Robert Sedgewick is a professor at Princeton with a long track record of publications on combinatorial/randomized algorithms. He was a good friend of Philippe Flajolet (creator of HyperLogLog), and HyperBitBit is based on the same ideas. However, it uses less memory than HyperLogLog and can provide the same results. On practical data, for N < 2^64, HyperBitBit estimates cardinality to within 10% using only 128 + 6 bits.'
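A sketch of the algorithm as presented in Sedgewick's talk -- two 64-bit sketches plus a small exponent. Treat this as an approximation: the exact bit-extraction details and the 5.4 bias constant here are from my reading of the slides, not a reference implementation:

```python
import hashlib

def _hash64(s):
    return int.from_bytes(hashlib.blake2b(s.encode(), digest_size=8).digest(), "big")

def _trailing_ones(v):
    r = 0
    while v & 1:
        r += 1
        v >>= 1
    return r

def hyperbitbit(stream):
    """Cardinality estimate using 128 + 6 bits of state: two 64-bit
    bit-sketches plus a small scale counter lgn."""
    lgn, sketch, sketch2 = 5, 0, 0
    for x in stream:
        h = _hash64(x)
        k = h & 63                       # which of 64 indicator bits
        r = _trailing_ones(h >> 6)       # run length in the remaining bits
        if r > lgn:
            sketch |= 1 << k
        if r > lgn + 1:
            sketch2 |= 1 << k
        if bin(sketch).count("1") > 31:  # sketch saturated: double the scale
            sketch, sketch2, lgn = sketch2, 0, lgn + 1
    return int(2 ** (lgn + 5.4 + bin(sketch).count("1") / 32))
```

The trick is that when half the indicator bits fill up, the whole sketch is promoted (sketch2 already holds the next scale's bits), so the state never grows.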
algorithms  programming  cs  hyperloglog  estimation  cardinality  counting  hyperbitbit 
4 weeks ago by jm
Artificial intelligence is ripe for abuse, tech researcher warns: 'a fascist's dream' | Technology | The Guardian
“We should always be suspicious when machine learning systems are described as free from bias if it’s been trained on human-generated data,” Crawford said. “Our biases are built into that training data.”

In the Chinese research it turned out that the faces of criminals were more unusual than those of law-abiding citizens. “People who had dissimilar faces were more likely to be seen as untrustworthy by police and judges. That’s encoding bias,” Crawford said. “This would be a terrifying system for an autocrat to get his hand on.” [...]

With AI this type of discrimination can be masked in a black box of algorithms, as appears to be the case with a company called Faceception, for instance, a firm that promises to profile people’s personalities based on their faces. In its own marketing material, the company suggests that Middle Eastern-looking people with beards are “terrorists”, while white looking women with trendy haircuts are “brand promoters”.
bias  ai  racism  politics  big-data  technology  fascism  crime  algorithms  faceception  discrimination  computer-says-no 
6 weeks ago by jm
[1606.08813] European Union regulations on algorithmic decision-making and a "right to explanation"
We summarize the potential impact that the European Union's new General Data Protection Regulation will have on the routine use of machine learning algorithms. Slated to take effect as law across the EU in 2018, it will restrict automated individual decision-making (that is, algorithms that make decisions based on user-level predictors) which "significantly affect" users. The law will also effectively create a "right to explanation," whereby a user can ask for an explanation of an algorithmic decision that was made about them. We argue that while this law will pose large challenges for industry, it highlights opportunities for computer scientists to take the lead in designing algorithms and evaluation frameworks which avoid discrimination and enable explanation.


oh this'll be tricky.
algorithms  accountability  eu  gdpr  ml  machine-learning  via:daveb  europe  data-protection  right-to-explanation 
6 weeks ago by jm
US immigration asking tech interview trivia questions now
what the absolute fuck. Celestine Omin on Twitter: "I was just asked to balance a Binary Search Tree by JFK's airport immigration. Welcome to America."
twitter  celestine-omin  us-politics  immigration  tests  interviews  bst  trees  data-structures  algorithms 
8 weeks ago by jm
tdunning/t-digest
A new data structure for accurate on-line accumulation of rank-based statistics such as quantiles and trimmed means. The t-digest algorithm is also very parallel friendly making it useful in map-reduce and parallel streaming applications.

The t-digest construction algorithm uses a variant of 1-dimensional k-means clustering to produce a data structure that is related to the Q-digest. This t-digest data structure can be used to estimate quantiles or compute other rank statistics. The advantage of the t-digest over the Q-digest is that the t-digest can handle floating point values while the Q-digest is limited to integers. With small changes, the t-digest can handle any values from any ordered set that has something akin to a mean. The accuracy of quantile estimates produced by t-digests can be orders of magnitude more accurate than those produced by Q-digests in spite of the fact that t-digests are more compact when stored on disk.


Super-nice feature is that it's mergeable, so amenable to parallel usage across multiple hosts if required. Java implementation, ASL licensing.
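A much-simplified sketch of the core idea in Python: sweep sorted values, letting each centroid absorb points until it hits a size bound proportional to q(1-q), so centroids stay tiny near the tails and coarse in the middle. The real t-digest's scale function and merging logic are more careful than this:

```python
def build_tdigest(values, delta=100):
    """Greedy one-pass digest: centroid capacity ~ 4*n*q*(1-q)/delta."""
    xs = sorted(values)
    n = len(xs)
    centroids = []                 # list of (mean, weight)
    mean, weight, done = xs[0], 1, 0
    for x in xs[1:]:
        q = (done + weight) / n
        cap = max(1.0, 4 * n * q * (1 - q) / delta)
        if weight + 1 <= cap:
            weight += 1
            mean += (x - mean) / weight      # running mean
        else:
            centroids.append((mean, weight))
            done += weight
            mean, weight = x, 1
    centroids.append((mean, weight))
    return centroids

def quantile(centroids, q):
    """Walk cumulative weights; return the mean of the covering centroid."""
    target = q * sum(w for _, w in centroids)
    seen = 0
    for mean, w in centroids:
        seen += w
        if seen >= target:
            return mean
    return centroids[-1][0]
```

Because tail centroids hold only a handful of points, extreme quantiles (p99 and beyond) stay accurate even though the digest is a few hundred centroids instead of ten thousand samples.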
data-structures  algorithms  java  t-digest  statistics  quantiles  percentiles  aggregation  digests  estimation  ranking 
december 2016 by jm
Fast Forward Labs: Probabilistic Data Structure Showdown: Cuckoo Filters vs. Bloom Filters
Nice comparison of a counting Bloom filter and a Cuckoo Filter, implemented in Python:
This post provides an update by exploring Cuckoo filters, a new probabilistic data structure that improves upon the standard Bloom filter. The Cuckoo filter provides a few advantages: 1) it enables dynamic deletion and addition of items 2) it can be easily implemented compared to Bloom filter variants with similar capabilities, and 3) for similar space constraints, the Cuckoo filter provides lower false positives, particularly at lower capacities. We provide a python implementation of the Cuckoo filter here, and compare it to a counting Bloom filter (a Bloom filter variant).
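A minimal cuckoo filter sketch, showing the partial-key trick that enables deletion: a fingerprint's alternate bucket is `i XOR hash(fingerprint)`, so an entry can be relocated or removed without the original key. Parameters (8-bit fingerprints, 4-slot buckets) follow the usual paper defaults, not the Fast Forward Labs code:

```python
import hashlib, random

def _h(s):
    return int.from_bytes(hashlib.blake2b(s.encode(), digest_size=8).digest(), "big")

class CuckooFilter:
    def __init__(self, nbuckets=1024, bucket_size=4, max_kicks=500):
        assert nbuckets & (nbuckets - 1) == 0, "power of two for the XOR trick"
        self.mask = nbuckets - 1
        self.bucket_size = bucket_size
        self.max_kicks = max_kicks
        self.buckets = [[] for _ in range(nbuckets)]

    def _slots(self, item):
        fp = (_h("fp:" + item) & 0xFF) or 1      # nonzero 8-bit fingerprint
        i1 = _h(item) & self.mask
        i2 = (i1 ^ _h(str(fp))) & self.mask      # involution: applying twice returns i1
        return fp, i1, i2

    def add(self, item):
        fp, i1, i2 = self._slots(item)
        for i in (i1, i2):
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        i = random.choice((i1, i2))              # both full: evict and relocate
        for _ in range(self.max_kicks):
            j = random.randrange(self.bucket_size)
            fp, self.buckets[i][j] = self.buckets[i][j], fp
            i = (i ^ _h(str(fp))) & self.mask
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        return False                             # filter is effectively full

    def __contains__(self, item):
        fp, i1, i2 = self._slots(item)
        return fp in self.buckets[i1] or fp in self.buckets[i2]

    def delete(self, item):
        fp, i1, i2 = self._slots(item)
        for i in (i1, i2):
            if fp in self.buckets[i]:
                self.buckets[i].remove(fp)
                return True
        return False
```

A standard Bloom filter can't support `delete` at all without counting; here it falls out of the structure.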
algorithms  probabilistic  approximation  bloom-filters  cuckoo-filters  sets  estimation  python 
november 2016 by jm
Accidentally Quadratic — Rust hash iteration+reinsertion
It was recently discovered that some surprising operations on Rust’s standard hash table types could go quadratic.


Quite a nice unexpected accidental detour into O(n^2)
big-o  hashing  robin-hood-hashing  siphash  algorithms  hashtables  rust 
november 2016 by jm
Jeff Erickson's Algorithms, Etc.
This page contains lecture notes and other course materials for various algorithms classes I have taught at the University of Illinois, Urbana-Champaign. The notes are numbered in the order I cover the material in a typical undergraduate class, with notes on more advanced material (indicated by the symbol ♥) interspersed appropriately. [...] In addition to the algorithms notes I have been maintaining since 1999, this page also contains new notes on "Models of Computation", which cover a small subset of the material normally taught in undergraduate courses in formal languages and automata. I wrote these notes for a new junior-level course on "Algorithms and Models of Computation" that Lenny Pitt and I developed, which is now required for all undergraduate computer science and computer engineering majors at UIUC.


Via Tony Finch
via:fanf  book  cs  algorithms  jeff-erickson  uiuc 
november 2016 by jm
MemC3: Compact and concurrent Memcache with dumber caching and smarter hashing
An improved hashing algorithm called optimistic cuckoo hashing, and a CLOCK-based eviction algorithm that works in tandem with it. They are evaluated in the context of Memcached, where combined they give up to a 30% memory usage reduction and up to a 3x improvement in queries per second as compared to the default Memcached implementation on read-heavy workloads with small objects (as is typified by Facebook workloads).
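The CLOCK side of the design is easy to sketch: a circular buffer of entries with reference bits, where a sweeping hand clears bits and evicts the first unreferenced entry. This is a generic CLOCK cache for illustration, not MemC3's lock-free variant:

```python
class ClockCache:
    """Fixed-size cache with CLOCK (second-chance) eviction."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.slots = []          # [key, value, referenced] per slot
        self.index = {}          # key -> slot position
        self.hand = 0

    def get(self, key):
        pos = self.index.get(key)
        if pos is None:
            return None
        self.slots[pos][2] = 1   # mark recently used; no list reordering needed
        return self.slots[pos][1]

    def put(self, key, value):
        if key in self.index:
            pos = self.index[key]
            self.slots[pos][1] = value
            self.slots[pos][2] = 1
            return
        if len(self.slots) < self.capacity:
            self.index[key] = len(self.slots)
            self.slots.append([key, value, 1])
            return
        # sweep: clear reference bits until an unreferenced victim is found
        while self.slots[self.hand][2]:
            self.slots[self.hand][2] = 0
            self.hand = (self.hand + 1) % self.capacity
        del self.index[self.slots[self.hand][0]]
        self.slots[self.hand] = [key, value, 1]
        self.index[key] = self.hand
        self.hand = (self.hand + 1) % self.capacity
```

Unlike strict LRU there is no linked list to maintain on every hit -- just a bit set -- which is why it pairs well with a concurrent hash table.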
memcached  performance  key-value-stores  storage  databases  cuckoo-hashing  algorithms  concurrency  caching  cache-eviction  memory  throughput 
november 2016 by jm
Facebook scuppers Admiral Insurance plan to base premiums on your posts
Well, this is amazingly awful:
The Guardian claims to have further details of the kind of tell-tale signs that Admiral's algorithmic analysis would have looked out for in Facebook posts. Good traits include "writing in short concrete sentences, using lists, and arranging to meet friends at a set time and place, rather than just 'tonight'." On the other hand, "evidence that the Facebook user might be overconfident—such as the use of exclamation marks and the frequent use of 'always' or 'never' rather than 'maybe'—will count against them."


The future is shitty.
insurance  facebook  scoring  computer-says-no  algorithms  text-analysis  awful  future 
november 2016 by jm
Here's Why Facebook's Trending Algorithm Keeps Promoting Fake News - BuzzFeed News
Kalina Bontcheva leads the EU-funded PHEME project working to compute the veracity of social media content. She said reducing the amount of human oversight for Trending heightens the likelihood of failures, and of the algorithm being fooled by people trying to game it.
“I think people are always going to try and outsmart these algorithms — we’ve seen this with search engine optimization,” she said. “I’m sure that once in a while there is going to be a very high-profile failure.”
Less human oversight means more reliance on the algorithm, which creates a new set of concerns, according to Kate Starbird, an assistant professor at the University of Washington who has been using machine learning and other technology to evaluate the accuracy of rumors and information during events such as the Boston bombings.
“[Facebook is] making an assumption that we’re more comfortable with a machine being biased than with a human being biased, because people don’t understand machines as well,” she said.
facebook  news  gaming  adversarial-classification  pheme  truth  social-media  algorithms  ml  machine-learning  media 
october 2016 by jm
Remarks at the SASE Panel On The Moral Economy of Tech
Excellent talk. I love this analogy for ML applied to real-world data which affects people:
Treating the world as software promotes fantasies of control. And the best kind of control is control without responsibility. Our unique position as authors of software used by millions gives us power, but we don't accept that this should make us accountable. We're programmers—who else is going to write the software that runs the world? To put it plainly, we are surprised that people seem to get mad at us for trying to help. Fortunately we are smart people and have found a way out of this predicament. Instead of relying on algorithms, which we can be accused of manipulating for our benefit, we have turned to machine learning, an ingenious way of disclaiming responsibility for anything. Machine learning is like money laundering for bias. It's a clean, mathematical apparatus that gives the status quo the aura of logical inevitability. The numbers don't lie.


Particularly apposite today given Y Combinator's revelation that they use an AI bot to help 'sift admission applications', and don't know what criteria it's using: https://twitter.com/aprjoy/status/783032128653107200
culture  ethics  privacy  technology  surveillance  ml  machine-learning  bias  algorithms  software  control 
october 2016 by jm
[net-next,14/14] tcp_bbr: add BBR congestion control
This commit implements a new TCP congestion control algorithm: BBR
(Bottleneck Bandwidth and RTT). A detailed description of BBR will be
published in ACM Queue, Vol. 14 No. 5, September-October 2016, as
"BBR: Congestion-Based Congestion Control".

BBR has significantly increased throughput and reduced latency for
connections on Google's internal backbone networks and google.com and
YouTube Web servers.

BBR requires only changes on the sender side, not in the network or
the receiver side. Thus it can be incrementally deployed on today's
Internet, or in datacenters. [....]

Signed-off-by: Van Jacobson <vanj@google.com>
google  linux  tcp  ip  congestion-control  bufferbloat  patches  algorithms  rtt  bandwidth  youtube  via:bradfitz 
september 2016 by jm
Highly Available Counters Using Cassandra
solid discussion of building HA counters using CRDTs and similar eventually-consistent data structures
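The simplest counter CRDT is the G-counter: one slot per node, increment your own slot, merge by element-wise max. A minimal sketch (a PN-counter for decrements is just two of these):

```python
class GCounter:
    """Grow-only counter CRDT: merge is commutative, associative, idempotent."""
    def __init__(self, node_id):
        self.node_id = node_id
        self.counts = {}

    def increment(self, n=1):
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + n

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        for node, c in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), c)
```

Because merge order doesn't matter, replicas can exchange state in any order and converge -- exactly the property you want for an HA counter where Cassandra-style replicas sync eventually.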
crdts  algorithms  data-structures  cassandra  ha  counters 
september 2016 by jm
Algorithmic management as the new Taylorism
'its legacy can be seen in factories, call centres and warehouses today, although new technology has taken the place of Taylor’s instruction cards and stopwatches. Many warehouse workers for companies such as Amazon use handheld devices that give them step-by-step instructions on where to walk and what to pick from the shelves when they get there, all the while measuring their “pick rate” in real time. For Jeremias Prassl, a law professor at Oxford university, the algorithmic management techniques of Uber and Deliveroo are Taylorism 2.0. “Algorithms are providing a degree of control and oversight that even the most hardened Taylorists could never have dreamt of,” he says.'
algorithms  labour  work  labor  taylorism  management  silicon-valley  tech  deliveroo  uber  piece-work 
september 2016 by jm
Ratas - A hierarchical timer wheel
excellent explanation and benchmarks of a timer wheel implementation
timer-wheels  timing-wheels  algorithms  c  linux  timers  data-structures 
august 2016 by jm
A fast alternative to the modulo reduction
For a uniform 32-bit x, (x * N) div 2^32 maps into [0, N) just as fairly as x mod N, but is much faster on modern 64-bit CPUs because it replaces a division with a multiply and a shift
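The trick in one line -- the high 32 bits of the 64-bit product x*N land uniformly in [0, N):

```python
def fast_range(x, n):
    """Map a uniform 32-bit value x into [0, n) without a modulo:
    take the high bits of the 64-bit product (Lemire's reduction)."""
    return (x * n) >> 32

# fairness check: sweep the 32-bit space with a coarse stride
counts = [0] * 7
for x in range(0, 2**32, 65537):
    counts[fast_range(x, 7)] += 1
```

In C this compiles to a single `mul` plus a shift; the Python version just demonstrates the distribution.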
fairness  modulo  arithmetic  algorithms  fair-mapping  reduce  daniel-lemire 
june 2016 by jm
In Wisconsin, a Backlash Against Using Data to Foretell Defendants’ Futures - The New York Times
More trial-by-algorithm horrors:
Company officials say the algorithm’s results are backed by research, but they are tight-lipped about its details. They do acknowledge that men and women receive different assessments, as do juveniles, but the factors considered and the weight given to each are kept secret.

“The key to our product is the algorithms, and they’re proprietary,” said Jeffrey Harmon, Northpointe’s general manager. “We’ve created them, and we don’t release them because it’s certainly a core piece of our business. It’s not about looking at the algorithms. It’s about looking at the outcomes.”

That secrecy is at the heart of Mr. Loomis’s lawsuit. His lawyer, Michael D. Rosenberg, who declined to be interviewed because of the pending appeal, argued that Mr. Loomis should be able to review the algorithm and make arguments about its validity as part of his defense. He also challenges the use of different scales for each sex.

The Compas system, Mr. Rosenberg wrote in his brief, “is full of holes and violates the requirement that a sentence be individualized.”
ethics  compas  sentencing  wisconsin  northpointe  law  trial-by-algorithm  algorithms 
june 2016 by jm
Differential Privacy
Apple have announced they plan to use it; Google use a DP algorithm called RAPPOR in Chrome usage statistics. In summary: "novel privacy technology that allows inferring statistics about populations while preserving the privacy of individual users".
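The intuition behind RAPPOR-style collection is classic randomized response: each user flips their true answer with some probability, and the aggregator inverts the known noise to recover the population statistic. A toy sketch (parameters illustrative; RAPPOR itself layers Bloom filters and two noise stages on top of this):

```python
import random

def randomized_response(truth, p_truth=0.75, rng=random):
    """Report the true bit with probability p_truth, else the opposite.
    Any individual report is deniable; only the aggregate is meaningful."""
    return truth if rng.random() < p_truth else (not truth)

def estimate_rate(reports, p_truth=0.75):
    """Invert the noise: observed = rate*(2p-1) + (1-p)."""
    observed = sum(reports) / len(reports)
    return (observed - (1 - p_truth)) / (2 * p_truth - 1)
```

No single report reveals the user's true bit, yet with enough reports the population rate is recovered accurately -- the "statistics about populations while preserving individual privacy" trade in miniature.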
apple  privacy  anonymization  google  rappor  algorithms  sampling  populations  statistics  differential-privacy 
june 2016 by jm
The tyranny of the algorithm yet again...
Paypal will no longer handle payments if the user's address includes the word "Isis":
That these place names exist won't be a surprise to anyone familiar with English limnology - the study of rivers and inland waters. As Wikipedia helpfully tells us, "The Isis is the name given to the part of the River Thames above Iffley Lock which flows through the university city of Oxford". In at least one local primary school I'm familiar with, the classes are called Windrush, Cherwell, Isis and Thames.

[...] Now PayPal has decided that they are not prepared to facilitate payments for goods to be delivered to an address which includes the word "Isis". An Isis street resident ran into some unexpected difficulties when attempting to purchase a small quantity of haberdashery on the internet with the aid of a PayPal account. The transaction would not process. In puzzlement she eventually got irritated enough to brave the 24/7 customer support telephone tag labyrinth. The short version of the response from the eventual real person she managed to get through to was that PayPal have blacklisted addresses which include the name "Isis". They will not process payments for goods to be delivered to an Isis related address, whatever state of privileged respectability the residents of such properties may have earned or inherited in their lifetimes to this point.


One has to wonder if this also brings the risk of adding the user to a secret list, somewhere. Trial by algorithm.
isis  algorithms  automation  fail  law-enforcement  paypal  uk  rivers 
june 2016 by jm
Social Network Algorithms Are Distorting Reality By Boosting Conspiracy Theories | Co.Exist | ideas + impact
In his 1962 book, The Image: A Guide to Pseudo-Events in America, former Librarian of Congress Daniel J. Boorstin describes a world where our ability to technologically shape reality is so sophisticated, it overcomes reality itself. "We risk being the first people in history," he writes, "to have been able to make their illusions so vivid, so persuasive, so ‘realistic’ that they can live in them."
algorithms  facebook  ethics  filtering  newsfeed  conspiracy-theories  twitter  viral  crazy 
may 2016 by jm
Darts, Dice, and Coins
Earlier this year, I asked a question on Stack Overflow about a data structure for loaded dice. Specifically, I was interested in answering this question: "You are given an n-sided die where side i has probability p_i of being rolled. What is the most efficient data structure for simulating rolls of the die?"

This data structure could be used for many purposes. For starters, you could use it to simulate rolls of a fair, six-sided die by assigning probability 1/6 to each of the sides of the die, or to simulate a fair coin by simulating a two-sided die where each side has probability 1/2 of coming up. You could also use this data structure to directly simulate the total of two fair six-sided dice being thrown by having an 11-sided die (whose faces were 2, 3, 4, ..., 12), where each side was appropriately weighted with the probability that this total would show if you used two fair dice. However, you could also use this data structure to simulate loaded dice. For example, if you were playing craps with dice that you knew weren't perfectly fair, you might use the data structure to simulate many rolls of the dice to see what the optimal strategy would be. You could also consider simulating an imperfect roulette wheel in the same way.

Outside the domain of game-playing, you could also use this data structure in robotics simulations where sensors have known failure rates. For example, if a range sensor has a 95% chance of giving the right value back, a 4% chance of giving back a value that's too small, and a 1% chance of handing back a value that's too large, you could use this data structure to simulate readings from the sensor by generating a random outcome and simulating the sensor reading in that case.

The answer I received on Stack Overflow impressed me for two reasons. First, the solution pointed me at a powerful technique called the alias method that, under certain reasonable assumptions about the machine model, is capable of simulating rolls of the die in O(1) time after a simple preprocessing step. Second, and perhaps more surprisingly, this algorithm has been known for decades, but I had not once encountered it! Considering how much processing time is dedicated to simulation, I would have expected this technique to be better-known. A few quick Google searches turned up a wealth of information on the technique, but I couldn't find a single site that compiled together the intuition and explanation behind the technique.


(via Marc Brooker)
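Vose's variant of the alias method, sketched in Python: preprocessing splits the distribution into n equal-weight columns, each holding at most two outcomes, so sampling is one uniform index plus one biased coin:

```python
import random

def build_alias(probs):
    """Vose's alias method: O(n) preprocessing for O(1) sampling."""
    n = len(probs)
    scaled = [p * n for p in probs]
    small = [i for i, s in enumerate(scaled) if s < 1]
    large = [i for i, s in enumerate(scaled) if s >= 1]
    prob = [0.0] * n
    alias = [0] * n
    while small and large:
        s, l = small.pop(), large.pop()
        prob[s] = scaled[s]            # column s keeps scaled[s] of itself...
        alias[s] = l                   # ...and tops up from a large outcome
        scaled[l] = scaled[l] + scaled[s] - 1
        (small if scaled[l] < 1 else large).append(l)
    for i in small + large:            # leftovers are exactly full columns
        prob[i] = 1.0
    return prob, alias

def sample(prob, alias, rng=random):
    """Roll the loaded die: one uniform pick, one biased coin flip."""
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]
```

Two random numbers per draw, no binary search over cumulative weights -- which is exactly the O(1) result the article builds up to.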
via:marcbrooker  algorithms  probability  algorithm  coding  data-structures  alias  dice  random 
april 2016 by jm
Rendezvous hashing - Wikipedia, the free encyclopedia

Rendezvous or Highest Random Weight (HRW) hashing[1][2] is an algorithm that allows clients to achieve distributed agreement on a set of k options out of a possible set of n options. A typical application is when clients need to agree on which sites (or proxies) objects are to be assigned to. When k is 1, it subsumes the goals of consistent hashing, using an entirely different method.
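The whole algorithm fits in a few lines: every client hashes (node, key) pairs and picks the top-k weights, so all clients agree without coordination, and removing a node only moves the keys that lived on it:

```python
import hashlib

def _weight(node, key):
    data = f"{node}|{key}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")

def rendezvous(nodes, key, k=1):
    """Rank nodes by hash weight for this key; the top k win."""
    return sorted(nodes, key=lambda n: _weight(n, key), reverse=True)[:k]
```

Compared to consistent hashing there's no ring or virtual-node bookkeeping; the cost is O(n) hashes per lookup, which is fine for small node counts.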
hrw  hashing  hashes  consistent-hashing  rendezvous-hashing  algorithms  discovery  distributed-computing 
april 2016 by jm
BLAKE2: simpler, smaller, fast as MD5
'We present the cryptographic hash function BLAKE2, an improved version of the SHA-3 finalist BLAKE optimized for speed in software. Target applications include cloud storage, intrusion detection, or version control systems. BLAKE2 comes in two main flavors: BLAKE2b is optimized for 64-bit platforms, and BLAKE2s for smaller architectures. On 64-bit platforms, BLAKE2 is often faster than MD5, yet provides security similar to that of SHA-3. We specify parallel versions BLAKE2bp and BLAKE2sp that are up to 4 and 8 times faster, by taking advantage of SIMD and/or multiple cores. BLAKE2 has more benefits than just speed: BLAKE2 uses up to 32% less RAM than BLAKE, and comes with a comprehensive tree-hashing mode as well as an efficient MAC mode.'
crypto  hash  blake2  hashing  blake  algorithms  sha1  sha3  simd  performance  mac 
april 2016 by jm
“Racist algorithms” and learned helplessness
Whenever I’ve had to talk about bias in algorithms, I’ve tried to be careful to emphasize that it’s not that we shouldn’t use algorithms in search, recommendation and decision making. It’s that we often just don’t know how they’re making their decisions to present answers, make recommendations or arrive at conclusions, and it’s this lack of transparency that’s worrisome. Remember, algorithms aren’t just code.

What’s also worrisome is the amplifier effect. Even if “all an algorithm is doing” is reflecting and transmitting biases inherent in society, it’s also amplifying and perpetuating them on a much larger scale than your friendly neighborhood racist. And that’s the bigger issue. [...] even if the algorithm isn’t creating bias, it’s creating a feedback loop that has powerful perception effects.
feedback  bias  racism  algorithms  software  systems  society 
april 2016 by jm
Elias gamma coding
'used most commonly when coding integers whose upper-bound cannot be determined beforehand.'
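The code is self-prefixing: floor(log2 n) zero bits announce the length, then n follows in binary, so codes can be concatenated and decoded without any length field:

```python
def elias_gamma_encode(n):
    """n >= 1: floor(log2 n) zeros, then n in binary (as a bit string)."""
    assert n >= 1
    binary = bin(n)[2:]
    return "0" * (len(binary) - 1) + binary

def elias_gamma_decode(bits):
    """Decode a stream of concatenated gamma codes."""
    out, i = [], 0
    while i < len(bits):
        zeros = 0
        while bits[i] == "0":
            zeros += 1
            i += 1
        out.append(int(bits[i:i + zeros + 1], 2))
        i += zeros + 1
    return out
```

Small integers cost few bits (1 encodes as a single "1" bit), which is why it suits streams of unbounded but usually-small values.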
data-structures  algorithms  elias-gamma-coding  encoding  coding  numbers  integers 
april 2016 by jm
Hashed Wheel Timer
nice java impl of this efficient data structure, broken out from Project Reactor
scalability  java  timers  hashed-wheel-timers  algorithms  data-structures 
march 2016 by jm
Hey Microsoft, the Internet Made My Bot Racist, Too
All machine learning algorithms strive to exaggerate and perpetuate the past. That is, after all, what they are learning from. The fundamental assumption of every machine learning algorithm is that the past is correct, and anything coming in the future will be, and should be, like the past. This is a fine assumption to make when you are Netflix trying to predict what movie you’ll like, but is immoral when applied to many other situations. For bots like mine and Microsoft’s, built for entertainment purposes, it can lead to embarrassment. But AI has started to be used in much more meaningful ways: predictive policing in Chicago, for example, has already led to widespread accusations of racial profiling.
This isn’t a little problem. This is a huge problem, and it demands a lot more attention than it’s getting now, particularly in the community of scientists and engineers who design and apply these algorithms. It’s one thing to get cursed out by an AI, but wholly another when one puts you in jail, denies you a mortgage, or decides to audit you.
machine-learning  ml  algorithms  future  society  microsoft 
march 2016 by jm
Conversant ConcurrentQueue and Disruptor BlockingQueue
'Disruptor is the highest performing intra-thread transfer mechanism available in Java. Conversant Disruptor is the highest performing implementation of this type of ring buffer queue because it has almost no overhead and it exploits a particularly simple design.

Conversant has been using this in production since 2012 and the performance is excellent. The BlockingQueue implementation is very stable, although we continue to tune and improve it. The latest release, 1.2.4, is 100% production ready.

Although we have been working on it for a long time, we decided to open source our BlockingQueue this year to contribute something back to the community. ... it's a drop-in for BlockingQueue, so it's a very easy test. Conversant Disruptor will crush ArrayBlockingQueue and LinkedTransferQueue for thread to thread transfers.

In our system, we noticed a 10-20% reduction in overall system load and latency when we introduced it.'
disruptor  blocking-queues  queues  queueing  data-structures  algorithms  java  conversant  concurrency  performance 
march 2016 by jm
How to do distributed locking
A critique of the "Redlock" locking algorithm from Redis by Martin Kleppmann. antirez responds here: http://antirez.com/news/101 ; summary of followups: https://storify.com/martinkl/redlock-discussion
distributed  locking  redis  algorithms  coding  distcomp  redlock  martin-kleppman  zookeeper 
february 2016 by jm
research!rsc: Zip Files All The Way Down
quine.zip, quine.gz, and quine.tar.gz. Here's what happens when you mail it through bad AV software: https://twitter.com/FioraAeterna/status/694655296707297281
zip  algorithms  compression  quines  fun  hacks  gzip 
february 2016 by jm
Very Fast Reservoir Sampling
via Tony Finch. 'In this post I will demonstrate how to do reservoir sampling orders of magnitude faster than the traditional “naive” reservoir sampling algorithm, using a fast high-fidelity approximation to the reservoir sampling-gap distribution.'
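For reference, the "naive" Algorithm R that the post speeds up -- the fast version replaces the per-item random draw with samples from the gap distribution, skipping ahead many items at a time:

```python
import random

def reservoir_sample(stream, k, rng=random):
    """Algorithm R: uniform sample of k items from a stream of unknown length.
    Item i (0-based) replaces a reservoir slot with probability k/(i+1)."""
    reservoir = []
    for i, x in enumerate(stream):
        if i < k:
            reservoir.append(x)
        else:
            j = rng.randrange(i + 1)
            if j < k:
                reservoir[j] = x
    return reservoir
```

The inefficiency the article targets is visible here: one random number per stream element, even though replacements get rarer and rarer as i grows.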
statistics  reservoir-sampling  sampling  algorithms  poisson  bernoulli  performance 
december 2015 by jm
[LUCENE-6917] Deprecate and rename NumericField/RangeQuery to LegacyNumeric - ASF JIRA
Interesting performance-related tweak going into Lucene -- based on the Bkd-Tree I think: https://users.cs.duke.edu/~pankaj/publications/papers/bkd-sstd.pdf . Being used for all numeric index types, not just multidimensional ones?
lucene  performance  algorithms  patches  bkd-trees  geodata  numeric  indexing 
december 2015 by jm
ELS: latency based load balancer, part 1
ELS measures the following things:

Success latency and success rate of each machine;
Number of outstanding requests between the load balancer and each machine. These are the requests that have been sent out but we haven’t yet received a reply;
Fast failures are better than slow failures, so we also measure failure latency for each machine.

Since users care a lot about latency, we prefer machines that are expected to answer quicker. ELS therefore converts all the measured metrics into expected latency from the client’s perspective.[...]

In short, the formula ensures that slower machines get less traffic and failing machines get much less traffic. Slower and failing machines still get some traffic, because we need to be able to detect when they come back up again.
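The excerpt doesn't give ELS's actual formula, but the shape of the idea can be sketched: fold the measured metrics into an expected client-side latency per machine, then weight traffic inversely. The cost model below (failures cost a retry; outstanding requests queue) is my assumption, not Spotify's:

```python
def expected_latency(success_latency, success_rate, failure_latency, outstanding):
    """Rough client-view latency for one machine under an assumed model:
    a failure costs its (fast) failure latency plus a retried success,
    and each outstanding request adds a queueing wait."""
    per_try = (success_rate * success_latency
               + (1 - success_rate) * (failure_latency + success_latency))
    return per_try * (1 + outstanding)

def weights(machines):
    """Traffic share inversely proportional to expected latency: slower and
    failing machines get less traffic, but never zero -- we still need to
    notice when they recover."""
    inv = {name: 1.0 / expected_latency(*m) for name, m in machines.items()}
    total = sum(inv.values())
    return {name: w / total for name, w in inv.items()}
```

The "never zero" property matters: a purely greedy policy would starve a recovered machine of the probe traffic needed to detect its recovery.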
latency  spotify  proxies  load-balancing  els  algorithms  c3  round-robin  load-balancers  routing 
december 2015 by jm
Accretion Disc Series - Clint Fulkerson
available as prints -- vector art with a hint of the bacterial
algorithms  art  graphics  vector  bacteria  petri-dish  clint-fulkerson 
november 2015 by jm
Cache-friendly binary search
by reordering items to optimize locality. Via aphyr's dad!
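One well-known instance of this idea is the Eytzinger (BFS/heap-order) layout: the first few comparisons of every search touch the same handful of array slots, so they stay in cache. The article's exact scheme may differ; this is the generic version:

```python
def eytzinger(sorted_items):
    """Reorder a sorted list into BFS order: children of slot i live at
    2i+1 and 2i+2, so early comparisons cluster at the front of the array."""
    out = [None] * len(sorted_items)
    it = iter(sorted_items)
    def fill(i):
        # in-order traversal over BFS indices assigns sorted values correctly
        if i < len(out):
            fill(2 * i + 1)
            out[i] = next(it)
            fill(2 * i + 2)
    fill(0)
    return out

def search(tree, target):
    """Membership test walking the implicit tree: no mid-point arithmetic."""
    i = 0
    while i < len(tree):
        if tree[i] == target:
            return True
        i = 2 * i + 1 + (tree[i] < target)   # left if smaller target, else right
    return False
```

Same O(log n) comparisons as binary search on the sorted array, but the memory access pattern is far friendlier to caches and prefetchers.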
caches  cache-friendly  optimization  data-locality  performance  coding  algorithms 
november 2015 by jm
Caffeine cache adopts Window TinyLfu eviction policy
'Caffeine is a Java 8 rewrite of Guava's cache. In this version we focused on improving the hit rate by evaluating alternatives to the classic least-recently-used (LRU) eviction policy. In collaboration with researchers at Israel's Technion, we developed a new algorithm that matches or exceeds the hit rate of the best alternatives (ARC, LIRS). A paper of our work is being prepared for publication.'

Specifically:
W-TinyLfu uses a small admission LRU that evicts to a large Segmented LRU if accepted by the TinyLfu admission policy. TinyLfu relies on a frequency sketch to probabilistically estimate the historic usage of an entry. The window allows the policy to have a high hit rate when entries exhibit a high temporal / low frequency access pattern which would otherwise be rejected. The configuration enables the cache to estimate the frequency and recency of an entry with low overhead. This implementation uses a 4-bit CountMinSketch, growing at 8 bytes per cache entry to be accurate. Unlike ARC and LIRS, this policy does not retain non-resident keys.
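The frequency sketch at the heart of TinyLFU is a count-min sketch: several hashed counter rows, incremented on access, with the minimum across rows as the (never-underestimating) frequency estimate. A plain-int sketch for illustration -- Caffeine packs 4-bit counters and periodically halves them to age the history:

```python
import hashlib

class CountMinSketch:
    def __init__(self, width=2048, depth=4):
        self.width, self.depth = width, depth
        self.rows = [[0] * width for _ in range(depth)]

    def _indexes(self, item):
        for d in range(self.depth):
            h = hashlib.blake2b(item.encode(), digest_size=8,
                                salt=bytes([d])).digest()
            yield int.from_bytes(h, "big") % self.width

    def add(self, item, count=1):
        for d, i in enumerate(self._indexes(item)):
            self.rows[d][i] += count

    def estimate(self, item):
        """Min across rows: never underestimates, overestimates on collisions."""
        return min(self.rows[d][i] for d, i in enumerate(self._indexes(item)))
```

The admission decision is then cheap: a candidate only displaces a cached victim if the sketch says the candidate's historic frequency is higher.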
tinylfu  caches  caching  cache-eviction  java8  guava  caffeine  lru  count-min  sketching  algorithms 
november 2015 by jm
How Facebook avoids failures
Great paper from Ben Maurer of Facebook in ACM Queue.
A "move-fast" mentality does not have to be at odds with reliability. To make these philosophies compatible, Facebook's infrastructure provides safety valves.


This is full of interesting techniques.

* Rapidly deployed configuration changes: Make everybody use a common configuration system; Statically validate configuration changes; Run a canary; Hold on to good configurations; Make it easy to revert.

* Hard dependencies on core services: Cache data from core services. Provide hardened APIs. Run fire drills.

* Increased latency and resource exhaustion: Controlled Delay (based on the anti-bufferbloat CoDel algorithm -- this is really cool); Adaptive LIFO (last-in, first-out) for queue busting; Concurrency Control (essentially a form of circuit breaker).

* Tools that Help Diagnose Failures: High-Density Dashboards with Cubism (horizon charts); What just changed?

* Learning from Failure: the DERP (!) methodology,
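The adaptive LIFO idea is worth a sketch: under normal load serve FIFO, but once a backlog builds, serve newest-first, because the newest requests are the ones whose clients haven't timed out yet. The simple length-threshold trigger here is my simplification; the paper's version switches based on queueing delay, CoDel-style:

```python
import collections

class AdaptiveLifoQueue:
    """FIFO when healthy; LIFO under backlog, so fresh requests (whose
    clients are still waiting) are served before stale, doomed ones."""
    def __init__(self, lifo_threshold=100):
        self.items = collections.deque()
        self.lifo_threshold = lifo_threshold

    def put(self, item):
        self.items.append(item)

    def get(self):
        if len(self.items) > self.lifo_threshold:
            return self.items.pop()      # overload: newest first
        return self.items.popleft()      # normal: oldest first
```

Under overload, FIFO spends server time answering requests whose callers have already given up; LIFO spends it on requests that can still succeed.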
ben-maurer  facebook  reliability  algorithms  codel  circuit-breakers  derp  failure  ops  cubism  horizon-charts  charts  dependencies  soa  microservices  uptime  deployment  configuration  change-management 
november 2015 by jm
Apache Kafka, Purgatory, and Hierarchical Timing Wheels
In the new design, we use Hierarchical Timing Wheels for the timeout timer and a DelayQueue of timer buckets to advance the clock on demand. Completed requests are removed from the timer queue immediately with O(1) cost. The buckets remain in the delay queue; however, the number of buckets is bounded. And, in a healthy system, most of the requests are satisfied before timeout, and many of the buckets become empty before being pulled out of the delay queue. Thus, the timer should rarely have buckets of the lower interval. The advantage of this design is that the number of requests in the timer queue is exactly the number of pending requests at any time. This allows us to estimate the number of requests that need to be purged. We can avoid unnecessary purge operations on the watcher lists. As a result we achieve higher scalability in terms of request rate with much better CPU usage.
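A single-level timing wheel sketch showing the O(1) add/cancel property the design relies on (Kafka stacks several of these hierarchically, with a DelayQueue driving the ticks; this toy version just advances one tick at a time):

```python
class TimingWheel:
    """One-level timing wheel: a ring of buckets, one per tick."""
    def __init__(self, tick_ms, wheel_size):
        self.tick_ms = tick_ms
        self.wheel_size = wheel_size
        self.buckets = [set() for _ in range(wheel_size)]
        self.current_tick = 0

    def add(self, task, delay_ms):
        """O(1): drop the task into the bucket delay/tick slots ahead."""
        ticks = delay_ms // self.tick_ms
        if ticks >= self.wheel_size:
            raise ValueError("delay beyond range; needs a higher-level wheel")
        slot = (self.current_tick + ticks) % self.wheel_size
        self.buckets[slot].add(task)
        return slot

    def cancel(self, task, slot):
        """O(1) removal -- how completed requests leave the timer immediately."""
        self.buckets[slot].discard(task)

    def advance(self):
        """Advance one tick; return (and clear) the expired bucket."""
        self.current_tick += 1
        slot = self.current_tick % self.wheel_size
        expired, self.buckets[slot] = self.buckets[slot], set()
        return expired
```

Contrast with a heap-based timer: there, cancellation is O(log n) and every pending timeout churns the heap even though most requests complete long before they expire.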
algorithms  timers  kafka  scheduling  timing-wheels  delayqueue  queueing 
october 2015 by jm
The New InfluxDB Storage Engine: A Time Structured Merge Tree
The new engine has similarities with LSM Trees (like LevelDB and Cassandra’s underlying storage). It has a write ahead log, index files that are read only, and it occasionally performs compactions to combine index files. We’re calling it a Time Structured Merge Tree because the index files keep contiguous blocks of time and the compactions merge those blocks into larger blocks of time. Compression of the data improves as the index files are compacted. Once a shard becomes cold for writes it will be compacted into as few files as possible, which yields the best compression.
influxdb  storage  lsm-trees  leveldb  tsm-trees  data-structures  algorithms  time-series  tsd  compression 
october 2015 by jm
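The LSM shape (memtable, immutable sorted runs, merge-on-compaction) is worth a toy sketch; nothing here is InfluxDB-specific, and TSM additionally keys its runs by time block and compresses them:

```python
import bisect

class TinyLSM:
    """Toy LSM tree: writes land in an in-memory table, which is
    flushed to an immutable sorted run when full; reads check the
    memtable then the runs newest-first; compaction merges runs."""
    def __init__(self, memtable_limit=4):
        self.memtable = {}
        self.limit = memtable_limit
        self.runs = []          # sorted (key, value) lists, newest first

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.limit:
            self.runs.insert(0, sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for run in self.runs:   # newest run wins
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None

    def compact(self):
        merged = {}
        for run in reversed(self.runs):   # oldest first; newer overwrite
            merged.update(dict(run))
        self.runs = [sorted(merged.items())]
```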
Introduction to HDFS Erasure Coding in Apache Hadoop
How Hadoop implemented EC. Erasure-coding support ("HDFS-EC") is slated for release in Hadoop 3.0, apparently
erasure-coding  reed-solomon  algorithms  hadoop  hdfs  cloudera  raid  storage 
september 2015 by jm
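HDFS-EC uses Reed-Solomon codes, which are well beyond a bookmark note, but the simplest erasure code of all, single XOR parity (the RAID-5 trick), shows the rebuild-from-parity idea:

```python
def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)

# Striping plus a single parity block tolerates the loss of any ONE block;
# Reed-Solomon generalises this to tolerate k losses.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Lose block 1, rebuild it from the survivors plus parity:
recovered = xor_blocks([data[0], data[2], parity])
assert recovered == data[1]
```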
Brotli: a new compression algorithm for the internet from Google
While Zopfli is Deflate-compatible, Brotli is a whole new data format. This new format allows us to get 20–26% higher compression ratios over Zopfli. In our study ‘Comparison of Brotli, Deflate, Zopfli, LZMA, LZHAM and Bzip2 Compression Algorithms’ we show that Brotli is roughly as fast as zlib’s Deflate implementation. At the same time, it compresses slightly more densely than LZMA and bzip2 on the Canterbury corpus. The higher data density is achieved by a 2nd order context modeling, re-use of entropy codes, larger memory window of past data and joint distribution codes. Just like Zopfli, the new algorithm is named after Swiss bakery products. Brötli means ‘small bread’ in Swiss German.
brotli  zopfli  deflate  gzip  compression  algorithms  swiss  google 
september 2015 by jm
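Brotli itself lives in the third-party `brotli` package (`brotli.compress` / `brotli.decompress`); for a stdlib point of comparison, here's the Deflate baseline (the format Zopfli targets) via zlib:

```python
import zlib

text = b"Brotli, Zopfli and Deflate are all named after Swiss things. " * 100

# Deflate at maximum effort (level 9) -- the format Zopfli also emits.
deflated = zlib.compress(text, 9)
assert zlib.decompress(deflated) == text
print(f"{len(text)} bytes -> {len(deflated)} bytes "
      f"({len(deflated) / len(text):.1%} of original)")

# Brotli is a drop-in for this shape of API, via the third-party package:
#   import brotli
#   compressed = brotli.compress(text)   # typically denser than Deflate
```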
The Pixel Factory
amazing slideshow/WebGL demo talking about graphics programming, its maths, and GPUs
maths  graphics  webgl  demos  coding  algorithms  slides  tflops  gpus 
september 2015 by jm
Algorithmist
The Algorithmist is a resource dedicated to anything algorithms - from the practical realm, to the theoretical realm. There are also links and explanation to problemsets.


A wiki for algorithms. Not sure this is likely to improve on Wikipedia, though, which already covers the same subject matter quite well
algorithms  reference  wikis  coding  data-structures 
september 2015 by jm
Mining High-Speed Data Streams: The Hoeffding Tree Algorithm
This paper proposes a decision tree learner for data streams, the Hoeffding Tree algorithm, which comes with the guarantee that the learned decision tree is asymptotically nearly identical to that of a non-incremental learner using infinitely many examples. This work constitutes a significant step in developing methodology suitable for modern ‘big data’ challenges and has initiated a lot of follow-up research. The Hoeffding Tree algorithm has been covered in various textbooks and is available in several public domain tools, including the WEKA Data Mining platform.
hoeffding-tree  algorithms  data-structures  streaming  streams  cep  decision-trees  ml  learning  papers 
august 2015 by jm
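The guarantee comes from the Hoeffding bound: after n examples, the observed mean of a variable with range R is within epsilon of the true mean with probability 1 - delta, so a leaf can split as soon as the best attribute beats the runner-up by more than epsilon. The bound itself is one line:

```python
import math

def hoeffding_bound(value_range, delta, n):
    """epsilon = sqrt(R^2 * ln(1/delta) / (2n)).

    The tree splits a leaf on attribute A once
    G(A) - G(B) > epsilon for the runner-up attribute B."""
    return math.sqrt(value_range ** 2 * math.log(1 / delta) / (2 * n))

# The bound shrinks as examples stream in, so a leaf eventually commits:
for n in (100, 1000, 100_000):
    print(n, round(hoeffding_bound(1.0, 1e-7, n), 4))
```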
Sorting out graph processing
Some nice real-world experimentation around large-scale data processing in differential dataflow:
If you wanted to do an iterative graph computation like PageRank, it would literally be faster to sort the edges from scratch each and every iteration, than to use unsorted edges. If you want to do graph computation, please sort your edges.

Actually, you know what: if you want to do any big data computation, please sort your records. Stop talking sass about how Hadoop sorts things it doesn't need to, read some papers, run some tests, and then sort your damned data. Or at least run faster than me when I sort your data for you.
algorithms  graphs  coding  data-processing  big-data  differential-dataflow  radix-sort  sorting  x-stream  counting-sort  pagerank 
august 2015 by jm
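Why sorting helps: once edges are ordered by source vertex, each PageRank iteration is a sequential scan with grouping, no random access. A toy version of that pattern (damping factor 0.85 is the conventional choice; the graph is made up):

```python
from itertools import groupby
from operator import itemgetter

def pagerank(edges, n, iters=20, d=0.85):
    """Toy PageRank over an edge list sorted by source vertex.

    Sorting once lets every iteration stream the edges in order and
    group them by source -- the data-locality point of the post."""
    edges = sorted(edges)
    out_degree = [0] * n
    for src, _ in edges:
        out_degree[src] += 1
    rank = [1.0 / n] * n
    for _ in range(iters):
        incoming = [0.0] * n
        for src, group in groupby(edges, key=itemgetter(0)):
            share = rank[src] / out_degree[src]
            for _, dst in group:
                incoming[dst] += share
        rank = [(1 - d) / n + d * r for r in incoming]
    return rank

ranks = pagerank([(0, 1), (1, 2), (2, 0), (2, 1)], n=3)
```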
Recommender Systems (Machine Learning Summer School 2014 @ CMU)
Extremely authoritative slide deck on building a recommendation system, from Xavier Amatriain, Research/Engineering Manager at Netflix
netflix  recommendations  recommenders  ml  machine-learning  cmu  clustering  algorithms 
august 2015 by jm
choco-solver.org
Choco is [FOSS] dedicated to Constraint Programming. It is a Java library written under BSD license. It aims at describing hard combinatorial problems in the form of Constraint Satisfaction Problems and solving them with Constraint Programming techniques. The user models their problem in a declarative way by stating the set of constraints that need to be satisfied in every solution. Then, Choco solves the problem by alternating constraint filtering algorithms with a search mechanism. [...]
Choco is among the fastest CP solvers on the market. In 2013 and 2014, Choco was awarded many medals at the MiniZinc challenge, the world-wide competition of constraint-programming solvers.
choco  constraint-programming  solving  search  combinatorial  algorithms 
august 2015 by jm
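Choco is Java, but the declare-constraints-then-search shape is easy to illustrate. This is the bare backtracking skeleton only; real solvers like Choco interleave the search with constraint *filtering* (propagation), which this sketch omits entirely:

```python
def solve(domains, constraints, assignment=None):
    """Tiny backtracking CSP solver: assign variables one at a time,
    rejecting any partial assignment that violates a constraint."""
    assignment = assignment or {}
    unassigned = [v for v in domains if v not in assignment]
    if not unassigned:
        return assignment
    var = unassigned[0]
    for value in domains[var]:
        candidate = {**assignment, var: value}
        if all(c(candidate) for c in constraints):
            result = solve(domains, constraints, candidate)
            if result is not None:
                return result
    return None

# Map colouring: adjacent regions must get different colours.
colours = ["red", "green", "blue"]
domains = {r: colours for r in "ABCD"}
adjacent = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A"), ("A", "C")]

def differ(x, y):
    # Constraint is vacuously satisfied until both variables are assigned.
    return lambda a: x not in a or y not in a or a[x] != a[y]

solution = solve(domains, [differ(x, y) for x, y in adjacent])
```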
"last seen" sketch
a new sketch algorithm from Baron Schwartz and Preetam Jinka of VividCortex; similar to Count-Min, but storing a last-seen timestamp instead of a frequency.
sketch  algorithms  estimation  approximation  sampling  streams  big-data 
july 2015 by jm
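My reading of the design, sketched (this is inferred from the description; VividCortex's actual structure may differ): same shape as Count-Min, but each cell keeps the max timestamp of colliding keys, and a query takes the min across rows, so collisions can only push an answer *later*, never earlier:

```python
import random

class LastSeenSketch:
    """Count-Min-shaped sketch storing last-seen timestamps
    instead of counts."""
    def __init__(self, rows=4, width=1024, seed=42):
        rng = random.Random(seed)
        self.seeds = [rng.getrandbits(64) for _ in range(rows)]
        self.width = width
        self.cells = [[0] * width for _ in range(rows)]

    def _index(self, key, seed):
        return hash((seed, key)) % self.width

    def update(self, key, timestamp):
        for row, seed in zip(self.cells, self.seeds):
            i = self._index(key, seed)
            row[i] = max(row[i], timestamp)   # keep the latest sighting

    def last_seen(self, key):
        # Min across rows: the least-collided estimate.
        return min(row[self._index(key, seed)]
                   for row, seed in zip(self.cells, self.seeds))
```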
Levenshtein automata can be simple and fast
Nice algorithm for fuzzy text search with a limited Levenshtein edit distance using a DFA
dfa  algorithms  levenshtein  text  edit-distance  fuzzy-search  search  python 
june 2015 by jm
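What the automaton answers is the predicate "is levenshtein(a, b) <= k?". Without building the DFA, the same predicate falls out of the classic dynamic program with an early exit once every cell in a row exceeds k:

```python
def within_distance(a, b, k):
    """True iff levenshtein(a, b) <= k.

    A Levenshtein automaton answers this in one pass per candidate
    word; this is the plain DP equivalent, abandoning a candidate
    as soon as a whole row of the table exceeds k."""
    if abs(len(a) - len(b)) > k:
        return False
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i] + [0] * len(b)
        for j, cb in enumerate(b, 1):
            cur[j] = min(prev[j] + 1,               # deletion
                         cur[j - 1] + 1,            # insertion
                         prev[j - 1] + (ca != cb))  # substitution
        if min(cur) > k:                            # no path can recover
            return False
        prev = cur
    return prev[-1] <= k
```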
Hybrid Logical Clocks
neat substitute for physical-time clocks in synchronization and ordering in a distributed system, based on Lamport's Logical Clocks and Google's TrueTime.

'HLC captures the causality relationship like LC, and enables easy identification of consistent snapshots in distributed systems. Dually, HLC can be used in lieu of PT clocks since it maintains its logical clock to be always close to the PT clock.'
hlc  clocks  logical-clocks  time  synchronization  ordering  events  logs  papers  algorithms  truetime  distcomp 
june 2015 by jm
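The update rules from the paper (Kulkarni et al.) are compact: a timestamp is a pair (l, c) where l tracks the max physical time seen and c breaks ties. Sketched below, with the physical clock injected so it's testable:

```python
class HybridLogicalClock:
    """Hybrid Logical Clock per Kulkarni et al.: timestamps respect
    causality like Lamport clocks while l stays close to wall time."""
    def __init__(self, physical_clock):
        self.pt = physical_clock
        self.l = 0
        self.c = 0

    def send(self):
        """Local event or message send."""
        l_old = self.l
        self.l = max(l_old, self.pt())
        self.c = self.c + 1 if self.l == l_old else 0
        return (self.l, self.c)

    def receive(self, m_l, m_c):
        """Update on receiving a message stamped (m_l, m_c)."""
        l_old = self.l
        self.l = max(l_old, m_l, self.pt())
        if self.l == l_old and self.l == m_l:
            self.c = max(self.c, m_c) + 1
        elif self.l == l_old:
            self.c += 1
        elif self.l == m_l:
            self.c = m_c + 1
        else:
            self.c = 0
        return (self.l, self.c)
```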
The Violence of Algorithms: Why Big Data Is Only as Smart as Those Who Generate It
The modern state system is built on a bargain between governments and citizens. States provide collective social goods, and in turn, via a system of norms, institutions, regulations, and ethics to hold this power accountable, citizens give states legitimacy. This bargain created order and stability out of what was an increasingly chaotic global system. If algorithms represent a new ungoverned space, a hidden and potentially ever-evolving unknowable public good, then they are an affront to our democratic system, one that requires transparency and accountability in order to function. A node of power that exists outside of these bounds is a threat to the notion of collective governance itself. This, at its core, is a profoundly undemocratic notion—one that states will have to engage with seriously if they are going to remain relevant and legitimate to their digital citizenry who give them their power.
palantir  algorithms  big-data  government  democracy  transparency  accountability  analytics  surveillance  war  privacy  protest  rights 
june 2015 by jm
Top 10 data mining algorithms in plain English
This is a phenomenally useful ML/data-mining resource post -- 'the top 10 most influential data mining algorithms as voted on by 3 separate panels in [ICDM '06's] survey paper', but with a nice clear intro and description for each one. Here's the algorithms covered:
1. C4.5
2. k-means
3. Support vector machines
4. Apriori
5. EM
6. PageRank
7. AdaBoost
8. kNN
9. Naive Bayes
10. CART
svm  k-means  c4.5  apriori  em  pagerank  adaboost  knn  naive-bayes  cart  ml  data-mining  machine-learning  papers  algorithms  unsupervised  supervised 
may 2015 by jm
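For flavour, #2 on the list is small enough to write out: plain Lloyd's k-means, alternating assign-to-nearest-centroid with move-centroid-to-mean (the 1-D data here is made up for illustration):

```python
import random

def k_means(points, k, iters=20, seed=0):
    """Lloyd's algorithm on 1-D points: assign each point to its
    nearest centroid, recompute each centroid as its cluster mean,
    repeat until (hopefully) converged."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: abs(p - centroids[c]))
            clusters[nearest].append(p)
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return sorted(centroids)

centres = k_means([1.0, 1.1, 0.9, 10.0, 10.2, 9.8], k=2)
```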
Memory Layouts for Binary Search
Key takeaway:
Nearly universally, B-trees win when the data gets big enough.
caches  cpu  performance  optimization  memory  binary-search  b-trees  algorithms  search  memory-layout 
may 2015 by jm
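One of the layouts the post compares is the Eytzinger (BFS) order: node k's children live at 2k and 2k+1, so a search walks a short, prefetch-friendly prefix of the array instead of hopping around a sorted one. A sketch:

```python
def eytzinger(sorted_vals, out=None, i=0, k=1):
    """Re-lay a sorted array in BFS (Eytzinger) order.
    1-indexed; slot 0 is unused padding."""
    if out is None:
        out = [None] * (len(sorted_vals) + 1)
    if k < len(out):
        i = eytzinger(sorted_vals, out, i, 2 * k)       # left subtree
        out[k] = sorted_vals[i]
        i += 1
        i = eytzinger(sorted_vals, out, i, 2 * k + 1)   # right subtree
    return out if k == 1 else i

def search(tree, target):
    """Descend the implicit tree; True if target is present."""
    k = 1
    while k < len(tree):
        if tree[k] == target:
            return True
        k = 2 * k + (target > tree[k])
    return False
```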
The Injector: A new Executor for Java
This honestly fits a narrow niche, but one that is gaining in popularity. If your messages take > 100μs to process, or your worker threads are consistently saturated, the standard ThreadPoolExecutor is likely perfectly adequate for your needs. If, on the other hand, you’re able to engineer your system to operate with one application thread per physical core you are probably better off looking at an approach like the LMAX Disruptor. However, if you fall in the crack in between these two scenarios, or are seeing a significant portion of time spent in futex calls and need a drop in ExecutorService to take the edge off, the injector may well be worth a look.
performance  java  executor  concurrency  disruptor  algorithms  coding  threads  threadpool  injector 
may 2015 by jm
How to do named entity recognition: machine learning oversimplified
Good explanation of this NLP tokenization/feature-extraction technique. Example result: "Jimi/B-PER Hendrix/I-PER played/O at/O Woodstock/B-LOC ./O"
named-entities  feature-extraction  tokenization  nlp  ml  algorithms  machine-learning 
may 2015 by jm
"Trash Day: Coordinating Garbage Collection in Distributed Systems"
Another GC-coordination strategy, similar to Blade (qv), with some real-world examples using Cassandra
blade  via:adriancolyer  papers  gc  distsys  algorithms  distributed  java  jvm  latency  spark  cassandra 
may 2015 by jm
hyperlogsandwich
A probabilistic data structure for frequency/k-occurrence cardinality estimation of multisets. Sample implementation


(via Patrick McFadin)
via:patrickmcfadin  hyperloglog  cardinality  data-structures  algorithms  hyperlogsandwich  counting  estimation  lossy  multisets 
may 2015 by jm
Rob Pike's 5 rules of optimization
these are great. I've run into rule #3 ("fancy algorithms are slow when n is small, and n is usually small") several times...
twitter  rob-pike  via:igrigorik  coding  rules  laws  optimization  performance  algorithms  data-structures  aphorisms 
april 2015 by jm
SEC vs April Fool's Day
I think that materiality means what it says, and if people or algorithms do dumb things with trivial information that's their problem. But markets are a lot faster and more literal than they were when the materiality standard was created, and I wonder whether regulators or courts will one day decide that materiality is too reasonable a standard for modern markets. The materiality standard depends on the reasonable investor, and in many important contexts the reasonable investor has been replaced by a computer. 
algorithms  trading  stock  stock-market  sec  materiality  april-fools-day  tesla  investing  jokes 
april 2015 by jm
Bug Prediction at Google
LOL. grepping commit logs for /bug|fix/ does the job, apparently:
In the literature, Rahman et al. found that a very cheap algorithm actually performs almost as well as some very expensive bug-prediction algorithms. They found that simply ranking files by the number of times they've been changed with a bug-fixing commit (i.e. a commit which fixes a bug) will find the hot spots in a code base. Simple! This matches our intuition: if a file keeps requiring bug-fixes, it must be a hot spot because developers are clearly struggling with it.
bugs  rahman-algorithm  heuristics  source-code-analysis  coding  algorithms  google  static-code-analysis  version-control 
march 2015 by jm
Explanation of the Jump Consistent Hash algorithm
I blogged about the amazing stateless Jump Consistent Hash algorithm last year, but this is a good walkthrough of how it works.

Apparently one author, Eric Veach, is legendary -- https://news.ycombinator.com/item?id=9209891 : "Eric Veach is huge in the computer graphics world for laying a ton of the foundations of modern physically based rendering in his PhD thesis [1]. He then went on to work for Pixar and did a ton of work on Renderman (for which he recently got an Academy Award), and then in the early 2000ish left Pixar to go work for Google, where he was the lead on developing AdWords [2]. In short, he's had quite a career, and seeing a new paper from him is always interesting."
eric-veach  consistent-hashing  algorithms  google  adwords  renderman  pixar  history  coding  c  c++ 
march 2015 by jm
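The whole algorithm fits in a few lines, which is what makes it so striking; transliterated from the Lamping & Veach paper:

```python
def jump_consistent_hash(key, num_buckets):
    """Lamping & Veach's jump consistent hash: maps a 64-bit key to
    one of num_buckets buckets with no lookup table or stored state,
    moving only ~1/n of keys when a bucket is added."""
    b, j = -1, 0
    while j < num_buckets:
        b = j
        # 64-bit linear congruential step (constants from the paper).
        key = (key * 2862933555777941757 + 1) & (2**64 - 1)
        # Jump ahead to the next bucket count at which this key moves.
        j = int((b + 1) * (2**31) / ((key >> 33) + 1))
    return b
```

The consistency property: growing from n to n+1 buckets either leaves a key where it was, or moves it to the new bucket n, never anywhere else.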
"Cuckoo Filter: Practically Better Than Bloom"
'We propose a new data structure called the cuckoo filter that can replace Bloom filters for approximate set membership tests. Cuckoo filters support adding and removing items dynamically while achieving even higher performance than Bloom filters. For applications that store many items and target moderately low false positive rates, cuckoo filters have lower space overhead than space-optimized Bloom filters. Our experimental results also show that cuckoo filters outperform previous data structures that extend Bloom filters to support deletions substantially in both time and space.'
algorithms  paper  bloom-filters  cuckoo-filters  cuckoo-hashing  data-structures  false-positives  big-data  probabilistic  hashing  set-membership  approximation 
march 2015 by jm
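A sketch of the core trick, partial-key cuckoo hashing: store only a small fingerprint per item, and derive the alternate bucket from the fingerprint alone, which is what makes relocation and deletion possible without the original key. The parameters (4 slots/bucket, 8-bit fingerprints, 500 kicks) follow common choices from the paper but are otherwise arbitrary:

```python
import random

class CuckooFilter:
    """Toy partial-key cuckoo filter. num_buckets must be a power of
    two so the XOR in _i2 stays an involution within range."""
    def __init__(self, num_buckets=128, slots=4, max_kicks=500):
        self.num_buckets = num_buckets
        self.slots = slots
        self.max_kicks = max_kicks
        self.buckets = [[] for _ in range(num_buckets)]

    def _fingerprint(self, item):
        return (hash(("fp", item)) & 0xFF) or 1       # 8 bits, never 0

    def _i1(self, item):
        return hash(("h", item)) % self.num_buckets

    def _i2(self, i, fp):
        # Alternate index depends only on (i, fingerprint), so
        # _i2(_i2(i, fp), fp) == i -- the relocation trick.
        return (i ^ hash(("h", fp))) % self.num_buckets

    def insert(self, item):
        fp = self._fingerprint(item)
        i1 = self._i1(item)
        i2 = self._i2(i1, fp)
        for i in (i1, i2):
            if len(self.buckets[i]) < self.slots:
                self.buckets[i].append(fp)
                return True
        i = random.choice((i1, i2))
        for _ in range(self.max_kicks):               # evict and relocate
            slot = random.randrange(self.slots)
            fp, self.buckets[i][slot] = self.buckets[i][slot], fp
            i = self._i2(i, fp)
            if len(self.buckets[i]) < self.slots:
                self.buckets[i].append(fp)
                return True
        return False                                  # table considered full

    def contains(self, item):
        fp = self._fingerprint(item)
        i1 = self._i1(item)
        return fp in self.buckets[i1] or fp in self.buckets[self._i2(i1, fp)]

    def delete(self, item):
        fp = self._fingerprint(item)
        i1 = self._i1(item)
        for i in (i1, self._i2(i1, fp)):
            if fp in self.buckets[i]:
                self.buckets[i].remove(fp)
                return True
        return False
```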
VividCortex uses K-Means Clustering to discover related metrics
After selecting an interesting spike in a metric, the algorithm automatically picks out other metrics that spiked at the same time. I can see that being pretty damn useful
metrics  k-means-clustering  clustering  algorithms  discovery  similarity  vividcortex  analysis  data 
march 2015 by jm
RIPQ: Advanced photo caching on flash for Facebook
Interesting priority-queue algorithm optimised for caching data on SSD
priority-queue  algorithms  facebook  ssd  flash  caching  ripq  papers 
february 2015 by jm
Proving that Android’s, Java’s and Python’s sorting algorithm is broken (and showing how to fix it)
Wow, this is excellent work. A formal verification of Tim Peters' TimSort failed, resulting in a bugfix:
While attempting to verify TimSort, we failed to establish its instance invariant. Analysing the reason, we discovered a bug in TimSort’s implementation leading to an ArrayOutOfBoundsException for certain inputs. We suggested a proper fix for the culprit method (without losing measurable performance) and we have formally proven that the fix actually is correct and that this bug no longer persists.
timsort  algorithms  android  java  python  sorting  formal-methods  proofs  openjdk 
february 2015 by jm
Automating Tinder with Eigenfaces
While my friends were getting sucked into "swiping" all day on their phones with Tinder, I eventually got fed up and designed a piece of software that automates everything on Tinder.


This is awesome. (via waxy)
via:waxy  tinder  eigenfaces  machine-learning  k-nearest-neighbour  algorithms  automation  ai 
february 2015 by jm
Are you better off running your big-data batch system off your laptop?
Heh, nice trolling.
Here are two helpful guidelines (for largely disjoint populations):

If you are going to use a big data system for yourself, see if it is faster than your laptop.
If you are going to build a big data system for others, see that it is faster than my laptop. [...]

We think everyone should have to do this, because it leads to better systems and better research.
graph  coding  hadoop  spark  giraph  graph-processing  hardware  scalability  big-data  batch  algorithms  pagerank 
january 2015 by jm