mpm + algorithm   71

Memory Layouts for Binary Search
The Eytzinger layout offers the best all-around performance over a wide range of array lengths
algorithm  memory  performance 
17 days ago by mpm
This web page contains a free electronic version of my (soon to be) self-published textbook Algorithms, along with other lecture notes I have written for various theoretical computer science classes at the University of Illinois, Urbana-Champaign since 1998.
book  algorithm 
january 2019 by mpm
Introduction to Locality-Sensitive Hashing
Locality-sensitive hashing (LSH) is a set of techniques that dramatically speed up search-for-neighbors or near-duplication detection on data. These techniques can be used, for example, to filter out duplicates of scraped web pages at an impressive speed, or to perform near-constant-time lookups of nearby points from a geospatial data set.
october 2018 by mpm
FreeBSD lockless algorithm - seq
There are multiple places in the kernel where we need to read some variables very often, but there is a small number of cases when we write to them. For such cases, the seq interface was created.
non-blocking  algorithm 
september 2018 by mpm
Algorithms in the style of IKEA instructions
fun  algorithm 
march 2018 by mpm
Exploring the basics of computer science, every Monday, for a year.
datastructure  algorithm 
october 2017 by mpm
GNU gperf is a perfect hash function generator. For a given list of strings, it produces a hash function and hash table, in form of C or C++ code, for looking up a value depending on the input string. The hash function is perfect, which means that the hash table has no collisions, and the hash table lookup needs a single string comparison only.
may 2017 by mpm
Robin Hood hashing: backward shift deletion
In this article, I am presenting the backward shift deletion variant for the Robin Hood hashing algorithm
march 2017 by mpm
Probabilistic Data Structure Showdown: Cuckoo Filters vs. Bloom Filters
This post provides an update by exploring Cuckoo filters, a new probabilistic data structure that improves upon the standard Bloom filter
algorithm  bloomfilter  probability 
december 2016 by mpm
Linear-log Bucketing: Fast, Versatile, Simple
What do memory allocation, histograms, and event scheduling have in common? They all benefit from rounding values to predetermined buckets, and the same bucketing strategy combines acceptable precision with reasonable space usage for a wide range of values
memory  algorithm 
june 2015 by mpm
Mean Shift Clustering Overview
I really like mean shift because of its simplicity. The entire end result is controlled by one parameter—the kernel bandwidth value. Other clustering approaches, such as k-means, require a number of clusters to be specified as an input. This is acceptable for certain scenarios, but most of the time the number of clusters is not known
may 2015 by mpm
Exceptionally fast and statistically robust hash functions
may 2015 by mpm
A Fast, Minimal Memory, Consistent Hash Algorithm
We present jump consistent hash, a fast, minimal memory, consistent hash algorithm that can be expressed in about 5 lines of code. In comparison to the algorithm of Karger et al., jump consistent hash requires no storage, is faster, and does a better job of evenly dividing the key space among the buckets and of evenly dividing the workload when the number of buckets changes. Its main limitation is that the buckets must be numbered sequentially, which makes it more suitable for data storage applications than for distributed web caching.
dht  algorithm 
march 2015 by mpm
A curated collection of papers on streaming algorithms
algorithm  stream  statistics  datastructure 
november 2014 by mpm
Algorithmic Complexity
If you a software developer, you know how difficult it can be studying for finals in school, technical interviews, or just refreshing yourself on fundamental algorithms and data structures. aims to be a good resource for anyone in this position.
algorithm  datastructure 
august 2014 by mpm
A Fast, Minimal Memory, Consistent Hash Algorithm
We present jump consistent hash, a fast, minimal memory, consistent hash algorithm that can be expressed in about 5 lines of code. In comparison to the algorithm of Karger et al., jump consistent hash requires no storage, is faster, and does a better job of evenly dividing the key space among the buckets and of evenly dividing the workload when the number of buckets changes. Its main limitation is that the buckets must be numbered sequentially, which makes it more suitable for data storage applications than for distributed web caching
overlay  algorithm 
july 2014 by mpm
FarmHash is a successor to CityHash, and includes many of the same tricks and techniques, several of them taken from Austin Appleby’s MurmurHash.

We want FarmHash to be fast and easy for developers to use in datacenters, but in phones, tablets, and PCs too. So, yes, we’ve improved on CityHash64 and CityHash32 and so on. But we’re also catering to the case where you simply want a fast, robust hash function for hash tables, and it need not be the same on every platform. To that end, we provide sample code that has one interface harboring multiple platform-specific implementations
algorithm  performance 
april 2014 by mpm
The Median-of-Medians Algorithm
In this post, we consider the problem of selecting the i-th smallest element from an unsorted list of n elements. Somewhat surprisingly, there is an algorithm that solves this problem in linear time
algorithm  statistics 
january 2014 by mpm
Simple Random Number Generation
This article will describe SimpleRNG, a very simple random number generator. The generator uses a well-tested algorithm and is quite efficient. Because it is so simple, it is easy to drop into projects and easy to debug into
algorithm  statistics 
december 2013 by mpm
IP datagram reassembly algorithms
In fact, the process of reassembly is extremely simple. This document describes a way of dealing with reassembly which reduces the bookkeeping problem to a minimum, which requires for storage only one buffer equal in size to the final datagram being reassembled, which can reassemble a datagram from any number of fragments arriving in any order with any possible pattern of overlap and duplication, and which is appropriate for almost any sort of operating system
networking  algorithm 
april 2013 by mpm
Algorithm Design for Performance Aware VM Consolidation
We present a performance preserving VM consolidation system that selectively places those VMs together that are likely to have the least contention
march 2013 by mpm
xxHash is an extremely fast Hash algorithm, working at speeds close to RAM limits
january 2013 by mpm
Hashing algorithm to find near-duplicates in binary data
erlang  algorithm 
october 2012 by mpm
Dynamic Programming versus Memoization
The difference between memoization and dynamic programming
september 2012 by mpm
Hyperloglog: The analysis of a near-optimal cardinality estimation algorithm
This extended abstract describes and analyses a near-optimal probabilistic algorithm, HYPERLOGLOG, dedicated to estimating the number of distinct elements (the cardinality) of very large data ensembles
algorithm  probability 
september 2012 by mpm
Homomorphic Hashing
A homomorphic hash is a construction that's simple in principle: a hash function such that you can compute the hash of a composite block from the hashes of the individual blocks.
august 2012 by mpm
How to count a billion distinct objects using only 1.5KB of Memory
You can estimate the cardinality of a set with cardinality Nmax using just loglog(Nmax) + O(1) bits
algorithm  memory 
august 2012 by mpm
Multi Ring Hashing
Multiple hash rings. Instead of each node having several tokens in one ring, it has one token in multiple rings
topology  algorithm 
july 2012 by mpm
SipHash is a family of pseudorandom functions optimized for short inputs.

Target applications include network traffic authentication and hash-table lookups protected against hash-flooding denials-of-service attacks
june 2012 by mpm
Multi-armed bandit problem
The multi-armed bandit problem takes its terminology from a casino. You are faced with a wall of slot machines, each with its own lever. You suspect that some slot machines pay out more frequently than others. How can you learn which machine is the best, and get the most coins in the fewest trials?
algorithm  probability 
may 2012 by mpm
CityHash provides hash functions for strings
algorithm  performance 
january 2012 by mpm
Log structured storage
It originated as Log Structured File Systems in the 1980s, but more recently it's seeing increasing use as a way to structure storage in database engines
algorithm  storage  filesystem 
january 2012 by mpm
Fountain Codes
A fountain code is a way to take some data - a file, for example - and transform it into an effectively unlimited number of encoded chunks, such that you can reassemble the original file given any subset of those chunks, as long as you have a little more than the size of the original file. In other words, it lets you create a 'fountain' of encoded data; a receiver can reassemble the file by catching enough 'droplets', regardless of which ones they get and which ones they miss.
january 2012 by mpm
Darts, Dice, and Coins
You are given an n-sided die where side i has probability p i of being rolled. What is the most efficient data structure for simulating rolls of the die?
probability  algorithm  datastructure 
december 2011 by mpm
This is an implementation by Viliam Holub of the fast non-cryptographic murmur hash algorithm.
java  algorithm 
november 2011 by mpm
Simultaneous Estimation of Several Persentiles
The simultaneous estimation of several percentiles is a cumbersome task. The extended P2 -algorithm proposed in this study significantly reduces the computation of the estimation. The algorithm simultaneously estimates several percentiles without storing and sorting the observations
statistics  algorithm 
november 2011 by mpm
The P2 Algorithm for Dynamic Calculation of Quantiles and Histograms Without Storing Observations
The estimates are produced dynamically as the observations are generated. The observations are not stored; therefore, the algorithm has a very small and fixed storage requirement regardless of the number of observations
statistics  algorithm 
november 2011 by mpm
Archive of Interesting Code
The archive of interesting code is an (ambitious) effort on my part to research, intuit, and code up every interesting algorithm and data structure ever invented
algorithm  datastructure  c++ 
november 2011 by mpm
Maged Michael - Selected Publications
Lock-Free/Wait-Free data structures and algorithms
concurrency  datastructure  algorithm 
october 2011 by mpm
K-tree is a tree structured clustering algorithm. It is also refered to as a Tree Structured Vector Quantizer (TSVQ). The goal of cluster analysis is to group objects based on similarity. Each object in a K-tree is represented by an n-dimensional vector.
algorithm  datastructure 
july 2011 by mpm
A Linear-Time Majority Vote Algorithm
This algorithm, which Bob Boyer and I invented in 1980 decides which element of a sequence is in the majority, provided there is such an element
july 2011 by mpm
Data Structure Visualization
Interactive animations for a variety of data structures and algorithms
algorithm  visualization  datastructure 
may 2011 by mpm
Clever Algorithms: Nature-Inspired Programming Recipes
The book "Clever Algorithms: Nature-Inspired Programming Recipes" by Jason Brownlee PhD describes 45 algorithms from the field of Artificial Intelligence. All algorithm descriptions are complete and consistent to ensure that they are accessible, usable and understandable by a wide audience
algorithm  book 
april 2011 by mpm
Very fast and scalable topic routing
a few approaches to topic routing optimization
datastructure  algorithm 
march 2011 by mpm
Fractional cascading
fractional cascading is a technique to speed up a sequence of binary searches for the same value in a sequence of related data structures
february 2011 by mpm
Finding similar items using minhashing
Minhashing is a technique for generating signatures for items. The more similar two items are, the higher the chance that they share the same minhash
february 2011 by mpm
This webpage is the new home for the MurmurHash family of hash functions along with the SMHasher test suite used to verify them
october 2010 by mpm
The CMPH Library encapsulates the newest and more efficient algorithms in an easy-to-use, production-quality, fast API. The library was designed to work with big entries that cannot fit in the main memory. It has been used successfully for constructing minimal perfect hash functions for sets with more than 100 million of keys, and we intend to expand this number to the order of billion of keys
october 2010 by mpm
Introduction to parallel & distributed algorithms
This was written as a unit for an introductory algorithms course. It's material that often doesn't appear in textbooks for such courses, which is a pity because distributed algorithms is an important topic in today's world.
distributed  concurrency  algorithm 
august 2010 by mpm
A short note on random load balancing
Let’s say you have a problem of allocating balls into bins randomly, but you want the number of balls in bins to be distributed as evenly as possible. How would you do that?
june 2010 by mpm
Understanding and Applying Operational Transformation
The key to Wave is the mechanism by which we interact with these documents: operational transformation. Wave actually doesn’t allow you to get access to a document as raw XML or anything even approaching it. Instead, it demands that all of your access to the document be performed in terms of operations. This has two consequences: first, it allows for some really incredible collaborative tools like the Wave client; second, it makes it really tricky to implement any sort of Wave-compatible service
algorithm  coordination 
may 2010 by mpm
Dynamic Programming Practice Problems
a collection of practice dynamic programming problems and their solutions.
april 2010 by mpm
Introduction to lock-free/wait-free and the ABA problem
general information on lock-free/wait-free programming
concurrency  algorithm 
december 2009 by mpm
Simple Simhashing
Suppose you have a huge number of items that you would like to group together by a fuzzy notion of similarity. Suppose the only tool available to you is a key-value store. Suppose you only have the resources to consider each object once
november 2009 by mpm
Simple, Fast, and Practical Non-Blocking and Blocking Concurrent Queue Algorithms
non-blocking concurrent queue algorithm and a new two-lock queue algorithm in which one enqueue and one dequeue can proceed concurrently
algorithm  concurrency 
december 2008 by mpm
Highly Scalable Java
A collection of Concurrent and Highly Scalable Utilities. These are intended as direct replacements for the java.util.* or java.util.concurrent.* collections but with better performance when many CPUs are using the collection concurrently.
java  concurrency  algorithm 
november 2008 by mpm
A Hash Function for Hash Table Lookup
I offer you a new hash function for hash table lookup that is faster and more thorough than the one you are using now
october 2008 by mpm
Merging sorted streams in Python
You have several sorted sequences (iterables) and need to iterate on the overall sorted sequence that results from “merging” these sequences.
python  algorithm 
october 2008 by mpm
Distributed k-Ary System
Algorithms for Distributed Hash Tables
algorithm  distributed  dht 
april 2008 by mpm
How to search for the word "pen1s" in 185 emails every second
How Mailinator searches every incoming email for a few hundred strings efficiently (in Java).
java  algorithm  search 
january 2008 by mpm
a tree-based data structure engineered for quickly finding near-matches to a string
algorithm  datastructure 
october 2007 by mpm
The Stony Brook Algorithm Repository
a comprehensive collection of algorithm implementations for over seventy of the most fundamental problems in combinatorial algorithms
datastructure  algorithm 
october 2007 by mpm
This is a penultimate draft of our soon to appear textbook
book  algorithm 
october 2007 by mpm
Dictionary of Algorithms and Data Structures
This is a dictionary of algorithms, algorithmic techniques, data structures, archetypal problems, and related definitions
datastructure  algorithm 
february 2007 by mpm

Copy this bookmark: