Memory Layouts for Binary Search

17 days ago by mpm

The Eytzinger layout offers the best all-around performance over a wide range of array lengths

algorithm
memory
performance
17 days ago by mpm

Algorithms

january 2019 by mpm

This web page contains a free electronic version of my (soon to be) self-published textbook Algorithms, along with other lecture notes I have written for various theoretical computer science classes at the University of Illinois, Urbana-Champaign since 1998.

book
algorithm
january 2019 by mpm

Introduction to Locality-Sensitive Hashing

october 2018 by mpm

Locality-sensitive hashing (LSH) is a set of techniques that dramatically speed up search-for-neighbors or near-duplication detection on data. These techniques can be used, for example, to filter out duplicates of scraped web pages at an impressive speed, or to perform near-constant-time lookups of nearby points from a geospatial data set.

algorithm
october 2018 by mpm

FreeBSD lockless algorithm - seq

september 2018 by mpm

There are multiple places in the kernel where we need to read some variables very often, but there is a small number of cases when we write to them. For such cases, the seq interface was created.

non-blocking
algorithm
september 2018 by mpm

basecs

october 2017 by mpm

Exploring the basics of computer science, every Monday, for a year.

datastructure
algorithm
october 2017 by mpm

gperf

may 2017 by mpm

GNU gperf is a perfect hash function generator. For a given list of strings, it produces a hash function and hash table, in form of C or C++ code, for looking up a value depending on the input string. The hash function is perfect, which means that the hash table has no collisions, and the hash table lookup needs a single string comparison only.

algorithm
may 2017 by mpm

Robin Hood hashing: backward shift deletion

march 2017 by mpm

In this article, I am presenting the backward shift deletion variant for the Robin Hood hashing algorithm

algorithm
march 2017 by mpm

Probabilistic Data Structure Showdown: Cuckoo Filters vs. Bloom Filters

december 2016 by mpm

This post provides an update by exploring Cuckoo filters, a new probabilistic data structure that improves upon the standard Bloom filter

algorithm
bloomfilter
probability
december 2016 by mpm

Linear-log Bucketing: Fast, Versatile, Simple

june 2015 by mpm

What do memory allocation, histograms, and event scheduling have in common? They all benefit from rounding values to predetermined buckets, and the same bucketing strategy combines acceptable precision with reasonable space usage for a wide range of values

memory
algorithm
june 2015 by mpm

Mean Shift Clustering Overview

may 2015 by mpm

I really like mean shift because of its simplicity. The entire end result is controlled by one parameter—the kernel bandwidth value. Other clustering approaches, such as k-means, require a number of clusters to be specified as an input. This is acceptable for certain scenarios, but most of the time the number of clusters is not known

algorithm
may 2015 by mpm

A Fast, Minimal Memory, Consistent Hash Algorithm

march 2015 by mpm

We present jump consistent hash, a fast, minimal memory, consistent hash algorithm that can be expressed in about 5 lines of code. In comparison to the algorithm of Karger et al., jump consistent hash requires no storage, is faster, and does a better job of evenly dividing the key space among the buckets and of evenly dividing the workload when the number of buckets changes. Its main limitation is that the buckets must be numbered sequentially, which makes it more suitable for data storage applications than for distributed web caching.

dht
algorithm
march 2015 by mpm

streaming-papers

november 2014 by mpm

A curated collection of papers on streaming algorithms

algorithm
stream
statistics
datastructure
november 2014 by mpm

Algorithmic Complexity

august 2014 by mpm

If you a software developer, you know how difficult it can be studying for finals in school, technical interviews, or just refreshing yourself on fundamental algorithms and data structures.

AlgorithmicComplexity.com aims to be a good resource for anyone in this position.

algorithm
datastructure
AlgorithmicComplexity.com aims to be a good resource for anyone in this position.

august 2014 by mpm

A Fast, Minimal Memory, Consistent Hash Algorithm

july 2014 by mpm

We present jump consistent hash, a fast, minimal memory, consistent hash algorithm that can be expressed in about 5 lines of code. In comparison to the algorithm of Karger et al., jump consistent hash requires no storage, is faster, and does a better job of evenly dividing the key space among the buckets and of evenly dividing the workload when the number of buckets changes. Its main limitation is that the buckets must be numbered sequentially, which makes it more suitable for data storage applications than for distributed web caching

overlay
algorithm
july 2014 by mpm

farmhash

april 2014 by mpm

FarmHash is a successor to CityHash, and includes many of the same tricks and techniques, several of them taken from Austin Appleby’s MurmurHash.

We want FarmHash to be fast and easy for developers to use in datacenters, but in phones, tablets, and PCs too. So, yes, we’ve improved on CityHash64 and CityHash32 and so on. But we’re also catering to the case where you simply want a fast, robust hash function for hash tables, and it need not be the same on every platform. To that end, we provide sample code that has one interface harboring multiple platform-specific implementations

algorithm
performance
We want FarmHash to be fast and easy for developers to use in datacenters, but in phones, tablets, and PCs too. So, yes, we’ve improved on CityHash64 and CityHash32 and so on. But we’re also catering to the case where you simply want a fast, robust hash function for hash tables, and it need not be the same on every platform. To that end, we provide sample code that has one interface harboring multiple platform-specific implementations

april 2014 by mpm

The Median-of-Medians Algorithm

january 2014 by mpm

In this post, we consider the problem of selecting the i-th smallest element from an unsorted list of n elements. Somewhat surprisingly, there is an algorithm that solves this problem in linear time

algorithm
statistics
january 2014 by mpm

Simple Random Number Generation

december 2013 by mpm

This article will describe SimpleRNG, a very simple random number generator. The generator uses a well-tested algorithm and is quite efficient. Because it is so simple, it is easy to drop into projects and easy to debug into

algorithm
statistics
december 2013 by mpm

IP datagram reassembly algorithms

april 2013 by mpm

In fact, the process of reassembly is extremely simple. This document describes a way of dealing with reassembly which reduces the bookkeeping problem to a minimum, which requires for storage only one buffer equal in size to the final datagram being reassembled, which can reassemble a datagram from any number of fragments arriving in any order with any possible pattern of overlap and duplication, and which is appropriate for almost any sort of operating system

networking
algorithm
april 2013 by mpm

Algorithm Design for Performance Aware VM Consolidation

march 2013 by mpm

We present a performance preserving VM consolidation system that selectively places those VMs together that are likely to have the least contention

algorithm
march 2013 by mpm

xxhash

january 2013 by mpm

xxHash is an extremely fast Hash algorithm, working at speeds close to RAM limits

algorithm
january 2013 by mpm

simhash

october 2012 by mpm

Hashing algorithm to find near-duplicates in binary data

erlang
algorithm
october 2012 by mpm

Dynamic Programming versus Memoization

september 2012 by mpm

The difference between memoization and dynamic programming

algorithm
september 2012 by mpm

Hyperloglog: The analysis of a near-optimal cardinality estimation algorithm

september 2012 by mpm

This extended abstract describes and analyses a near-optimal probabilistic algorithm, HYPERLOGLOG, dedicated to estimating the number of distinct elements (the cardinality) of very large data ensembles

algorithm
probability
september 2012 by mpm

Homomorphic Hashing

august 2012 by mpm

A homomorphic hash is a construction that's simple in principle: a hash function such that you can compute the hash of a composite block from the hashes of the individual blocks.

algorithm
august 2012 by mpm

How to count a billion distinct objects using only 1.5KB of Memory

august 2012 by mpm

You can estimate the cardinality of a set with cardinality Nmax using just loglog(Nmax) + O(1) bits

algorithm
memory
august 2012 by mpm

Multi Ring Hashing

july 2012 by mpm

Multiple hash rings. Instead of each node having several tokens in one ring, it has one token in multiple rings

topology
algorithm
july 2012 by mpm

Multi-armed bandit problem

may 2012 by mpm

The multi-armed bandit problem takes its terminology from a casino. You are faced with a wall of slot machines, each with its own lever. You suspect that some slot machines pay out more frequently than others. How can you learn which machine is the best, and get the most coins in the fewest trials?

algorithm
probability
may 2012 by mpm

Log structured storage

january 2012 by mpm

It originated as Log Structured File Systems in the 1980s, but more recently it's seeing increasing use as a way to structure storage in database engines

algorithm
storage
filesystem
january 2012 by mpm

Fountain Codes

january 2012 by mpm

A fountain code is a way to take some data - a file, for example - and transform it into an effectively unlimited number of encoded chunks, such that you can reassemble the original file given any subset of those chunks, as long as you have a little more than the size of the original file. In other words, it lets you create a 'fountain' of encoded data; a receiver can reassemble the file by catching enough 'droplets', regardless of which ones they get and which ones they miss.

algorithm
january 2012 by mpm

Darts, Dice, and Coins

december 2011 by mpm

You are given an n-sided die where side i has probability p i of being rolled. What is the most efficient data structure for simulating rolls of the die?

probability
algorithm
datastructure
december 2011 by mpm

murmurhash-java

november 2011 by mpm

This is an implementation by Viliam Holub of the fast non-cryptographic murmur hash algorithm.

java
algorithm
november 2011 by mpm

Simultaneous Estimation of Several Persentiles

november 2011 by mpm

The simultaneous estimation of several percentiles is a cumbersome task. The extended P2 -algorithm proposed in this study significantly reduces the computation of the estimation. The algorithm simultaneously estimates several percentiles without storing and sorting the observations

statistics
algorithm
november 2011 by mpm

The P2 Algorithm for Dynamic Calculation of Quantiles and Histograms Without Storing Observations

november 2011 by mpm

The estimates are produced dynamically as the observations are generated. The observations are not stored; therefore, the algorithm has a very small and fixed storage requirement regardless of the number of observations

statistics
algorithm
november 2011 by mpm

Archive of Interesting Code

november 2011 by mpm

The archive of interesting code is an (ambitious) effort on my part to research, intuit, and code up every interesting algorithm and data structure ever invented

algorithm
datastructure
c++
november 2011 by mpm

Maged Michael - Selected Publications

october 2011 by mpm

Lock-Free/Wait-Free data structures and algorithms

concurrency
datastructure
algorithm
october 2011 by mpm

K-tree

july 2011 by mpm

K-tree is a tree structured clustering algorithm. It is also refered to as a Tree Structured Vector Quantizer (TSVQ). The goal of cluster analysis is to group objects based on similarity. Each object in a K-tree is represented by an n-dimensional vector.

algorithm
datastructure
july 2011 by mpm

A Linear-Time Majority Vote Algorithm

july 2011 by mpm

This algorithm, which Bob Boyer and I invented in 1980 decides which element of a sequence is in the majority, provided there is such an element

algorithm
july 2011 by mpm

Data Structure Visualization

may 2011 by mpm

Interactive animations for a variety of data structures and algorithms

algorithm
visualization
datastructure
may 2011 by mpm

Clever Algorithms: Nature-Inspired Programming Recipes

april 2011 by mpm

The book "Clever Algorithms: Nature-Inspired Programming Recipes" by Jason Brownlee PhD describes 45 algorithms from the field of Artificial Intelligence. All algorithm descriptions are complete and consistent to ensure that they are accessible, usable and understandable by a wide audience

algorithm
book
april 2011 by mpm

Very fast and scalable topic routing

march 2011 by mpm

a few approaches to topic routing optimization

datastructure
algorithm
march 2011 by mpm

Fractional cascading

february 2011 by mpm

fractional cascading is a technique to speed up a sequence of binary searches for the same value in a sequence of related data structures

algorithm
february 2011 by mpm

Finding similar items using minhashing

february 2011 by mpm

Minhashing is a technique for generating signatures for items. The more similar two items are, the higher the chance that they share the same minhash

algorithm
february 2011 by mpm

murmurhash

october 2010 by mpm

This webpage is the new home for the MurmurHash family of hash functions along with the SMHasher test suite used to verify them

algorithm
october 2010 by mpm

CMPH

october 2010 by mpm

The CMPH Library encapsulates the newest and more efficient algorithms in an easy-to-use, production-quality, fast API. The library was designed to work with big entries that cannot fit in the main memory. It has been used successfully for constructing minimal perfect hash functions for sets with more than 100 million of keys, and we intend to expand this number to the order of billion of keys

algorithm
october 2010 by mpm

the median of a trillion numbers

september 2010 by mpm

Distributed median finding

distributed
algorithm
erlang
september 2010 by mpm

Introduction to parallel & distributed algorithms

august 2010 by mpm

This was written as a unit for an introductory algorithms course. It's material that often doesn't appear in textbooks for such courses, which is a pity because distributed algorithms is an important topic in today's world.

distributed
concurrency
algorithm
august 2010 by mpm

A short note on random load balancing

june 2010 by mpm

Let’s say you have a problem of allocating balls into bins randomly, but you want the number of balls in bins to be distributed as evenly as possible. How would you do that?

algorithm
june 2010 by mpm

Understanding and Applying Operational Transformation

may 2010 by mpm

The key to Wave is the mechanism by which we interact with these documents: operational transformation. Wave actually doesn’t allow you to get access to a document as raw XML or anything even approaching it. Instead, it demands that all of your access to the document be performed in terms of operations. This has two consequences: first, it allows for some really incredible collaborative tools like the Wave client; second, it makes it really tricky to implement any sort of Wave-compatible service

algorithm
coordination
may 2010 by mpm

Dynamic Programming Practice Problems

april 2010 by mpm

a collection of practice dynamic programming problems and their solutions.

algorithm
april 2010 by mpm

Introduction to lock-free/wait-free and the ABA problem

december 2009 by mpm

general information on lock-free/wait-free programming

concurrency
algorithm
december 2009 by mpm

Simple Simhashing

november 2009 by mpm

Suppose you have a huge number of items that you would like to group together by a fuzzy notion of similarity. Suppose the only tool available to you is a key-value store. Suppose you only have the resources to consider each object once

algorithm
november 2009 by mpm

Simple, Fast, and Practical Non-Blocking and Blocking Concurrent Queue Algorithms

december 2008 by mpm

non-blocking concurrent queue algorithm and a new two-lock queue algorithm in which one enqueue and one dequeue can proceed concurrently

algorithm
concurrency
december 2008 by mpm

Highly Scalable Java

november 2008 by mpm

A collection of Concurrent and Highly Scalable Utilities. These are intended as direct replacements for the java.util.* or java.util.concurrent.* collections but with better performance when many CPUs are using the collection concurrently.

java
concurrency
algorithm
november 2008 by mpm

A Hash Function for Hash Table Lookup

october 2008 by mpm

I offer you a new hash function for hash table lookup that is faster and more thorough than the one you are using now

algorithm
october 2008 by mpm

Merging sorted streams in Python

october 2008 by mpm

You have several sorted sequences (iterables) and need to iterate on the overall sorted sequence that results from “merging” these sequences.

python
algorithm
october 2008 by mpm

Distributed k-Ary System

april 2008 by mpm

Algorithms for Distributed Hash Tables

algorithm
distributed
dht
april 2008 by mpm

How to search for the word "pen1s" in 185 emails every second

january 2008 by mpm

How Mailinator searches every incoming email for a few hundred strings efficiently (in Java).

java
algorithm
search
january 2008 by mpm

Introduction to nonblocking algorithms

november 2007 by mpm

Java Atomic types

algorithm
concurrency
java
november 2007 by mpm

BK-Trees

october 2007 by mpm

a tree-based data structure engineered for quickly finding near-matches to a string

algorithm
datastructure
october 2007 by mpm

The Stony Brook Algorithm Repository

october 2007 by mpm

a comprehensive collection of algorithm implementations for over seventy of the most fundamental problems in combinatorial algorithms

datastructure
algorithm
october 2007 by mpm

Algorithms

october 2007 by mpm

This is a penultimate draft of our soon to appear textbook

book
algorithm
october 2007 by mpm

Dictionary of Algorithms and Data Structures

february 2007 by mpm

This is a dictionary of algorithms, algorithmic techniques, data structures, archetypal problems, and related definitions

datastructure
algorithm
february 2007 by mpm

**related tags**

Copy this bookmark: