a library to compress and uncompress arrays of integers very fast. The assumption is that most (but not all) values in your array use much less than 32 bits, or that the gaps between the integers use much less than 32 bits. These sort of arrays often come up when using differential coding in databases and information retrieval (e.g., in inverted indexes or column stores).

Please note that random integers are not compressible, by this library or by any other means. If you ever had the means of systematically compressing random integers, you could compress any data source to nothing, by recursive application of your technique.

This library can decompress integers at a rate of over 1.2 billions per second (4.5 GB/s). It is significantly faster than generic codecs (such as Snappy, LZ4 and so on) when compressing arrays of integers.

The library is used in LinkedIn Pinot, a realtime distributed OLAP datastore. Part of this library has been integrated in Parquet (http://parquet.io/). A modified version of the library is included in the search engine Terrier (http://terrier.org/). This libary is used by ClueWeb Tools (https://github.com/lintool/clueweb). It is also used by Apache NiFi.

11 days ago by jm

Copy this bookmark: