jm + snappy (8 bookmarks)

How Uber Engineering Evaluated JSON Encoding and Compression Algorithms to Put the Squeeze on Trip Data
Key conclusions:

Simply compressing JSON with zlib would yield a reasonable tradeoff in size and speed. The result would be just a little bigger, but execution was much faster than using BZ2 on JSON.
Going with IDL-based protocols, Thrift and Protocol Buffers compressed with zlib or Snappy would give us the best gain in size and/or speed.
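
A minimal sketch of the zlib-vs-BZ2 half of that comparison, using only the Python standard library (the trip-like record below is made up for illustration, not Uber's schema):

    import bz2, json, time, zlib

    # Fake trip-like record, repeated to give the compressors something to chew on.
    record = {"trip_id": 12345, "lat": 37.7749, "lng": -122.4194, "fare": 11.50}
    payload = json.dumps([record] * 1000).encode("utf-8")

    for name, compress in (("zlib", zlib.compress), ("bz2", bz2.compress)):
        start = time.perf_counter()
        blob = compress(payload)
        elapsed_ms = (time.perf_counter() - start) * 1000
        print(f"{name}: {len(payload)} -> {len(blob)} bytes in {elapsed_ms:.2f} ms")
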
compression  json  performance  python  serialization  protobuf  zlib  snappy  cbor  messagepack  thrift  bz2 
9 weeks ago by jm
Scylla compression benchmarks
ScyllaDB tested out LZ4, Snappy, DEFLATE, and ZStandard at several different levels on a decently real-world-ish workload. tl;dr:
Use compression. Unless you are using a really (but REALLY) fast hard drive, using the default compression settings will be even faster than disabling compression, and the space savings are huge.

When running a data warehouse where data is mostly being read and only rarely updated, consider using DEFLATE. It provides very good compression ratios while maintaining high decompression speeds; compression can be slower, but that might be unimportant for your workload.

If your workload is write-heavy but you really care about saving disk space, consider using ZStandard on level 1. It provides a good middle-ground between LZ4/Snappy and DEFLATE in terms of compression ratios and keeps compression speeds close to LZ4 and Snappy. Be careful however: if you often want to read cold data (from the SSTables on disk, not currently stored in memory, so for example data that was inserted a long time ago), the slower decompression might become a problem.
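
A quick way to sanity-check those tradeoffs on your own data; a rough sketch where the stdlib's zlib stands in for DEFLATE, and the third-party lz4 and zstandard packages (an assumption on my part, not anything ScyllaDB ships) stand in for the other two:

    import time, zlib

    import lz4.frame          # pip install lz4 (assumed available)
    import zstandard as zstd  # pip install zstandard (assumed available)

    # Substitute a sample of your own on-disk data here.
    data = b"sensor=42 temp=21.5 ts=1570000000 " * 50000

    candidates = {
        "lz4":     lz4.frame.compress,
        "zstd-1":  zstd.ZstdCompressor(level=1).compress,
        "deflate": zlib.compress,
    }

    for name, compress in candidates.items():
        start = time.perf_counter()
        blob = compress(data)
        elapsed_ms = (time.perf_counter() - start) * 1000
        print(f"{name}: ratio {len(data) / len(blob):.1f}x in {elapsed_ms:.1f} ms")
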
compression  scylladb  storage  deflate  zstd  zstandard  lz4  snappy  gzip  benchmarks  tests  performance 
october 2019 by jm
Fibonacci Hashing: The Optimization that the World Forgot (or: a Better Alternative to Integer Modulo)
Turns out I was wrong. This is a big one. And everyone should be using it. Hash tables should not be prime number sized and they should not use an integer modulo to map hashes into slots. Fibonacci hashing is just better. Yet somehow nobody is using it and lots of big hash tables (including all the big implementations of std::unordered_map) are much slower than they should be because they don’t use Fibonacci Hashing.


Apparently this is binary multiplicative hashing, and Google's brotli, webp, and Snappy libs all use a constant derived heuristically from a compression test corpus along the same lines (see comments).

(Via Michael Fogleman)
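
The core trick is small enough to show in a few lines of Python; a sketch, using the standard 64-bit constant floor(2^64 / φ):

    # Fibonacci (multiplicative) hashing: multiply by floor(2**64 / golden ratio),
    # keep the top bits. Sequential keys spread across slots instead of clustering
    # the way `key % table_size` would.
    FIB_MULT = 0x9E3779B97F4A7C15  # 11400714819323198485

    def fib_hash(key: int, shift_bits: int) -> int:
        """Map a key to a slot index in [0, 2**shift_bits)."""
        return ((key * FIB_MULT) & 0xFFFFFFFFFFFFFFFF) >> (64 - shift_bits)

    print([fib_hash(k, 4) for k in range(8)])  # [0, 9, 3, 13, 7, 1, 11, 5]
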
algorithms  hashing  hash  fibonacci  golden-ratio  coding  hacks  brotli  webp  snappy  hash-tables  hashmaps  load-distribution 
june 2018 by jm
airlift/aircompressor: A port of Snappy, LZO and LZ4 to Java
This library contains implementations of LZ4, Snappy, and LZO written in pure Java. They are typically 10-40% faster than the JNI wrapper for the native libraries.
lz4  lzo  lzop  snappy  java  libraries  airlift  compression  performance 
january 2018 by jm
How eBay’s Shopping Cart used compression techniques to solve network I/O bottlenecks
Compressing data written to MongoDB using LZ4_HIGH dropped oplog write rates from 150GB/hour to 11GB/hour; Snappy and Gzip didn't fare too well by comparison.
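
A hedged sketch of the pattern (the third-party lz4 package and the cart shape/field names below are illustrative assumptions, not eBay's actual code): serialize the cart, compress with LZ4's high-compression mode, store the blob.

    import json

    import lz4.frame  # pip install lz4 (assumed available)

    cart = {"cart_id": "abc123", "items": [{"sku": "X1", "qty": 2}] * 50}
    blob = lz4.frame.compress(
        json.dumps(cart).encode("utf-8"),
        compression_level=lz4.frame.COMPRESSIONLEVEL_MINHC,  # "high compression" mode
    )
    # then e.g. carts.update_one({"_id": cart["cart_id"]},
    #                            {"$set": {"payload": blob}}, upsert=True)
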
lz4  compression  gzip  json  snappy  scaling  ebay  mongodb 
february 2017 by jm
Announcing Snappy Ubuntu
Awesome! I was completely unaware this was coming down the pipeline.
A new, transactionally updated Ubuntu for the cloud. Ubuntu Core is a new rendition of Ubuntu for the cloud with transactional updates. Ubuntu Core is a minimal server image with the same libraries as today’s Ubuntu, but applications are provided through a simpler mechanism. The snappy approach is faster, more reliable, and lets us provide stronger security guarantees for apps and users — that’s why we call them “snappy” applications.

Snappy apps and Ubuntu Core itself can be upgraded atomically and rolled back if needed — a bulletproof approach to systems management that is perfect for container deployments. It’s called “transactional” or “image-based” systems management, and we’re delighted to make it available on every Ubuntu certified cloud.
ubuntu  linux  packaging  snappy  ubuntu-core  transactional-updates  apt  docker  ops 
december 2014 by jm
Compression in Kafka: GZIP or Snappy ?
With Ack: in this mode, as far as compression is concerned, the data gets compressed at the producer, then decompressed and re-compressed on the broker before it sends the ack to the producer. The producer throughput with Snappy compression was roughly 22.3MB/s as compared to 8.9MB/s of the GZIP producer. Producer throughput is 150% higher with Snappy as compared to GZIP.

No ack, similar to Kafka 0.7 behavior: In this mode, the data gets compressed at the producer and it doesn’t wait for the ack from the broker. The producer throughput with Snappy compression was roughly 60.8MB/s as compared to 18.5MB/s of the GZIP producer. Producer throughput is 228% higher with Snappy as compared to GZIP. The larger throughput gain in this test is due to the fact that the producer does not wait for the leader to re-compress and append the data; it simply compresses messages and fires away. Since Snappy has very high compression speed and low CPU usage, a single producer is able to compress the same amount of messages much faster as compared to GZIP.
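
For reference, both the compression codec and the ack mode are plain producer settings; a minimal sketch with the third-party kafka-python client (broker address and topic are made up):

    from kafka import KafkaProducer  # pip install kafka-python (assumed available)

    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",
        compression_type="snappy",  # or "gzip"
        acks=1,                     # acks=0 gives the fire-and-forget mode above
    )
    producer.send("trip-events", b'{"trip_id": 12345}')
    producer.flush()
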
gzip  snappy  compression  kafka  streaming  ops 
april 2013 by jm
snappy - A fast compressor/decompressor
'On a single core of a Core i7 processor in 64-bit mode, it compresses at about 250 MB/sec or more and decompresses at about 500 MB/sec or more. (These numbers are for the slowest inputs in our benchmark suite; others are much faster.) In our tests, Snappy usually is faster than algorithms in the same class (e.g. LZO, LZF, FastLZ, QuickLZ, etc.) while achieving comparable compression ratios.'  Apache-licensed, from Google
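
Trivial to try from Python via the python-snappy bindings (a separate package from the C++ library itself); a round-trip sketch:

    import snappy  # pip install python-snappy (assumed available)

    payload = b"the quick brown fox jumps over the lazy dog " * 1000
    compressed = snappy.compress(payload)
    assert snappy.decompress(compressed) == payload
    print(f"{len(payload)} -> {len(compressed)} bytes")
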
snappy  google  compression  speed  from delicious
march 2011 by jm
