jm + hashes   11

google/highwayhash: Fast strong hash functions: SipHash/HighwayHash
HighwayHash: 'We have devised a new way of mixing inputs with AVX2 multiply and permute instructions. The multiplications are 32x32 -> 64 bits and therefore infeasible to reverse. Permuting equalizes the distribution of the resulting bytes. The internal state occupies four 256-bit AVX2 registers. Due to limitations of the instruction set, the registers are partitioned into two 512-bit halves that remain independent until the reduce phase. The algorithm outputs 64 bit digests or up to 256 bits at no extra cost. In addition to high throughput, the algorithm is designed for low finalization cost. The result is more than twice as fast as SipTreeHash.

We also provide an SSE4.1 version (80% as fast for large inputs and 95% as fast for short inputs), an implementation for VSX on POWER and a portable version (10% as fast). A third-party ARM implementation is referenced below.

Statistical analyses and preliminary cryptanalysis are given in'

(via Tony Finch)
siphash  highwayhash  via:fanf  hashing  hashes  algorithms  mac  google  hash 
9 weeks ago by jm
Rendezvous hashing - Wikipedia, the free encyclopedia

Rendezvous or Highest Random Weight (HRW) hashing[1][2] is an algorithm that allows clients to achieve distributed agreement on a set of k options out of a possible set of n options. A typical application is when clients need to agree on which sites (or proxies) objects are to be assigned to. When k is 1, it subsumes the goals of consistent hashing, using an entirely different method.
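A minimal Python sketch of the idea: every client scores each (object, site) pair with the same hash function and picks the k highest-scoring sites, so they all agree without talking to each other. SHA-256 as the scoring hash, the `hrw_choose` helper and the site names are hypothetical choices for illustration, not taken from the article.

```python
import hashlib

def _weight(key: str, site: str) -> int:
    # Score one (key, site) pair; any well-mixed hash works, SHA-256 is an arbitrary pick.
    digest = hashlib.sha256(f"{site}:{key}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

def hrw_choose(key: str, sites: list[str], k: int = 1) -> list[str]:
    """Return the k sites with the highest weight for this key (ties broken by name)."""
    ranked = sorted(sites, key=lambda s: (_weight(key, s), s), reverse=True)
    return ranked[:k]

sites = ["proxy-a", "proxy-b", "proxy-c", "proxy-d"]
print(hrw_choose("object-42", sites))       # the single winning site (k = 1)
print(hrw_choose("object-42", sites, k=2))  # the top two sites
```

Because a site's score for an object doesn't depend on the other sites, removing a site only moves the objects that site was winning; every other object stays put, which is the property HRW shares with consistent hashing.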
hrw  hashing  hashes  consistent-hashing  rendezvous-hashing  algorithms  discovery  distributed-computing 
april 2016 by jm
The general birthday problem
Good explanation and scipy code for the birthday paradox and hash collisions
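The linked post uses scipy; a standard-library Python sketch of the exact calculation looks like this. It computes the probability that n values drawn uniformly at random from d possibilities (birthdays, or hash outputs) contain at least one repeat, as 1 minus the product of (d - i)/d for i = 0 .. n-1. The function name is hypothetical.

```python
import math

def collision_probability(n: int, d: int) -> float:
    """Exact probability that n uniform draws from d values contain a repeat."""
    if n > d:
        return 1.0  # pigeonhole: a repeat is guaranteed
    # P(all distinct) = prod_{i=0}^{n-1} (d - i) / d; summing logs avoids underflow.
    log_p_unique = sum(math.log1p(-i / d) for i in range(n))
    return 1.0 - math.exp(log_p_unique)

print(collision_probability(23, 365))  # about 0.507, the classic birthday result
```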
hashing  hashes  collisions  birthday-problem  birthday-paradox  coding  probability  statistics 
february 2016 by jm
Birthday problem calculator
I keep having to google this, so here's a good one which works -- unlike Wolfram Alpha!
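When a calculator isn't handy, the usual approximation p ≈ 1 - exp(-n(n-1)/(2d)) and its inverse n ≈ sqrt(2d ln(1/(1-p))) (treating n(n-1) as n^2) are easy to compute directly. A Python sketch, assuming uniformly distributed values and n much smaller than d; the helper names are hypothetical.

```python
import math

def approx_collision_probability(n: int, d: int) -> float:
    # p ≈ 1 - exp(-n(n-1) / (2d)); expm1 keeps precision when the exponent is tiny.
    return -math.expm1(-n * (n - 1) / (2 * d))

def items_for_probability(p: float, d: int) -> int:
    # Inverse of the approximation: n ≈ sqrt(2 d ln(1 / (1 - p))), rounded up.
    return math.ceil(math.sqrt(2 * d * math.log(1 / (1 - p))))

print(items_for_probability(0.5, 365))    # 23 people for a 50% chance of a shared birthday
print(items_for_probability(0.5, 2**32))  # roughly 77,000 values for a 32-bit hash
```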
birthday  birthday-paradox  birthday-problem  hashes  hash-collision  attacks  security  collisions  calculators  probability  statistics 
december 2015 by jm
Trend Micro Locality Sensitive Hash
A fuzzy matching library. Given a byte stream with a minimum length of 512 bytes, TLSH generates a hash value which can be used for similarity comparisons. Similar objects will have similar hash values, which allows for the detection of similar objects by comparing their hash values. Note that the byte stream should have a sufficient amount of complexity. For example, a byte stream of identical bytes will not generate a hash value.

Paper here:

via adulau
nilsimsa  sdhash  ssdeep  locality-sensitive  hashing  algorithm  hashes  trend-micro  tlsh  hash  fuzzy-matching  via:adulau 
may 2015 by jm
'Leak of the secret German Internet Censorship URL blacklist BPjM-Modul'.

Turns out there's a blocklist of adult-only or prohibited domains issued by a German government department, the Federal Department for Media Harmful to Young Persons (German: "Bundesprüfstelle für jugendgefährdende Medien", or BPjM), in the form of a list of hashes of those domains. The hashes were extracted from an AVM router, then brute-forced using several other plaintext URL blocklists and domain lists.
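The brute-forcing step is a plain dictionary attack: hash every entry of a known plaintext domain list and look each digest up in the leaked set. A minimal Python sketch, assuming unsalted MD5 hex digests of lowercased domain names; the real BPjM-Modul hash format may differ, and `md5_hex`, `crack` and the example domains are hypothetical.

```python
import hashlib

def md5_hex(s: str) -> str:
    # Assumed format: unsalted MD5 of the UTF-8 domain name, as lowercase hex.
    return hashlib.md5(s.encode("utf-8")).hexdigest()

def crack(leaked: set[str], candidates: list[str]) -> dict[str, str]:
    """Map each leaked hash back to a candidate plaintext domain that produces it."""
    found = {}
    for raw in candidates:
        domain = raw.strip().lower()
        h = md5_hex(domain)
        if h in leaked:
            found[h] = domain
    return found

# Hypothetical data: pretend the leaked list contains the hash of one domain.
leaked = {md5_hex("example.com")}
wordlist = ["example.org", "Example.com ", "example.net"]
print(crack(leaked, wordlist))
```

Unsalted hashes of low-entropy inputs like domain names only hide the list from someone who has no decent guess at the plaintexts; anyone holding other blocklists can recover most of it.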

Needless to say, there's an assortment of silly false positives, such as the listing of the website for the 1997 3D Realms game "Shadow Warrior":
hashes  reversing  reverse-engineering  germany  german  bpjm  filtering  blocklists  blacklists  avm  domains  censorship  fps 
july 2014 by jm
Hey Judy, don't make it bad
GitHub got good results using Judy arrays to replace a Ruby hash. However, the whole blog post seems a bit dodgy to me. It feels like there are much better ways to fix the problem:

1. The big one: don't do GC-heavy activity in the front-end web servers. Split that language-classification code into a separate service, write its results to a cache, and don't re-query needlessly.
2. Why isn't this benchmarked against a C/C++ hash table? It's only 36,000 entries, loaded once at startup. Lookups against that should be blisteringly fast even with basic data structures, and the table would live outside the Ruby heap, so it would avoid the GC overhead. It feels like the use of a Judy array was a "because I want to" decision.
3. Personally, I'd have preferred they spend time fixing their uptime problems...

See also for more kvetching.
ruby  github  gc  judy-arrays  linguist  hashes  data-structures 
may 2013 by jm
Dropbox dedupe feature allows materialization of any file, if you know its hash
'allows users to exploit Dropbox’s file hashing scheme to copy files into their account without actually having them. Dropship will save the hashes of a file in JSON format. Anyone can then take these hashes and load the original file into their Dropbox account using Dropship.' heh. that sounds very familiar, I seem to recall thinking about this problem on several occasions... ;) Dropbox certainly didn't like it, going by this account
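A rough Python sketch of the kind of block-level dedupe the Dropship trick abused. It assumes SHA-256 over fixed 4 MiB blocks, which is how Dropbox's client was reported to hash files at the time; treat the block size, the JSON manifest layout and the hypothetical `missing_blocks` server check as illustrative assumptions. The point is that if the server already holds every block listed in the manifest, it never needs to see the file's bytes, so anyone who knows the hashes can claim the file.

```python
import hashlib
import json

BLOCK_SIZE = 4 * 1024 * 1024  # assumed 4 MiB block size

def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """SHA-256 hex digest of each fixed-size block of the input."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]

def missing_blocks(manifest: dict, server_store: set[str]) -> list[str]:
    # Hypothetical server check: only blocks it lacks need uploading.
    # An empty result means the file is "materialized" from hashes alone.
    return [h for h in manifest["blocks"] if h not in server_store]

data = b"x" * (BLOCK_SIZE + 10)  # spans two blocks
manifest = {"size": len(data), "blocks": block_hashes(data)}
print(json.dumps(manifest)[:80])
print(missing_blocks(manifest, set(manifest["blocks"])))  # [] : nothing to upload
```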
security  filesharing  dropbox  online-backup  online-storage  p2p  hashes  sha  dmca 
april 2011 by jm
Stop using unsafe keyed hashes, use HMAC
why HMAC is more secure than secret-suffix and secret-prefix keyed hashing. good to know
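Python's standard library ships HMAC, so there's little excuse for hand-rolling sha256(key + message), which with Merkle-Damgård hashes like MD5, SHA-1 and SHA-256 is open to length-extension attacks. A minimal sketch; the key, the message and the `verify` helper are made up for illustration.

```python
import hashlib
import hmac

key = b"server-side secret"       # hypothetical key
message = b"user=alice&role=user"  # hypothetical message

# HMAC-SHA256: the standard keyed construction, not sha256(key + message).
tag = hmac.new(key, message, hashlib.sha256).hexdigest()

def verify(key: bytes, message: bytes, received_tag: str) -> bool:
    expected = hmac.new(key, message, hashlib.sha256).hexdigest()
    # compare_digest is designed to avoid leaking where the two strings differ via timing.
    return hmac.compare_digest(expected, received_tag)

print(verify(key, message, tag))                   # True
print(verify(key, message + b"&role=admin", tag))  # False
```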
hmac  security  crypto  hashing  md5  hashes  sha256  sha1  from delicious
october 2009 by jm
