jm + hashes   14

Historic S3 data corruption due to a faulty load balancer
This came up in a discussion of using hashes for end-to-end data resiliency on the og-aws Slack. Turns out AWS support staff wrote it up at the time:
We've isolated this issue to a single load balancer that was brought into service at 10:55pm PDT on Friday, 6/20 [2008].  It was taken out of service at 11am PDT Sunday, 6/22.  While it was in service it handled a small fraction of Amazon S3's total requests in the US.  Intermittently, under load, it was corrupting single bytes in the byte stream.  When the requests reached Amazon S3, if the Content-MD5 header was specified, Amazon S3 returned an error indicating the object did not match the MD5 supplied.  When no MD5 is specified, we are unable to determine if transmission errors occurred, and Amazon S3 must assume that the object has been correctly transmitted. Based on our investigation with both internal and external customers, the small amount of traffic received by this particular load balancer, and the intermittent nature of the above issue on this one load balancer, this appears to have impacted a very small portion of PUTs during this time frame.

One of the things we'll do is improve our logging of requests with MD5s, so that we can look for anomalies in their 400 error rates.  Doing this will allow us to provide more proactive notification on potential transmission issues in the future, for customers who use MD5s and those who do not. In addition to taking the actions noted above, we encourage all of our customers to take advantage of mechanisms designed to protect their applications from incorrect data transmission.  For all PUT requests, Amazon S3 computes its own MD5, stores it with the object, and then returns the computed MD5 as part of the PUT response code in the ETag.  By validating the ETag returned in the response, customers can verify that Amazon S3 received the correct bytes even if the Content MD5 header wasn't specified in the PUT request.  Because network transmission errors can occur at any point between the customer and Amazon S3, we recommend that all customers use the Content-MD5 header and/or validate the ETag returned on a PUT request to ensure that the object was correctly transmitted.  This is a best practice that we'll emphasize more heavily in our documentation to help customers build applications that can handle this situation.
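That last recommendation is easy to follow from code: compute the MD5 yourself, send it as Content-MD5, and compare the returned ETag. A minimal sketch using boto3 (bucket and key names are hypothetical; the ETag-equals-hex-MD5 check only holds for single-part, non-KMS-encrypted PUTs):

```python
import base64
import hashlib

import boto3  # assumes boto3 is installed and AWS credentials are configured

s3 = boto3.client("s3")
body = b"hello, world"

# Compute the MD5 locally; Content-MD5 must be the base64-encoded binary digest.
content_md5 = base64.b64encode(hashlib.md5(body).digest()).decode("ascii")

resp = s3.put_object(
    Bucket="my-bucket",   # hypothetical bucket/key names
    Key="example.txt",
    Body=body,
    ContentMD5=content_md5,
)

# For a single-part, non-KMS-encrypted PUT, the ETag is the hex MD5 in quotes.
assert resp["ETag"].strip('"') == hashlib.md5(body).hexdigest()
```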
aws  s3  outages  postmortems  load-balancing  data-corruption  corruption  failure  md5  hashing  hashes 
4 days ago by jm
SHA-1 is a Shambles - First Chosen-Prefix Collision on SHA-1 and Application to the PGP Web of Trust

Abstract: The SHA-1 hash function was designed in 1995 and has been widely used for two decades. A theoretical collision attack was first proposed in 2004 [WYY05], but due to its high complexity it was only implemented in practice in 2017, using a large GPU cluster [SBK+17]. More recently, an almost-practical chosen-prefix collision attack against SHA-1 was proposed [LP19]. This more powerful attack allows building colliding messages with two arbitrary prefixes, which is much more threatening for real protocols.

In this paper, we report the first practical implementation of this attack, and its impact on real-world security with a PGP/GnuPG impersonation attack. We managed to significantly reduce the complexity of collision attacks against SHA-1: on an Nvidia GTX 970, identical-prefix collisions can now be computed with a complexity of 2^61.2 rather than 2^64.7, and chosen-prefix collisions with a complexity of 2^63.4 rather than 2^67.1. When renting cheap GPUs, this translates to a cost of 11k US$ for a collision, and 45k US$ for a chosen-prefix collision, within the means of academic researchers. Our actual attack required two months of computations using 900 Nvidia GTX 1060 GPUs (we paid 75k US$ because GPU prices were higher, and we wasted some time preparing the attack).

Therefore, the same attacks that have been practical on MD5 since 2009 are now practical on SHA-1. In particular, chosen-prefix collisions can break signature schemes and handshake security in secure channel protocols (TLS, SSH). We strongly advise removing SHA-1 from those types of applications as soon as possible. We exemplify our cryptanalysis by creating a pair of PGP/GnuPG keys with different identities, but with colliding SHA-1 certificates. A SHA-1 certification of the first key can therefore be transferred to the second key, leading to a forgery. This proves that SHA-1 signatures now offer virtually no security in practice. The legacy branch of GnuPG still uses SHA-1 by default for identity certifications, but after we notified the authors, the modern branch now rejects SHA-1 signatures (the issue is tracked as CVE-2019-14855).


(Via Tony Finch)
via:fanf  security  sha  sha-1  crypto  hashes  hashing  pgp  gpg  collisions 
19 days ago by jm
google/highwayhash: Fast strong hash functions: SipHash/HighwayHash
HighwayHash: 'We have devised a new way of mixing inputs with AVX2 multiply and permute instructions. The multiplications are 32x32 -> 64 bits and therefore infeasible to reverse. Permuting equalizes the distribution of the resulting bytes. The internal state occupies four 256-bit AVX2 registers. Due to limitations of the instruction set, the registers are partitioned into two 512-bit halves that remain independent until the reduce phase. The algorithm outputs 64 bit digests or up to 256 bits at no extra cost. In addition to high throughput, the algorithm is designed for low finalization cost. The result is more than twice as fast as SipTreeHash.

We also provide an SSE4.1 version (80% as fast for large inputs and 95% as fast for short inputs), an implementation for VSX on POWER and a portable version (10% as fast). A third-party ARM implementation is referenced below.

Statistical analyses and preliminary cryptanalysis are given in https://arxiv.org/abs/1612.06257.'
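Python's standard library has no SipHash or HighwayHash bindings, but the idea of a keyed hash is easy to illustrate with blake2b's keyed mode (a stand-in here, not HighwayHash itself): a secret key makes hash values unpredictable to attackers, which is what defeats hash-flooding DoS attacks on hash tables.

```python
import hashlib
import os

# A random per-process secret; attackers who can't guess it can't
# precompute colliding keys for your hash table.
KEY = os.urandom(16)

def keyed_hash(data: bytes) -> int:
    """64-bit keyed digest; blake2b stands in for SipHash/HighwayHash."""
    return int.from_bytes(
        hashlib.blake2b(data, key=KEY, digest_size=8).digest(), "big"
    )

print(keyed_hash(b"hello"))  # differs from process to process
```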

(via Tony Finch)
siphash  highwayhash  via:fanf  hashing  hashes  algorithms  mac  google  hash 
january 2018 by jm
Rendezvous hashing - Wikipedia, the free encyclopedia

Rendezvous or Highest Random Weight (HRW) hashing[1][2] is an algorithm that allows clients to achieve distributed agreement on a set of k options out of a possible set of n options. A typical application is when clients need to agree on which sites (or proxies) objects are to be assigned to. When k is 1, it subsumes the goals of consistent hashing, using an entirely different method.
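The scheme itself is tiny: every client hashes (node, key) pairs, sorts the nodes by the resulting weight, and takes the top k; since the hash is deterministic, all clients independently agree. A minimal Python sketch (the SHA-256-based weight and node names are illustrative choices, not part of any spec):

```python
import hashlib

def hrw_choose(key: str, nodes: list[str], k: int = 1) -> list[str]:
    """Return the k nodes with the highest random weight for this key."""
    def weight(node: str) -> int:
        h = hashlib.sha256(f"{node}|{key}".encode()).digest()
        return int.from_bytes(h[:8], "big")
    return sorted(nodes, key=weight, reverse=True)[:k]

nodes = ["cache-a", "cache-b", "cache-c"]
print(hrw_choose("object-42", nodes))        # every client computes the same answer
print(hrw_choose("object-42", nodes, k=2))   # agreement on k options, as defined above
```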
hrw  hashing  hashes  consistent-hashing  rendezvous-hashing  algorithms  discovery  distributed-computing 
april 2016 by jm
The general birthday problem
Good explanation and scipy code for the birthday paradox and hash collisions
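The key approximation: with n values drawn uniformly from d buckets, P(collision) ≈ 1 - exp(-n(n-1)/(2d)), which inverts to n ≈ sqrt(2d·ln(1/(1-p))). A self-contained version needing only the standard library (no scipy):

```python
import math

def collision_probability(n: int, d: int) -> float:
    """Approximate P(at least one collision) for n uniform draws from d buckets."""
    return 1.0 - math.exp(-n * (n - 1) / (2.0 * d))

def draws_for_probability(p: float, d: int) -> float:
    """Approximate number of draws at which collision probability reaches p."""
    return math.sqrt(2.0 * d * math.log(1.0 / (1.0 - p)))

print(collision_probability(23, 365))      # ~0.50 with this approximation (exact: 0.507)
print(draws_for_probability(0.01, 2**64))  # ~6.1e8 64-bit hashes for a 1% collision risk
```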
hashing  hashes  collisions  birthday-problem  birthday-paradox  coding  probability  statistics 
february 2016 by jm
Birthday problem calculator
I keep having to google this, so here's a good one which works -- unlike Wolfram Alpha!
birthday  birthday-paradox  birthday-problem  hashes  hash-collision  attacks  security  collisions  calculators  probability  statistics 
december 2015 by jm
Trend Micro Locality Sensitive Hash
a fuzzy matching library. Given a byte stream with a minimum length of 512 bytes, TLSH generates a hash value which can be used for similarity comparisons. Similar objects will have similar hash values, which allows for the detection of similar objects by comparing their hash values. Note that the byte stream should have a sufficient amount of complexity. For example, a byte stream of identical bytes will not generate a hash value.
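A minimal sketch of what usage looks like, assuming the py-tlsh Python bindings (pip install py-tlsh); tlsh.diff returns a distance score where lower means more similar:

```python
import os

import tlsh  # assumes the py-tlsh bindings: pip install py-tlsh

base = os.urandom(4096)               # needs >= 512 bytes with enough complexity
similar = base[:-8] + os.urandom(8)   # a small mutation of the same stream
unrelated = os.urandom(4096)

h = tlsh.hash(base)
print(tlsh.diff(h, tlsh.hash(similar)))    # small distance: similar inputs
print(tlsh.diff(h, tlsh.hash(unrelated)))  # much larger distance
```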

Paper here: https://drive.google.com/file/d/0B6FS3SVQ1i0GTXk5eDl3Y29QWlk/edit

via adulau
nilsimsa  sdhash  ssdeep  locality-sensitive  hashing  algorithm  hashes  trend-micro  tlsh  hash  fuzzy-matching  via:adulau 
may 2015 by jm
#BPjMleak
'Leak of the secret German Internet Censorship URL blacklist BPjM-Modul'.

Turns out there's a blocklist of adult-only or prohibited domains maintained by a German government department, the Federal Department for Media Harmful to Young Persons (German: "Bundesprüfstelle für jugendgefährdende Medien", or BPjM), distributed in the form of a list of hashes of those domains. The list was extracted from an AVM router, and the hashes were then brute-forced using several other plaintext URL blocklists and domain lists.

Needless to say, there's an assortment of silly false positives, such as the listing of the website for the 1997 3D Realms game "Shadow Warrior": http://en.wikipedia.org/wiki/Shadow_Warrior
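The brute-forcing here is just a dictionary attack: hash every candidate domain from public blocklists and test membership in the leaked hash set. A sketch of the idea (plain MD5 over the bare domain is an illustrative assumption; the real BPjM-Modul scheme differs in detail):

```python
import hashlib

# Stand-in for the leaked hash list; we seed it ourselves here so the
# demo finds a match.
leaked_hashes = {hashlib.md5(b"blocked-example.com").hexdigest()}

def crack(candidates, hashes):
    """Yield any candidate domain whose hash appears in the leaked set."""
    for domain in candidates:
        if hashlib.md5(domain.encode()).hexdigest() in hashes:
            yield domain

wordlist = ["example.com", "blocked-example.com"]  # from plaintext blocklists
print(list(crack(wordlist, leaked_hashes)))        # -> ['blocked-example.com']
```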
hashes  reversing  reverse-engineering  germany  german  bpjm  filtering  blocklists  blacklists  avm  domains  censorship  fps 
july 2014 by jm
Hey Judy, don't make it bad
GitHub gets good results using Judy arrays to replace a Ruby hash. However, the whole blog post is a bit dodgy to me. It feels like there are much better ways to fix the problem:

1. the big one: don't do GC-heavy activity in the front-end web servers. Split that language-classification code into a separate service. Write its results to a cache and don't re-query needlessly.
2. why isn't this benchmarked against a C/C++ hash? it's only 36,000 entries, loaded once at startup. lookups against that should be blisteringly fast even with the basic data structures, and that would also live outside the Ruby heap, avoiding the GC overhead. Feels like the use of a Judy array was a "because I want to" decision.
3. personally, I'd have preferred they spend time fixing their uptime problems....

See also https://news.ycombinator.com/item?id=5639013 for more kvetching.
ruby  github  gc  judy-arrays  linguist  hashes  data-structures 
may 2013 by jm
Dropbox dedupe feature allows materialization of any file, if you know its hash
'allows users to exploit Dropbox’s file hashing scheme to copy files into their account without actually having them. Dropship will save the hashes of a file in JSON format. Anyone can then take these hashes and load the original file into their Dropbox account using Dropship.' Heh, that sounds very familiar; I seem to recall thinking about this problem on several occasions... ;) Dropbox certainly didn't like it, going by this account
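The flaw generalizes to any dedupe-by-hash design: if the server accepts a client-supplied hash as proof of possession, knowing the hash is as good as having the bytes. A toy sketch (all names hypothetical; this is not Dropbox's actual protocol):

```python
import hashlib

blob_store: dict[str, bytes] = {}   # server-side content-addressed storage

def upload(data: bytes) -> str:
    """Normal upload path: store the blob under its hash."""
    digest = hashlib.sha256(data).hexdigest()
    blob_store[digest] = data
    return digest

def link_by_hash(account: dict, name: str, digest: str) -> bool:
    """Dedupe shortcut: attach an existing blob without re-uploading.
    Nothing proves the client ever possessed the bytes -- that's the hole."""
    if digest in blob_store:
        account[name] = digest
        return True
    return False

digest = upload(b"someone else's file")    # victim uploads a file
alice: dict[str, str] = {}
link_by_hash(alice, "stolen.bin", digest)  # attacker knows only the hash...
print(blob_store[alice["stolen.bin"]])     # ...yet can read the full content
```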
security  filesharing  dropbox  online-backup  online-storage  p2p  hashes  sha  dmca 
april 2011 by jm
Stop using unsafe keyed hashes, use HMAC
why HMAC is more secure than secret-suffix and secret-prefix keyed hashing. good to know
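The core issue: with Merkle-Damgård hashes (MD5, SHA-1, SHA-2), hash(key || msg) exposes the hash's internal state in the digest, so an attacker can extend msg and forge a valid tag without the key (a length-extension attack), while secret-suffix constructions inherit weaknesses from hash collisions. HMAC's nested construction avoids both, and is one import away in Python:

```python
import hashlib
import hmac

key = b"secret-key"
msg = b"amount=100&to=alice"

# Vulnerable: secret-prefix MAC. The digest reveals the internal state,
# enabling length-extension forgery on Merkle-Damgard hashes.
naive_tag = hashlib.sha256(key + msg).hexdigest()

# Safe: HMAC wraps the hash in an inner and an outer keyed invocation.
tag = hmac.new(key, msg, hashlib.sha256).hexdigest()

# Verify with a constant-time comparison to avoid timing side channels.
ok = hmac.compare_digest(tag, hmac.new(key, msg, hashlib.sha256).hexdigest())
print(ok)  # True
```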
hmac  security  crypto  hashing  md5  hashes  sha256  sha1  from delicious
october 2009 by jm
