jm + fuzzy-matching   3

Trend Micro Locality Sensitive Hash
a fuzzy matching library. Given a byte stream with a minimum length
of 512 bytes, TLSH generates a hash value which can be used for similarity
comparisons. Similar objects will have similar hash values which allows for
the detection of similar objects by comparing their hash values. Note that
the byte stream should have a sufficient amount of complexity. For example,
a byte stream of identical bytes will not generate a hash value.

Paper here: https://drive.google.com/file/d/0B6FS3SVQ1i0GTXk5eDl3Y29QWlk/edit

via adulau
nilsimsa  sdhash  ssdeep  locality-sensitive  hashing  algorithm  hashes  trend-micro  tlsh  hash  fuzzy-matching  via:adulau 
may 2015 by jm
MinHash for dummies
A Java-oriented practical intro to the MinHash duplicate-detection shingling algo
shingling  algorithms  minhash  hashing  duplicates  duplicate-detection  fuzzy-matching  java 
august 2014 by jm
Lucene 4 - Revisiting Problems For Speed [slides]
a Presentation from Simon Willnauer on optimization work performed on Lucene in 2011. The most interesting stuff here is the work done to replace an O(n^2) FuzzyQuery fuzzy-match algorithm with a FSM trie is extremely cool -- benchmarked at 214 times faster!
benchmarks  slides  lucene  search  fuzzy-matching  text-matching  strings  algorithms  coding  fsm  tries 
april 2013 by jm

Copy this bookmark:



description:


tags: