**jm + edit-distance**
3

Cocktail similarity by Levenshtein distance

25 days ago by jm

Love it.

'I was recently figuring out a minimum-viable bar setup for making cocktails at home, and a system for memorizing & recording recipes. When I started writing down the first basic ingredients, I started noticing that cocktails are very close to each other - if you ignore fruit rinds and ice and such, an Americano is a Negroni with soda water instead of gin. An Old Fashioned is a Manhattan with sugar instead of vermouth. So I wondered - what’s a cocktail edit distance?'

edit-distance
levenshtein-distance
algorithms
visualization
cocktails
d3
recipes
booze
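
A minimal sketch of the idea in the quote above, treating each recipe as a sorted list of ingredients and counting ingredient-level edits. The ingredient lists and the choice to sort them are assumptions made for illustration; the linked post may model recipes differently.

```python
def levenshtein(a, b):
    """Edit distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


# Hypothetical, simplified recipes (garnishes and ice ignored).
RECIPES = {
    "Negroni": ["gin", "campari", "sweet vermouth"],
    "Americano": ["soda water", "campari", "sweet vermouth"],
    "Manhattan": ["whiskey", "sweet vermouth", "bitters"],
    "Old Fashioned": ["whiskey", "sugar", "bitters"],
}


def cocktail_distance(name_a, name_b):
    # Sorting makes the comparison independent of the order in which
    # ingredients happen to be written down (an assumption of this sketch).
    return levenshtein(sorted(RECIPES[name_a]), sorted(RECIPES[name_b]))
```

With these lists, `cocktail_distance("Negroni", "Americano")` and `cocktail_distance("Manhattan", "Old Fashioned")` both come out as a single substitution.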

Levenshtein automata can be simple and fast

june 2015 by jm

Nice algorithm for fuzzy text search with a limited Levenshtein edit distance using a DFA

dfa
algorithms
levenshtein
text
edit-distance
fuzzy-search
search
python
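
A rough sketch of the bounded edit-distance idea, assuming the simplest formulation: the automaton state is one row of the edit-distance table, capped at `max_edits + 1` so that only finitely many states exist. It steps through the input one character at a time rather than precomputing an explicit DFA table, and the class and function names are illustrative only.

```python
class LevenshteinAutomaton:
    """Tracks one row of the edit-distance table for a fixed pattern."""

    def __init__(self, pattern, max_edits):
        self.pattern = pattern
        self.max_edits = max_edits

    def start(self):
        # Distance from the empty input to each prefix of the pattern.
        return list(range(len(self.pattern) + 1))

    def step(self, state, ch):
        # Compute the next table row after consuming one input character.
        new_state = [state[0] + 1]
        for i in range(len(state) - 1):
            cost = 0 if self.pattern[i] == ch else 1
            new_state.append(min(new_state[i] + 1,   # insertion
                                 state[i] + cost,    # match or substitution
                                 state[i + 1] + 1))  # deletion
        # Cap every entry so the set of reachable states is finite.
        cap = self.max_edits + 1
        return [min(x, cap) for x in new_state]

    def is_match(self, state):
        return state[-1] <= self.max_edits

    def can_match(self, state):
        # If every entry exceeds the limit, no continuation can match.
        return min(state) <= self.max_edits


def fuzzy_match(pattern, text, max_edits):
    """True if text is within max_edits edits of pattern."""
    automaton = LevenshteinAutomaton(pattern, max_edits)
    state = automaton.start()
    for ch in text:
        state = automaton.step(state, ch)
        if not automaton.can_match(state):
            return False  # dead state: stop early
    return automaton.is_match(state)
```

The early exit in `fuzzy_match` is what makes this kind of automaton useful for search: a candidate word can be rejected as soon as it strays too far from the pattern.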

Harry - A Tool for Measuring String Similarity

via kragen.
via:kragen
strings
similarity
levenshtein-distance
algorithms
openmp
jaro-winkler
edit-distance
cli
commandline
hamming-distance
compression

january 2014 by jm

A small tool for comparing strings and measuring their similarity. The tool supports several common distance and kernel functions for strings as well as some exotic similarity measures. The focus of Harry lies on implicit similarity measures, that is, comparison functions that do not give rise to an explicit vector space. Examples of such similarity measures are the Levenshtein distance and the Jaro-Winkler distance.

For comparison, Harry loads a set of strings from input, computes the specified similarity measure and writes a matrix of similarity values to output. The similarity measure can be computed at the granularity of characters as well as of words contained in the strings. The configuration of this process, such as the input format, the similarity measure and the output format, is specified in a configuration file and can be additionally refined using command-line options.

Harry is implemented using OpenMP, such that the computation time for a set of strings scales linearly with the number of available CPU cores. Moreover, efficient implementations of several similarity measures, effective caching of similarity values and low-overhead locking further speed up the computation.
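
To make the "matrix of similarity values" concrete, here is a small single-threaded sketch that computes pairwise Levenshtein distances at either character or word granularity. It is not Harry's implementation or interface; the function names and the `granularity` parameter are hypothetical, and none of the parallelism or caching described above is reproduced.

```python
def levenshtein(a, b):
    """Edit distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def distance_matrix(strings, granularity="chars"):
    """Pairwise distances; granularity is 'chars' or 'words' (hypothetical)."""
    if granularity == "words":
        items = [s.split() for s in strings]  # compare whitespace-separated tokens
    elif granularity == "chars":
        items = list(strings)                 # compare individual characters
    else:
        raise ValueError("granularity must be 'chars' or 'words'")

    n = len(items)
    matrix = [[0] * n for _ in range(n)]
    for i in range(n):
        # The distance is symmetric, so only the upper triangle is computed.
        for j in range(i + 1, n):
            d = levenshtein(items[i], items[j])
            matrix[i][j] = matrix[j][i] = d
    return matrix
```

For example, `distance_matrix(["the quick fox", "the slow fox"], granularity="words")` treats the two strings as three-word sequences that differ by one substitution.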

