jm + similarity   6

Image comparison algorithms
Awesome StackOverflow answer for detecting "similar" images -- promising approach to reimplement ffffound's similarity feature in mltshp, maybe
algorithms  hashing  comparison  diff  images  similarity  search  ffffound  mltshp 
17 days ago by jm
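The linked answer isn't reproduced here. As an illustration, here is one common technique for near-duplicate image detection, assumed rather than taken from that answer: an average hash ("aHash"). It shrinks the image to 8x8, sets one bit per cell depending on whether the cell is brighter than the mean, and compares the resulting 64-bit fingerprints by Hamming distance. To stay dependency-free, it takes a plain 2-D list of grayscale values and leaves out image decoding and grayscale conversion. All function names are hypothetical.

```python
from typing import List

Grid = List[List[float]]  # grayscale pixels, row-major, values 0..255


def downscale(img: Grid, size: int = 8) -> Grid:
    """Box-average a grayscale image down to size x size cells."""
    h, w = len(img), len(img[0])
    out: Grid = []
    for r in range(size):
        # Row band for this cell; always at least one pixel tall.
        r0 = r * h // size
        r1 = max((r + 1) * h // size, r0 + 1)
        row = []
        for c in range(size):
            c0 = c * w // size
            c1 = max((c + 1) * w // size, c0 + 1)
            cell = [img[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            row.append(sum(cell) / len(cell))
        out.append(row)
    return out


def average_hash(img: Grid, size: int = 8) -> int:
    """Fingerprint with one bit per cell: 1 where the cell is brighter than the mean."""
    flat = [v for row in downscale(img, size) for v in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for v in flat:
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


# Usage: a uniformly brightened copy hashes identically; an inverted copy flips every bit.
img = [[2 * x + y for x in range(64)] for y in range(64)]
brighter = [[v + 10 for v in row] for row in img]
inverted = [[255 - v for v in row] for row in img]
print(hamming(average_hash(img), average_hash(brighter)))  # 0
print(hamming(average_hash(img), average_hash(inverted)))  # 64
```

A small Hamming distance suggests near-duplicates. The cut-off is a tuning choice, and aHash is easily fooled by crops and rotations, so treat it as a starting point rather than a full replacement for ffffound's feature.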
VividCortex uses K-Means Clustering to discover related metrics
After you select an interesting spike in a metric, the algorithm can automatically pick out other metrics that spiked at the same time. I can see that being pretty damn useful.
metrics  k-means-clustering  clustering  algorithms  discovery  similarity  vividcortex  analysis  data 
march 2015 by jm
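VividCortex's actual implementation isn't described beyond the summary above, so what follows is a hypothetical sketch of the idea. Take each metric's values over a time window around the selected spike and z-score them so that only the shape matters. Then run plain k-means (Lloyd's algorithm) on those vectors and report the other metrics that land in the same cluster as the selected one. The window contents, `k`, and the metric names are all assumptions.

```python
import math
import random
from typing import Dict, List, Sequence


def zscore(series: Sequence[float]) -> List[float]:
    """Normalize a series so clustering compares shape, not magnitude."""
    mean = sum(series) / len(series)
    sd = math.sqrt(sum((v - mean) ** 2 for v in series) / len(series)) or 1.0
    return [(v - mean) / sd for v in series]


def dist2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[List[float]], k: int, iters: int = 50, seed: int = 0) -> List[int]:
    """Lloyd's algorithm; returns a cluster label per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: nearest centroid for each point.
        new_labels = [min(range(k), key=lambda j: dist2(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def related_metrics(window: Dict[str, List[float]], target: str, k: int = 2) -> List[str]:
    """Metrics that cluster with `target` over the same time window."""
    names = sorted(window)
    labels = kmeans([zscore(window[n]) for n in names], k)
    mine = labels[names.index(target)]
    return [n for n, lab in zip(names, labels) if lab == mine and n != target]


# Usage: three metrics spike at t=5, two others spike at t=1.
window = {
    "api.latency": [1, 1, 1, 1, 1, 9, 1, 1, 1, 1],
    "db.cpu": [10, 10, 10, 10, 10, 90, 10, 10, 10, 10],
    "db.iops": [2, 2, 2, 2, 2, 6, 2, 2, 2, 2],
    "cache.hits": [5, 40, 5, 5, 5, 5, 5, 5, 5, 5],
    "queue.depth": [0, 3, 0, 0, 0, 0, 0, 0, 0, 0],
}
print(related_metrics(window, "db.cpu"))  # ['api.latency', 'db.iops']
```

The z-score step matters here. Without it, k-means would group metrics by raw magnitude, so `db.cpu` would sit with whatever else happens to be large rather than with whatever spiked at the same moment.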
Harry - A Tool for Measuring String Similarity
a small tool for comparing strings and measuring their similarity. The tool supports several common distance and kernel functions for strings as well as some exotic similarity measures. The focus of Harry is on implicit similarity measures, that is, comparison functions that do not give rise to an explicit vector space. Examples of such similarity measures are the Levenshtein distance and the Jaro-Winkler distance.
For comparison, Harry loads a set of strings from input, computes the specified similarity measure and writes a matrix of similarity values to output. The similarity measure can be computed at the granularity of characters or of the words contained in the strings. The configuration of this process, such as the input format, the similarity measure and the output format, is specified in a configuration file and can be further refined using command-line options.
Harry is implemented using OpenMP, so the computation time for a set of strings scales linearly with the number of available CPU cores. Moreover, efficient implementations of several similarity measures, effective caching of similarity values and low-overhead locking further speed up the computation.

via kragen.
via:kragen  strings  similarity  levenshtein-distance  algorithms  openmp  jaro-winkler  edit-distance  cli  commandline  hamming-distance  compression 
january 2014 by jm
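Harry itself is a command-line tool driven by a configuration file, and its options aren't reproduced here. As a rough illustration of two of the measures it names, here is a pure-Python Levenshtein distance and Jaro-Winkler similarity, plus a helper that builds the kind of pairwise matrix Harry writes out. Both measures accept any sequence, so passing `s.split()` instead of `s` approximates word-level rather than character-level granularity. The function names are hypothetical. The Winkler prefix scale of 0.1 and the 4-element prefix cap are the usual textbook defaults, not Harry's settings.

```python
from typing import Callable, List, Sequence


def levenshtein(s: Sequence, t: Sequence) -> int:
    """Minimum number of insertions, deletions and substitutions turning s into t."""
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, 1):
        cur = [i]
        for j, b in enumerate(t, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (a != b)))  # substitution (free if equal)
        prev = cur
    return prev[-1]


def jaro(s: Sequence, t: Sequence) -> float:
    """Jaro similarity in [0, 1]."""
    if list(s) == list(t):
        return 1.0
    if not s or not t:
        return 0.0
    window = max(max(len(s), len(t)) // 2 - 1, 0)
    s_hit = [False] * len(s)
    t_hit = [False] * len(t)
    matches = 0
    for i, a in enumerate(s):
        # Only elements within `window` positions of each other can match.
        for j in range(max(0, i - window), min(len(t), i + window + 1)):
            if not t_hit[j] and t[j] == a:
                s_hit[i] = t_hit[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Matched elements that appear in a different order; each swapped pair counts once.
    s_seq = [a for a, hit in zip(s, s_hit) if hit]
    t_seq = [b for b, hit in zip(t, t_hit) if hit]
    transpositions = sum(a != b for a, b in zip(s_seq, t_seq)) / 2
    m = matches
    return (m / len(s) + m / len(t) + (m - transpositions) / m) / 3


def jaro_winkler(s: Sequence, t: Sequence, p: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted for a shared prefix of up to max_prefix elements."""
    j = jaro(s, t)
    prefix = 0
    for a, b in zip(s, t):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


def pairwise_matrix(items: List[Sequence], measure: Callable[[Sequence, Sequence], float]) -> List[List[float]]:
    """Apply `measure` to every pair of items, like the matrix Harry writes to its output."""
    return [[measure(a, b) for b in items] for a in items]


print(levenshtein("kitten", "sitting"))                              # 3
print(round(jaro_winkler("MARTHA", "MARHTA"), 4))                    # 0.9611
print(levenshtein("the quick fox".split(), "the slow fox".split()))  # 1
```

The Levenshtein version keeps only two rows of the dynamic-programming table, so memory is linear in the length of `t`. Note that `pairwise_matrix` returns distances when given `levenshtein` and similarities when given `jaro_winkler`.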
Fred's ImageMagick Scripts: SIMILAR
compute an image-similarity metric, to discover mostly-identical-but-slightly-tweaked images:
SIMILAR computes the normalized cross correlation similarity metric between two images of equal dimensions. The normalized cross correlation metric measures how similar two images are, not how different they are. The range of ncc metric values is between 0 (dissimilar) and 1 (similar). If mode=g, then the two images will be converted to grayscale. If mode=rgb, then the two images will first be converted to colorspace=rgb. Next, the ncc similarity metric will be computed for each channel. Finally, the per-channel values will be combined into an rms value.

(via Dan O'Neill)
image  photos  pictures  similar  imagemagick  via:dano  metrics  similarity 
april 2013 by jm
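For intuition about what SIMILAR reports, here is a sketch of textbook zero-mean normalized cross correlation between two equal-sized images. As the description says, it is computed per channel and the channel values are combined as an RMS. Images are assumed to be flat lists of (r, g, b) tuples. The grayscale weights (Rec. 601 luma) are also an assumption and may not match ImageMagick's own conversion. This formulation ranges from -1 to 1 and goes negative for anti-correlated images, so it approximates the script's metric rather than reimplementing it.

```python
import math
from typing import List, Sequence, Tuple

Pixel = Tuple[float, float, float]  # (r, g, b)


def ncc(a: Sequence[float], b: Sequence[float]) -> float:
    """Zero-mean normalized cross correlation of two equal-length channels."""
    if len(a) != len(b):
        raise ValueError("images must have equal dimensions")
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    da = [x - ma for x in a]
    db = [y - mb for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0:
        # Flat channel: correlation is undefined; this fallback is an arbitrary choice.
        return 1.0 if da == db else 0.0
    return sum(x * y for x, y in zip(da, db)) / denom


def to_gray(img: Sequence[Pixel]) -> List[float]:
    """Rec. 601 luma weights (an assumption; ImageMagick may convert differently)."""
    return [0.299 * r + 0.587 * g + 0.114 * b for r, g, b in img]


def similar(img1: Sequence[Pixel], img2: Sequence[Pixel], mode: str = "rgb") -> float:
    """mode='g': one grayscale NCC; mode='rgb': per-channel NCC combined as an RMS."""
    if mode == "g":
        return ncc(to_gray(img1), to_gray(img2))
    per_channel = [ncc([p[c] for p in img1], [p[c] for p in img2]) for c in range(3)]
    return math.sqrt(sum(v * v for v in per_channel) / 3)


# Usage: a uniform brightness shift leaves NCC at (or very near) 1.0.
img = [(i, 2 * i, 150 - i) for i in range(100)]
brighter = [(r + 5, g + 5, b + 5) for r, g, b in img]
print(similar(img, brighter), similar(img, brighter, mode="g"))
```

Subtracting each channel's mean is what makes the metric insensitive to uniform brightness shifts. Dividing by the standard deviations does the same for contrast scaling, which is why it suits "slightly tweaked" copies.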
feedback loop n-gram analyzer
'a simple parser of ARF-compliant FBL complaints, which normalizes the email complaints and generates a 6-tuple n-gram version of the message. These n-grams are stored in a Redis database, keyed by the file in which they can be found. An inverse index also exists that allows you to find all messages containing a particular n-gram word.'
anti-spam  spam  fbl  feedback  filtering  n-grams  similarity  hashing  redis  searching 
september 2011 by jm
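The parser's own normalization rules aren't given, so this sketch assumes a simple one: lowercase the text and keep only alphanumeric words. It then builds 6-word n-grams with a forward index (file to n-grams) and an inverse index (n-gram to files). Plain in-memory dicts of sets stand in for the Redis keys; in Redis the same shape maps naturally onto sets. All names here are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set, Tuple

NGram = Tuple[str, ...]
N = 6  # "6-tuple" n-grams, per the description


def normalize(text: str) -> List[str]:
    """Hypothetical normalization: lowercase and keep only alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def ngrams(words: List[str], n: int = N) -> List[NGram]:
    """All contiguous n-word windows (none if the message is shorter than n words)."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


class NGramIndex:
    """In-memory stand-in for the Redis keys: forward and inverse indexes."""

    def __init__(self) -> None:
        self.by_file: Dict[str, Set[NGram]] = defaultdict(set)   # file -> its n-grams
        self.by_ngram: Dict[NGram, Set[str]] = defaultdict(set)  # n-gram -> files containing it

    def add(self, filename: str, complaint_body: str) -> None:
        for gram in ngrams(normalize(complaint_body)):
            self.by_file[filename].add(gram)
            self.by_ngram[gram].add(filename)

    def files_containing(self, gram: NGram) -> Set[str]:
        return set(self.by_ngram.get(gram, set()))

    def shared(self, a: str, b: str) -> Set[NGram]:
        """N-grams two complaints have in common: a crude similarity signal."""
        return self.by_file.get(a, set()) & self.by_file.get(b, set())


idx = NGramIndex()
idx.add("a.eml", "Click here to claim your FREE prize now!")
idx.add("b.eml", "Hello friend, click here to claim your free prize today")
print(sorted(idx.files_containing(("click", "here", "to", "claim", "your", "free"))))
# ['a.eml', 'b.eml']
print(len(idx.shared("a.eml", "b.eml")))  # 2
```

Longer n-grams make shared runs rarer and therefore more telling. The trade-off is that messages shorter than six words produce no n-grams at all.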