similarity   1496


What is a near-dupe, really? | Clustify Blog – eDiscovery, Document Clustering, Technology-Assisted Review (Predictive Coding), Information Retrieval, and Software Development
>This article looks at three reasonable, but different, ways of defining the near-dupe similarity between two documents. It also explains the popular MinHash algorithm, and shows that its results may surprise you in some circumstances.
Near-duplicates are documents that are nearly, but not exactly, the same. They could be different revisions of a memo where a few typos were fixed or a few sentences were added. They could be an original email and a reply that quotes the original and adds a few sentences. They could be a Microsoft Word document and a printout of the same document that was scanned and OCRed, with a few words not matching due to OCR errors.
similarity  data 
5 weeks ago by sharon_howard
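The MinHash idea described in the article above can be sketched as follows. The sketch assumes each document is reduced to a set of word 3-shingles and compared with Jaccard similarity. The shingle size, the 128 hash functions, and the seeded BLAKE2b hashing are illustrative choices and are not taken from the article. `shingles`, `minhash_signature`, and `estimate_jaccard` are hypothetical helper names. Each signature slot keeps the minimum hash over a document's shingles. For ideal random hash functions, the probability that two documents agree in a slot equals their Jaccard similarity, so the fraction of agreeing slots estimates it.

```python
import hashlib
import re


def shingles(text: str, k: int = 3) -> set[str]:
    """Set of overlapping k-word shingles; texts shorter than k words become one shingle."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Exact Jaccard similarity |A & B| / |A | B| (1.0 for two empty sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def _hash(seed: int, shingle: str) -> int:
    """64-bit hash of a shingle; each seed acts as a separate hash function (assumption)."""
    digest = hashlib.blake2b(f"{seed}:{shingle}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash_signature(items: set[str], num_hashes: int = 128) -> list[int]:
    """For each of num_hashes hash functions, keep the minimum hash over the set."""
    if not items:
        raise ValueError("MinHash is undefined for an empty set")
    return [min(_hash(seed, s) for s in items) for seed in range(num_hashes)]


def estimate_jaccard(sig_a: list[int], sig_b: list[int]) -> float:
    """Fraction of hash functions whose minima agree: a Jaccard estimate."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


memo_v1 = "the quick brown fox jumps over the lazy dog near the river bank today"
memo_v2 = "the quick brown fox jumps over the lazy cat near the river bank today"
a, b = shingles(memo_v1), shingles(memo_v2)
print(jaccard(a, b))  # 0.6
print(estimate_jaccard(minhash_signature(a), minhash_signature(b)))
```

In this example, one changed word alters three of the twelve shingles in each version. As a result, the exact Jaccard similarity is 9/15 = 0.6, even though 13 of the 14 words match. The MinHash estimate should land near that value.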
That Looks Oddly Familiar — Jan Stępień
Perceptual hashing is a fascinating technique for summarising media files. It has little in common with cryptographic hashes such as SHA1. Two input files that look similar will end up with different cryptographic hashes but similar perceptual hashes. And by similar, we mean having most bits set the same way.

In this talk we'll combine pHash and a BK-tree to efficiently search through metric spaces of perceptual hashes. We will use Rust to implement a simple command line tool. It will ...
slides  phash  image  similarity  bktree 
7 weeks ago by badboy
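The search described in the talk above combines a bit-level distance with a BK-tree. Here is a minimal Python sketch of that search side. It assumes the perceptual hashes are already computed as non-negative integers; computing pHash itself needs an image library and is out of scope here. `hamming`, `BKNode`, and `BKTree` are hypothetical names and are not the talk's Rust code. Hamming distance counts the bits that differ, which matches "most bits set the same way." The BK-tree uses the triangle inequality to prune: every hash under the edge labelled `e` is exactly `e` bits from its parent. So if the query is `d` bits from that parent, the subtree can only contain matches when `|d - e| <= radius`.

```python
from __future__ import annotations

from dataclasses import dataclass, field


def hamming(a: int, b: int) -> int:
    """Number of bit positions where two non-negative integer hashes differ."""
    return bin(a ^ b).count("1")


@dataclass
class BKNode:
    value: int
    children: dict[int, BKNode] = field(default_factory=dict)  # edge distance -> child


class BKTree:
    """BK-tree over a metric; here, Hamming distance between perceptual hashes."""

    def __init__(self, distance=hamming) -> None:
        self.root: BKNode | None = None
        self.distance = distance

    def add(self, value: int) -> None:
        if self.root is None:
            self.root = BKNode(value)
            return
        node = self.root
        while True:
            d = self.distance(value, node.value)
            if d == 0:
                return  # identical hash already stored
            child = node.children.get(d)
            if child is None:
                node.children[d] = BKNode(value)
                return
            node = child

    def search(self, query: int, radius: int) -> list[tuple[int, int]]:
        """All stored hashes within `radius` of `query`, as sorted (distance, hash) pairs."""
        results = []
        stack = [] if self.root is None else [self.root]
        while stack:
            node = stack.pop()
            d = self.distance(query, node.value)
            if d <= radius:
                results.append((d, node.value))
            # Triangle inequality: a match under edge e requires |d - e| <= radius.
            for edge, child in node.children.items():
                if d - radius <= edge <= d + radius:
                    stack.append(child)
        return sorted(results)


tree = BKTree()
for h in (0b0000, 0b0001, 0b0011, 0b1111, 0b1000_0000):
    tree.add(h)
print(tree.search(0b0000, radius=1))  # [(0, 0), (1, 1), (1, 128)]
```

The tree is built once. Each query then visits only the subtrees whose edge labels fall in `[d - radius, d + radius]`, rather than comparing against every stored hash.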
Vector and Line Quantization for Billion-scale Similarity Search on GPUs
Hrm, I wonder if this hierarchical inverted index structure could be applied to specialist database software generally, or perhaps a general GPU database?
similarity  search  image  video  GPU  deep  machine  learning  approximate  nearest  neighbor 
january 2019 by asteroza
Why Would a Java Engineer Love Frontend Development?
It often happens that backend developers don’t like working on the frontend. Some even hate frontend development. The complaints are always the same: J...
angularjs  java  similarity  webapplication  framework 
december 2018 by gilberto5757
Similarity search (implemented in Python)
It takes a list of sets as input and outputs the pairs that meet the similarity threshold.
python  similarity  programs  pairing 
november 2018 by GreggInCA
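The task described in the entry above is to take a list of sets and return the pairs above a similarity threshold. One standard way to do this is prefix filtering, the idea behind the All-Pairs algorithm. The linked program may or may not work this way. The sketch below assumes string tokens, Jaccard similarity, and 0 < threshold <= 1. `all_pairs` is a hypothetical name. The method first sorts each set's tokens by one global order, rarest first. Under that ordering, any pair with Jaccard >= t must share a token within the first |s| - ceil(t * |s|) + 1 tokens of each set. An inverted index over those prefixes therefore yields the candidate pairs, and only the candidates get an exact check.

```python
import math
from collections import Counter, defaultdict


def jaccard(a: set[str], b: set[str]) -> float:
    """Exact Jaccard similarity (1.0 for two empty sets)."""
    return len(a & b) / len(a | b) if (a or b) else 1.0


def all_pairs(sets: list[set[str]], threshold: float) -> list[tuple[int, int, float]]:
    """Return (i, j, similarity) for every i < j whose Jaccard similarity >= threshold."""
    if not 0 < threshold <= 1:
        raise ValueError("threshold must be in (0, 1]")
    # Global token order: rarest first, ties broken by the token string itself.
    freq = Counter(tok for s in sets for tok in s)
    index: dict[str, list[int]] = defaultdict(list)  # token -> earlier sets with it in their prefix
    results = []
    for i, s in enumerate(sets):
        if not s:
            continue  # empty sets are skipped: no tokens to match on
        toks = sorted(s, key=lambda t: (freq[t], t))
        # A match needs overlap >= ceil(t * |s|); the epsilon guards against float
        # error such as 0.7 * 10 == 7.000000000000001.
        min_overlap = math.ceil(threshold * len(toks) - 1e-9)
        prefix = toks[: len(toks) - min_overlap + 1]
        candidates = set()
        for tok in prefix:
            candidates.update(index[tok])
            index[tok].append(i)
        for j in candidates:  # verify each candidate exactly
            sim = jaccard(s, sets[j])
            if sim >= threshold:
                results.append((j, i, sim))
    return sorted(results)


docs = [{"a", "b", "c"}, {"a", "b", "d"}, {"x", "y"}]
print(all_pairs(docs, 0.5))  # [(0, 1, 0.5)]
```

The output of this filter matches a brute-force comparison of every pair. The difference is that rare tokens sit in the prefixes, so few pairs become candidates and most of the expensive exact comparisons are skipped.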


related tags

2000  2010  2014  accessibility  accuracy  acmtariat  aesthetics  ai-control  ai  aknn  algorithm  algorithms  als  alternative  analogy  analysis  analyzer  and  anglo  angularjs  annoy  approximate  approximation  approxnearestneighbors  approxsimilarityjoin  art  article  atoms  authorship_analysis  big-peeps  bktree  blog:  bm25  brands  can  cardinality  cbir  character  chart  clever-rats  clojure  clustering  cnn  co-click  coarse-fine  code  coding  coequality  cog-psych  cognition  collaborative  color  comparison  compatibility  computer-science  computer-vision  containment  content-based-image-retrieval  continuation  cosine-distance  cosine-similarity  cosine  cost-benefit  critique  crux  culture  data  database  datamining  dataset  datasets  debate  deep-learning  deep  deepgoog  definition  detection  development  developmental  direction  discrete  discussion  distance  diversity  dl  document  duplicate-content  duplicate  duplication  early-modern  ebay  ecology  economics  eddiewoo  elasticsearch  elmo  embedding  embeddings  emd  empirical  engineering  enlightenment-renaissance-restoration-reformation  enology  error  essay  euclidean  europe  evaluation  evolution  exocortex  experiment  expert-experience  exploratory  facebook  faiss  favorites  figures  floss  forensic  framework  frontier  functions  fuzzy  fuzzyhash  games  generalization?  
generative  generator  gensim  germanic  github  glove  go  golang  google  gpu  gradient-descent  hash  hashing  help  heuristic  history  homo-hetero  how  humanity  image-hash  image-hashing  image  images  implication  important  indexing  infersent  information  insight  instagram  internet  intricacy  ir  iteration-recursion  jaccard  java  javascript  job  js  keras  land  language  latent-variables  lda  learning  lesswrong  levenshtein  levenstein  lexical  lexicons  libraries  library  linearity  links  lisp  logs  lsh  lstm  lucene  machine-learning  machine  machinelearning  making  mapping  maps  marginal  markets  matching  math  math10-3  matrix-factorization  measures  meme  memetics  meta:prediction  metrics  mining  ml  models  moz  multi-modal  multi  music  musicology  nature  nearest-neighbors  nearest-neighbour  nearest  neighbor  network-structure  network  neural-net  neural-networks  neural  neuro-nitgrit  neuro  neurons  nibble  nitty-gritty  nlp  nlproc  nltk  notebook  novelty  number  object  opensource  org:bleg  pairing  papers  pattern  performance-measure  phash  philosophy  phonetic  photos  pitch  plagiarism  predictive-processing  programming  programs  project  projects  proximity  psychology  python  pytorch  q&a  query  quora  random  rather-interesting  ratty  recomendation  recommendation  recommendations  recsys  reddit  reduction  reference  regularization  reinforcement  replicator  reproduction  resumes  reviews  rhetoric  richarddawkins  risk  ruby  sam  science  seach  search  searching  segmentation  semantic  semantics  sentence-embedding  sentence-embeddings  sentence-vector  sentence  sequence  setsimilarity  shingling  shiny  siamese-networks  siamese  simhash  simian  similar  sketch  sleuthin  slides  smalltalk  smoothness  social  software  solr  songs  sound  soundex  spark  speedometer  spotify  ssc  state-of-art  statistics  string  strings  study  supply-demand  survey  symmetry  systematic_review  t-sne  
techtariat  term_vectors  text  textanalysis  tf-idf  tfidf  thesis  thinking  tika  time  tip-of-tongue  tlsh  to-understand  to-write-about  todo  tool  tools  track-record  training  transmission  triplet  trump  tumblr  understand  unicode  unit  unsupervised  us...  us  vector  video  webapplication  weird  wine  wire-guided  wmd  word-embedding  word  word2vec  words  writing  yvain   
