dedupe   220

GitHub - idealo/imagededup: 😎 Finding duplicate images made easy!
😎 Finding duplicate images made easy! Contribute to idealo/imagededup development by creating an account on GitHub.
image  deeplearning  python  dedupe 
9 days ago by sanderant
idealo/imagededup: 😎 Finding duplicate images made easy!
😎 Finding duplicate images made easy! Contribute to idealo/imagededup development by creating an account on GitHub.
image  tensorflow  python  deeplearning  dedupe  imageprocessing  images  dedup 
10 days ago by e2b
idealo/imagededup: 😎 Finding duplicate images made easy!
😎 Finding duplicate images made easy! Contribute to idealo/imagededup development by creating an account on GitHub.
deeplearning  github  image  python  dedupe  imageprocessing 
13 days ago by hay
GitHub - idealo/imagededup: 😎 Finding duplicate images made easy!
😎 Finding duplicate images made easy! Contribute to idealo/imagededup development by creating an account on GitHub.
python  image  dedupe  imageprocessing  deeplearning 
15 days ago by synergyfactor
0x90d/videoduplicatefinder: Video Duplicate Finder - Crossplatform
Video Duplicate Finder - Crossplatform. Contribute to 0x90d/videoduplicatefinder development by creating an account on GitHub.
dedupe  video  ffmpeg 
17 days ago by Peter_Antigen
idealo/imagededup: 😎 Finding duplicate images made easy!
😎 Finding duplicate images made easy! Contribute to idealo/imagededup development by creating an account on GitHub.
dedupe  tensorflow  python 
17 days ago by Peter_Antigen
Minnesota police officers convicted of serious crimes still on the job - StarTribune.com
Behind the scenes, via Nick Diakopoulos in CJR. // Unsupervised approaches to grouping or clustering can sometimes be made more efficient by providing targeted feedback to the machine-learning system. For instance, Dedupe, a tool for grouping and linking noisy records, has been used by investigative journalists at the Minneapolis StarTribune for its “Shielded by the Badge” series. Dedupe uses an approach called active learning: as the system tries to cluster items together, it asks a human trainer for feedback on the items it is least confident about. This maximizes the value of human feedback for improving the results over time.
unsupervisedmachinelearning  unsupervised  journalism  police  activelearning  clustering  feedback  startribune  dedupe  machinelearning  artificialIntelligence  maryjowebster 
may 2019 by fcoel
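The active-learning loop described in the StarTribune bookmark above can be sketched in a few lines of Python. This is a minimal illustration of uncertainty sampling, not Dedupe's actual API or model: `similarity` (a `difflib` string ratio standing in for learned field comparisons), `fit_threshold`, `most_uncertain`, `active_learning`, and the `oracle` callback that plays the human trainer are all hypothetical names introduced here. Each round, the pair scored closest to the current decision threshold (the one the model is least sure about) is sent for labeling, and the threshold is refit on every label gathered so far.

```python
from difflib import SequenceMatcher
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # two records that might refer to the same entity


def similarity(pair: Pair) -> float:
    """Crude score in [0, 1]; a hypothetical stand-in for learned comparisons."""
    a, b = pair
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def fit_threshold(labeled: List[Tuple[float, bool]]) -> float:
    """Choose the cut-off that classifies the most labeled pairs correctly."""
    best_t, best_correct = 0.5, -1
    for t in sorted({score for score, _ in labeled} | {0.5}):
        correct = sum((score >= t) == is_match for score, is_match in labeled)
        if correct > best_correct:
            best_t, best_correct = t, correct
    return best_t


def most_uncertain(pairs: List[Pair], threshold: float, asked: List[Pair]) -> Pair:
    """Uncertainty sampling: the not-yet-labeled pair scored nearest the threshold."""
    unlabeled = [p for p in pairs if p not in asked]
    return min(unlabeled, key=lambda p: abs(similarity(p) - threshold))


def active_learning(pairs: List[Pair], oracle: Callable[[Pair], bool],
                    rounds: int = 3) -> Tuple[float, List[Pair]]:
    """Ask the human (oracle) only about the least certain pairs."""
    labeled: List[Tuple[float, bool]] = []
    asked: List[Pair] = []
    threshold = 0.5
    for _ in range(min(rounds, len(pairs))):
        pair = most_uncertain(pairs, threshold, asked)
        asked.append(pair)
        labeled.append((similarity(pair), oracle(pair)))  # human feedback
        threshold = fit_threshold(labeled)  # refit on all labels so far
    return threshold, asked
```

A real record-linkage tool would learn over many field comparisons at once; the single-threshold model here only keeps the sketch short while preserving the key idea that human effort goes to the most ambiguous pairs.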
Deduplicating files in Public Git Archive · source{d} blog
This summer, we announced the release of Public Git Archive, a dataset with 3TB of Git data from the most starred repositories on GitHub. Now it’s time to describe how we tried to deduplicate files in the latest revision of the repositories in PGA using our research project for code deduplication, src-d/apollo. Before diving deep, let’s quickly see why we created it. To the best of our knowledge, the only efforts to detect code clones at massive scale have been made by Lopes et al., who leveraged a huge corpus of over 428 million files in 4 languages to map code clones on GitHub (DéjàVu project). They relied on syntactic features, i.e. identifiers (my_list, your_list, …) and keywords (if, for, …), to compute the similarity between a pair of files. PGA has fewer files in the latest (HEAD) revision, 54 million, and we did not want to give our readers a DéjàVu by repeating the same analysis. So we aimed at something different: not only copy-paste between files, but also involuntary rewrites of the same abstractions. Thus we extracted and used semantic features from Universal Abstract Syntax Trees.
cs  git  github  source  dedupe 
october 2018 by euler
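To make the "syntactic features" idea in the source{d} post concrete, here is a minimal Python sketch that turns a source file into a bag of identifiers, keywords, and literals using the standard `tokenize` module, then compares two bags with a weighted Jaccard score. It assumes Python input and uses hypothetical helper names (`syntactic_features`, `weighted_jaccard`); it is neither the DéjàVu pipeline nor src-d/apollo, whose stated aim was to go beyond such tokens to semantic features from Universal Abstract Syntax Trees.

```python
import io
import keyword
import tokenize
from collections import Counter


def syntactic_features(source: str) -> Counter:
    """Bag of (kind, token) pairs: identifiers, keywords and literals."""
    bag: Counter = Counter()
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME:
            kind = "kw" if keyword.iskeyword(tok.string) else "id"
            bag[(kind, tok.string)] += 1
        elif tok.type in (tokenize.NUMBER, tokenize.STRING):
            bag[("lit", tok.string)] += 1
    return bag


def weighted_jaccard(a: Counter, b: Counter) -> float:
    """Sum of per-token minimum counts over sum of per-token maximum counts."""
    keys = a.keys() | b.keys()
    if not keys:
        return 1.0  # two empty files are trivially identical
    shared = sum(min(a[k], b[k]) for k in keys)
    total = sum(max(a[k], b[k]) for k in keys)
    return shared / total


if __name__ == "__main__":
    original = "def total(my_list):\n    return sum(x for x in my_list if x > 0)\n"
    renamed = original.replace("my_list", "your_list")
    print(weighted_jaccard(syntactic_features(original), syntactic_features(renamed)))
```

Renaming `my_list` to `your_list` lowers the score (to 11/15 here) but leaves most of the bag shared, which suggests why token-level features catch copy-paste clones well, while rewrites of the same abstraction with different names and structure call for richer features of the kind the post describes.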
restic: Fast, secure, efficient backup program
restic 0.9.2 has just been released! Included are many fixes and support for application keys:
github  pinboard-fixup-github-titles  Go  restic  backup  deduplication  dedupe  secure-by-default  from twitter_favs
august 2018 by suhlig

related tags

1.7  2012  2016  activelearning  administration  algorithm  algorithms  app  appliance  apps  arc  architecture  array  artificialintelligence  askubuntu  attic  author:dpc  aws  backup  backups  bktree  bloom  clean  cleaning  cleanup  clojure  cloud  clustering  comparison  compression  computervision  contacts  count  crashplan  cs  css  csv  cv  data  database  datamining  de-dupe  de  dedup  dedupliation  deduplicate  deduplication  deeplearning  dependencies  devops  dhs  disable  distcomp  distributed-systems  docs  document-similarity  documentation  dropbox  dup  dupe  dupes  duplicate  duplicates  duplication  encbup  encryption  entity  es2015  excel  exif  exiftool  feedback  ffmpeg  file  filemaker  files  filter  find_dupes  framework  freenas  freeware  fuse  fuzzylogic  fuzzymatch  fuzzymatching  get  git  github  glacier  go  golang  hack  harddrive  hash  hashing  hyperloglog  idempotence  image  imageprocessing  images  iphoto  itunes  javascript  jdupe  journalism  js  kafka  kind:library  ldap  ldif  learning  library  linux  mac  machine_learning  machinelearning  marketo  maryjowebster  masterdatamanagement  max  microsoft  missing  ml  new  node  node_modules  npm  nsq  obnam  opensource  oracle  perl  photos  pinboard-fixup-github-titles  pivot  police  postgres  powershell  programming  pubsub  py  python  ref  reference  resolution  restic  reverse-dedupe  reverse  rm  rmlint  rsync  rust  s3  salesforce  search  secure-by-default  security  sequence-numbers  serverless  similarity  simple  size  snapshot  software  source  spark  spark_app  speed  sql  startribune  storage  streaming  style  sysadmin  systems  tables  tensorflow  text-processing  thunderbird  tools  toread  transducers  twig  ubuntu  unique  unsupervised  unsupervisedmachinelearning  up  utilities  utility  veeam  video  web_tool  windows  xml  xslt  zfs  zimbra 