jm + solr   3

How-to: Index Scanned PDFs at Scale Using Fewer Than 50 Lines of Code
Using Spark, Tesseract, HBase, Solr and Leptonica. Actually pretty feasible.
spark  tesseract  hbase  solr  leptonica  pdfs  scanning  cloudera  hadoop  architecture 
october 2015 by jm
Turbocharging Solr Index Replication with BitTorrent
Etsy is now replicating their multi-GB search index across the search farm using BitTorrent. Why not multicast? 'multicast rsync caused an epic failure for our network, killing the entire site for several minutes. The multicast traffic saturated the CPU on our core switches causing all of Etsy to be unreachable.' Fun!
etsy  multicast  sev1  bittorrent  search  solr  rsync  scaling  outages 
february 2012 by jm
