
How-to: Index Scanned PDFs at Scale Using Fewer Than 50 Lines of Code
Using Spark, Tesseract, HBase, Solr and Leptonica. Actually pretty feasible.
spark tesseract hbase solr leptonica pdfs scanning cloudera hadoop architecture
october 2015 by jm