How-to: Index Scanned PDFs at Scale Using Fewer Than 50 Lines of Code
Using Spark, Tesseract, HBase, Solr and Leptonica. Actually pretty feasible.
spark  tesseract  hbase  solr  leptonica  pdfs  scanning  cloudera  hadoop  architecture 
october 2015 by jm
