ocr   9034


tesseract-ocr/tesseract: Tesseract Open Source OCR Engine (main repository)
This package contains an OCR engine (libtesseract) and a command-line program (tesseract).

The lead developer is Ray Smith. The maintainer is Zdenko Podobny. For a list of contributors see AUTHORS and GitHub's log of contributors.

Tesseract has Unicode (UTF-8) support and can recognize more than 100 languages "out of the box".

Tesseract supports various output formats: plain text, hOCR (HTML), PDF, TSV, and invisible-text-only PDF.
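As a rough illustration of how the output formats above map onto the command-line program, here is a minimal sketch that only builds the argument list for a `tesseract` invocation and does not run it. The function name `build_tesseract_command` and the format label `pdf-text-only` are hypothetical names chosen for this example. The sketch assumes the stock `hocr`, `pdf`, and `tsv` config names, the `-l` language flag, and the `textonly_pdf` variable; whether all of these are available depends on the installed Tesseract version.

```python
from typing import Dict, List, Optional

# Extra command-line arguments per output format.
# Plain text is Tesseract's default output, so it needs no extra argument.
# "pdf-text-only" is a label invented for this sketch, not a Tesseract name.
_FORMAT_ARGS: Dict[str, List[str]] = {
    "txt": [],
    "hocr": ["hocr"],
    "pdf": ["pdf"],
    "tsv": ["tsv"],
    "pdf-text-only": ["-c", "textonly_pdf=1", "pdf"],
}


def build_tesseract_command(
    image_path: str,
    output_base: str,
    fmt: str = "txt",
    lang: Optional[str] = None,
) -> List[str]:
    """Return an argument list for the tesseract CLI (nothing is executed)."""
    if fmt not in _FORMAT_ARGS:
        raise ValueError(f"unsupported output format: {fmt!r}")
    command = ["tesseract", image_path, output_base]
    if lang:
        # Language code such as "eng" or "deu"; the traineddata must be installed.
        command += ["-l", lang]
    command += _FORMAT_ARGS[fmt]
    return command
```

If the `tesseract` binary is installed and on the PATH, the returned list could be passed to `subprocess.run(command, check=True)`; Tesseract appends the file extension to the output base itself.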

In many cases, you will need to improve the quality of the image you give Tesseract in order to get better OCR results.

This project does not include a GUI application. If you need one, please see the 3rdParty wiki page.

Tesseract can be trained to recognize other languages. See Tesseract Training for more information.
OCR  text  opensource 
5 days ago by horshacktest
Examining the Impact of Artificial Intelligence in Museums – MW17: Museums and the Web 2017
Artificial Intelligence. It’s a concept that holds lots of promise, generates endless buzz, and is starting to make its way into everyday life. In 2015, artificial intelligence went mainstream, and undoubtedly, in 2016, we will begin to see an increase in experimentation within the cultural space. In this presentation, we’ll explore some of AI’s most powerful uses related to machine learning and its impact on galleries, libraries, archives, and museums in the areas of collections, ticketing, and attendance data. We’ll also examine machine vision: a computer’s ability to understand what it is seeing. Machine vision can be used to inspect and analyze images. Imagine being able to classify all of your visual objects with the flip of a switch (actually, a few lines of code). We’ll explore real examples of machine learning on the following topics:
- Identifying subject matter
- Extracting color composition
- Sentiment analysis
- Text/character recognition
- Recognizing similarity and patterns
- Art authentication
Machine learning and vision are very powerful tools and are more accessible than ever before. In the hands of museums, these technologies will inevitably lead to interesting discoveries, rich data, and new paths into your collection.
mw2017  artificial_intelligence  image_recognition  text_mining  ocr 
8 days ago by stacker
PDF OCR for tables and data
pdf  ocr 
10 days ago by lenciel
skylander86/lambda-text-extractor: AWS Lambda functions to extract text from various binary formats.
aws  lambda  pdf  text  extraction  ocr  tesseract 
16 days ago by floehopper
Creating a Modern OCR Pipeline Using Computer Vision and Deep Learning | Dropbox Tech Blog
“To address this shortcoming, we eventually tracked down a font vendor in China who could provide us with representative ancient thermal printer fonts.” - via Ben Hammersley
china  mechanicalturk  dropbox  ocr  font  thermalprinter 
22 days ago by danhon
