Tabula: Extract Tables from PDFs
Tabula is a tool for liberating data tables locked inside PDF files.
Data Mining OCR PDFs — Using pdftabextract to liberate tabular data from scanned documents | WZB Data Science Blog
Over the past few months I often had to deal with the problem of extracting tabular data from scanned documents. These documents included quite old sources, such as catalogs of German newspapers from the 1920s and 1930s, and newer ones, such as lists of schools in Germany from the 1990s. All sources were of mixed scanning quality (including rotated or skewed pages) and had very different table layouts. Some had visible table column borders; others had only table header borders, so the actual table cells were separated visually only by “white-space”. Automated data extraction with tools from ABBYY or with Tabula failed in most cases. Because of the wide variety of scanning quality and table layouts, a single general-purpose approach did not work. Hence I created a set of common tools that make it possible to detect table layouts on scanned pages in OCR PDFs, verify the detected layouts visually, and finally extract the data from the tables.
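To illustrate the "white-space" case mentioned above, here is a minimal sketch of one way column boundaries might be guessed from the horizontal extents of OCR word boxes. It is not the pdftabextract API; the function names `find_column_gaps` and `assign_columns`, the `(left, right)` box format, and the `min_gap` threshold are all assumptions made for this example.

```python
from bisect import bisect_right
from typing import List, Tuple

# Hypothetical input format: (left, right) x-extent of one OCR word box.
Box = Tuple[float, float]


def find_column_gaps(boxes: List[Box], min_gap: float) -> List[float]:
    """Return midpoints of horizontal gaps at least `min_gap` wide
    that no word box overlaps (candidate column separators)."""
    if not boxes:
        return []
    spans = sorted(boxes)
    gaps = []
    covered_right = spans[0][1]  # right edge of the region covered so far
    for left, right in spans[1:]:
        if left - covered_right >= min_gap:
            gaps.append((covered_right + left) / 2)
        covered_right = max(covered_right, right)
    return gaps


def assign_columns(boxes: List[Box], gaps: List[float]) -> List[int]:
    """Map each box to a column index based on where its center falls."""
    return [bisect_right(gaps, (left + right) / 2) for left, right in boxes]


if __name__ == "__main__":
    words = [(10, 40), (12, 35), (100, 140), (105, 150), (220, 260)]
    separators = find_column_gaps(words, min_gap=30)
    print(separators)                         # [70.0, 185.0]
    print(assign_columns(words, separators))  # [0, 0, 1, 1, 2]
```

A sketch like this assumes the page has already been deskewed; on rotated or skewed scans the gaps would blur together, which is one reason visual verification of the detected layout matters.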
6 Practical Tips to Create a Stock Management Database in MS Access
In this article, we look at 6 practical tips that should be implemented when creating an MS Access Stock Management Database.
How to Batch Create Charts from Tables in Your Outlook Email
At times, you may need to create charts from all the tables in an Outlook email. In this article, we introduce a quick method for doing this in bulk.
rspivot: RStudio addin to view data frames as pivot tables
View data as values, growth rates, and shares.
