Tabula alternatives and similar libraries
Based on the "PDF" category.
Alternatively, view Tabula alternatives based on common mentions on social networks and blogs.
-
OpenPDF
OpenPDF is a free Java library for creating and editing PDF files with a LGPL and MPL open source license. OpenPDF is based on a fork of iText. We welcome contributions from other developers. Please feel free to submit pull-requests and bugreports to this GitHub repository. ⛺ -
Open HTML to PDF
An HTML to PDF library for the JVM. Based on Flying Saucer and Apache PDF-BOX 2. With SVG image support. Now also with accessible PDF support (WCAG, Section 508, PDF/UA)! -
Eclipse BIRT
Report engine for creating PDF and other formats (DOCX, XLSX, HTML, etc) using Eclipse-based visual editor.
Write Clean Java Code. Always.
Do you think we are missing an alternative of Tabula or a related project?
README
tabula-java 
tabula-java
is a library for extracting tables from PDF files — it is the table extraction engine that powers Tabula (repo). You can use tabula-java
as a command-line tool to programmatically extract tables from PDFs.
© 2014-2020 Manuel Aristarán. Available under MIT License. See [LICENSE
](LICENSE).
Download
Download a version of the tabula-java's jar, with all dependencies included, that works on Mac, Windows and Linux from our [releases page](../../releases).
Usage Examples
tabula-java
provides a command line application:
$ java -jar target/tabula-1.0.5-jar-with-dependencies.jar --help
usage: tabula [-a <AREA>] [-b <DIRECTORY>] [-c <COLUMNS>] [-f <FORMAT>]
[-g] [-h] [-i] [-l] [-n] [-o <OUTFILE>] [-p <PAGES>] [-r] [-s
<PASSWORD>] [-t] [-u] [-v]
Tabula helps you extract tables from PDFs
-a,--area <AREA> -a/--area = Portion of the page to analyze.
Example: --area 269.875,12.75,790.5,561.
Accepts top,left,bottom,right i.e. y1,x1,y2,x2
where all values are in points relative to the
top left corner. If all values are between
0-100 (inclusive) and preceded by '%', input
will be taken as % of actual height or width
of the page. Example: --area %0,0,100,50. To
specify multiple areas, -a option should be
repeated. Default is entire page
-b,--batch <DIRECTORY> Convert all .pdfs in the provided directory.
-c,--columns <COLUMNS> X coordinates of column boundaries. Example
--columns 10.1,20.2,30.3. If all values are
between 0-100 (inclusive) and preceded by '%',
input will be taken as % of actual width of
the page. Example: --columns %25,50,80.6
-f,--format <FORMAT> Output format: (CSV,TSV,JSON). Default: CSV
-g,--guess Guess the portion of the page to analyze per
page.
-h,--help Print this help text.
-i,--silent Suppress all stderr output.
-l,--lattice Force PDF to be extracted using lattice-mode
extraction (if there are ruling lines
separating each cell, as in a PDF of an Excel
spreadsheet)
-n,--no-spreadsheet [Deprecated in favor of -t/--stream] Force PDF
not to be extracted using spreadsheet-style
extraction (if there are no ruling lines
separating each cell)
-o,--outfile <OUTFILE> Write output to <file> instead of STDOUT.
Default: -
-p,--pages <PAGES> Comma separated list of ranges, or all.
Examples: --pages 1-3,5-7, --pages 3 or
--pages all. Default is --pages 1
-r,--spreadsheet [Deprecated in favor of -l/--lattice] Force
PDF to be extracted using spreadsheet-style
extraction (if there are ruling lines
separating each cell, as in a PDF of an Excel
spreadsheet)
-s,--password <PASSWORD> Password to decrypt document. Default is empty
-t,--stream Force PDF to be extracted using stream-mode
extraction (if there are no ruling lines
separating each cell)
-u,--use-line-returns Use embedded line returns in cells. (Only in
spreadsheet mode.)
-v,--version Print version and exit.
It also includes a debugging tool, run java -cp ./target/tabula-1.0.5-jar-with-dependencies.jar technology.tabula.debug.Debug -h
for the available options.
You can also integrate tabula-java
with any JVM language. For Java examples, see the [tests
](src/test/java/technology/tabula/) folder.
JVM start-up time is a lot of the cost of the tabula
command, so if you're trying to extract many tables from PDFs, you have a few options for speeding it up:
- the -b option, which allows you to convert all pdfs in a given directory
- the drip utility
- the Ruby, Python, R, and Node.js bindings
- writing your own program in any JVM language (Java, JRuby, Scala) that imports tabula-java.
- waiting for us to implement an API/server-style system (it's on the roadmap)
Building from Source
Clone this repo and run:
mvn clean compile assembly:single
Contributing
Interested in helping out? We'd love to have your help!
You can help by:
- Reporting a bug.
- Adding or editing documentation.
- Contributing code via a Pull Request.
- Spreading the word about
tabula-java
to people who might be able to benefit from using it.
Backers
You can also support our continued work on tabula-java
with a one-time or monthly donation on OpenCollective. Organizations who use tabula-java
can also sponsor the project for acknowledgement on our official site and this README.
Special thanks to the following users and organizations for generously supporting Tabula with donations and grants:
*Note that all licence references and agreements mentioned in the Tabula README section above
are relevant to that project's source code only.