What is Camelot?

Camelot is a Python library for extracting data tables from PDF (Portable Document Format) files. However, it only works with text-based PDFs, in which the text can be selected with the cursor, and not with scanned PDFs.

Camelot has multiple settings that can be tweaked for better extraction of data in tables. This offers more control over the extraction process than other available libraries.
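As a rough illustration of those settings, the sketch below picks between Camelot's two parsing flavors and passes a page selection to `camelot.read_pdf`. The `flavor` and `pages` parameters are part of Camelot's API; the helper names `build_options` and `extract_tables` and the `ruled_lines` switch are hypothetical and exist only for this example.

```python
def build_options(ruled_lines: bool = True, pages: str = "1") -> dict:
    """Return keyword arguments for camelot.read_pdf (hypothetical helper)."""
    return {
        # "lattice" relies on ruling lines between cells;
        # "stream" infers columns from the whitespace between text.
        "flavor": "lattice" if ruled_lines else "stream",
        # Page selection string, e.g. "1", "1,3", "1-3", or "1-end".
        "pages": pages,
    }


def extract_tables(path: str, ruled_lines: bool = True, pages: str = "1"):
    """Read tables from a PDF using the options built above."""
    # Imported lazily so build_options can be used without Camelot installed.
    import camelot

    return camelot.read_pdf(path, **build_options(ruled_lines, pages))
```

For a table without visible cell borders, a call such as `extract_tables('sample.pdf', ruled_lines=False, pages='1-end')` would try the stream flavor on every page. Which flavor works better depends on the PDF, so it is worth trying both.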

Camelot converts each data table in a PDF to a pandas DataFrame.
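The DataFrames Camelot produces generally have integer column labels and hold every cell as a string, so some cleanup is often needed. The sketch below uses a hand-built stand-in DataFrame with invented values rather than a real extraction, and it assumes the first extracted row holds the column titles. The `promote_header` helper is hypothetical.

```python
import pandas as pd

# Stand-in for an extracted table: integer column labels, header text in
# row 0, and every cell stored as a string (values invented for illustration).
raw = pd.DataFrame([
    ["Year", "Sales"],
    ["2019", "1200"],
    ["2020", "1350"],
])


def promote_header(df: pd.DataFrame) -> pd.DataFrame:
    """Use the first row as column names and drop it from the data."""
    out = df.iloc[1:].reset_index(drop=True)
    out.columns = df.iloc[0].tolist()
    return out


clean = promote_header(raw)
clean["Sales"] = pd.to_numeric(clean["Sales"])  # convert text to numbers
print(clean)
```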

Installation

The following command installs Camelot:

pip install "camelot-py[cv]"

Usage

The following basic example reads a table from a PDF, exports it to a CSV file, and converts it to a DataFrame:

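import camelot

# Read the tables from the PDF (only the first page is parsed by default)
tables = camelot.read_pdf('sample.pdf')
print(tables)  # displays the number of tables read from the PDF

# Save the tables to CSV; compress=True bundles the exported file(s) into a ZIP archive
tables.export('sample_table.csv', f='csv', compress=True)

print(tables[0])  # displays the shape of the first table

df = tables[0].df  # the first table as a pandas DataFrame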
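When a PDF yields several tables, it can help to check how well each one was parsed before exporting. The sketch below collects the page number, accuracy score, and shape of every table, relying on the `parsing_report` dictionary and `shape` attribute that Camelot attaches to each table. The `summarize` helper is hypothetical and works on any list of objects that expose those two attributes.

```python
def summarize(tables):
    """Collect page, accuracy, and shape for each extracted table."""
    rows = []
    for index, table in enumerate(tables):
        # parsing_report is a dict with accuracy, whitespace, order, and page
        report = table.parsing_report
        rows.append({
            "index": index,
            "page": report["page"],
            "accuracy": report["accuracy"],
            "shape": table.shape,  # (rows, columns)
        })
    return rows
```

Passing the object returned by `camelot.read_pdf` to `summarize` gives one dictionary per table. A low accuracy value is a hint to retry the extraction with different settings.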
