Python provides us with multiple libraries that provide the functionality to convert each page of a PDF into an image of different formats like PNG, JPG, and JPEG. In this Answer, we will explore the pdf2image
library to convert a PDF into an image.
pdf2image
pdf2image
converts a PDF into a
pip install pdf2image
Once we are done installing the pdf2image
library, we can write the Python code to convert our PDF into images. The code is given below:
from pdf2image import convert_from_path pdf_images = convert_from_path('my_pdf.pdf') for idx in range(len(pdf_images)): pdf_images[idx].save('pdf_page_'+ str(idx+1) +'.png', 'PNG') print("Successfully converted PDF to images")
As we run the code, the PDF is successfully converted to images, which are stored in the project folder. The explanation of the code is given below.
Line 1: We import convert_from_path
from the pdf2image
library.
Line 3: We use the convert_from_path
function and pass the path to the PDF as a string. The function converts the PDF images into a PIL image list, which we store in the pdf_images
variable.
Lines 5 and 6: We define a for loop that iterates through the number of PIL image objects stored in the pdf_images
variable. On each iteration, we use the save()
function of the PIL image objects and provide the name (first parameter) and image format (second parameter). In our case, we are saving each image with the name pdf_page_
appended with a number starting from 1 – to keep the names unique).
pdf2image
library makes it a seamless process to convert a PDF into images which may be used in OCR (optical character recognition) tasks and manipulating images.
Free Resources