How to write a DataFrame to a Parquet file in Python

Overview

Apache Parquet is an open-source, column-oriented file format designed for efficient data storage and retrieval. It provides efficient compression and encoding schemes for handling large volumes of complex data.

We use the pandas DataFrame.to_parquet() method to write a DataFrame to a Parquet file. The method relies on a Parquet engine, so either pyarrow or fastparquet must be installed.

Note: Refer to What is pandas in Python? to learn more about pandas.

Syntax

DataFrame.to_parquet(path=None, engine='auto', compression='snappy', index=None, partition_cols=None, storage_options=None, **kwargs)

Parameters

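  • path: This is the destination: a string, path object, or file-like object opened for binary writing. If None, the Parquet data is returned as bytes instead of being written to a file.
  • engine: This parameter indicates which Parquet library to use. The available options are auto, pyarrow, and fastparquet. With auto (the default), pandas uses the io.parquet.engine option, which by default tries pyarrow first and falls back to fastparquet if pyarrow is unavailable.
  • compression: This parameter indicates the compression codec to use. Common options are snappy, gzip, and brotli; pass None to disable compression. The default compression is snappy.
  • index: If True, the DataFrame’s index is written to the file. If False, the index is not written. The default, None, behaves like True, except that a RangeIndex is stored compactly in the file metadata rather than as a column of values.
  • partition_cols: These are the names of the columns by which to partition the dataset, which is then written as a directory of files. Columns are partitioned in the order in which they are given. This must be None if path is not a string.
  • storage_options: This is a dictionary of extra options for a particular storage connection, such as a host, port, username, and password.
  • **kwargs: These are additional keyword arguments passed through to the Parquet engine.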

Example

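import pandas as pd
import os
# Each inner list is one row: [name, age]
data = [['dom', 10], ['abhi', 15], ['celeste', 14]]
# Build a DataFrame with named columns
df = pd.DataFrame(data, columns=['Name', 'Age'])
# Write df to dataframe.parquet (requires pyarrow or fastparquet)
df.to_parquet("dataframe.parquet")
# Confirm that the file now exists in the working directory
print("Listing the contents of the current directory:")
print(os.listdir('.'))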

Explanation

  • Lines 1–2: We import the pandas package and the os module.
  • Line 4: We define the data for constructing the pandas DataFrame.
  • Line 6: We convert data to a pandas DataFrame called df.
  • Line 8: We write df to a Parquet file using the to_parquet() method. Because no engine is specified, pandas uses pyarrow if it is installed and falls back to fastparquet otherwise. The resulting file is named dataframe.parquet.
  • Lines 10–11: We list the items in the current directory using os.listdir(). The output includes dataframe.parquet, which confirms that the file was created.
