How to convert a PySpark DataFrame to HTML

PySpark is the Python API for Apache Spark, an open-source unified engine for processing large-scale datasets. It is used for computing on large datasets, including in real time, across a distributed environment.

Convert PySpark DataFrame to HTML

  1. Transform to Pandas DataFrame: Convert the PySpark DataFrame into a Pandas DataFrame using the toPandas() method. This collects the whole PySpark DataFrame into memory on the driver node.

import pandas as pd
# pyspark_df is an existing PySpark DataFrame
pandas_df = pyspark_df.toPandas()
  2. Pandas DataFrame into HTML: Using the to_html() method, we can convert the Pandas DataFrame into an HTML table. This generates an HTML string representation of the DataFrame.

html_table = pandas_df.to_html()
  3. Save/display HTML: Depending on your requirements, you can save the HTML table to a file or display it (for example, in a Jupyter notebook).

# to save the HTML table to a file
with open('filename.html', 'w') as file:
    file.write(html_table)
# to display the HTML table in a notebook
from IPython.display import display, HTML
display(HTML(html_table))
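
The steps above can be combined into one small helper. This is only a minimal sketch: the function name spark_df_to_html and its path and index parameters are made up for illustration, and it assumes the DataFrame is small enough to fit in driver memory.

```python
import pandas as pd  # toPandas() needs pandas installed on the driver


def spark_df_to_html(spark_df, path=None, index=False):
    """Return an HTML table string for a small PySpark DataFrame.

    Hypothetical helper: `spark_df` is assumed to be a pyspark.sql.DataFrame;
    only its toPandas() method is used here.
    """
    # Step 1: collect every row onto the driver as a Pandas DataFrame.
    pandas_df = spark_df.toPandas()
    # Step 2: render the Pandas DataFrame as an HTML <table> string.
    # index=False leaves out the Pandas row index column.
    html_table = pandas_df.to_html(index=index)
    # Step 3: optionally save the string to a file.
    if path is not None:
        with open(path, "w", encoding="utf-8") as file:
            file.write(html_table)
    return html_table
```

With an existing PySpark DataFrame named pyspark_df, calling spark_df_to_html(pyspark_df, path='filename.html') would return the HTML string and also write it to filename.html.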

The above method copies the entire dataset into memory on the driver. If the DataFrame is too large to fit in memory, you can sample the data according to your requirements: create a sample DataFrame in PySpark and repeat the steps above on it.
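
One possible way to do this is to sample and cap the rows in Spark before converting to Pandas. The sketch below is an assumption about how you might do it, not the only approach: the helper name sample_to_html and its default values are hypothetical, while sample(), limit(), and toPandas() are standard PySpark DataFrame methods.

```python
def sample_to_html(spark_df, fraction=0.01, max_rows=1000, seed=42):
    """Render a bounded random sample of a PySpark DataFrame as HTML.

    Hypothetical helper: `spark_df` is assumed to be a pyspark.sql.DataFrame.
    """
    # sample() keeps roughly `fraction` of the rows (the count is not exact);
    # limit() puts a hard cap on how many rows can reach the driver.
    sampled_df = spark_df.sample(fraction=fraction, seed=seed).limit(max_rows)
    # Only the sampled rows are collected into driver memory.
    pandas_df = sampled_df.toPandas()
    return pandas_df.to_html(index=False)
```

The limit() call is the safeguard here: even if the sampled fraction turns out larger than expected, at most max_rows rows are converted.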

Copyright ©2025 Educative, Inc. All rights reserved