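PySpark is an interface for Apache Spark in Python. To display a PySpark DataFrame as an HTML table, we can convert it to a pandas DataFrame and then let pandas generate the HTML.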
Transform to pandas DataFrame
Convert the PySpark DataFrame into a pandas DataFrame using the toPandas() method. This collects the entire PySpark DataFrame into memory on the driver node.
# pyspark_df is an existing PySpark DataFrame; pandas must be installed
import pandas as pd
pandas_df = pyspark_df.toPandas()
Convert the pandas DataFrame into HTML
Using the to_html() method, we can convert the pandas DataFrame into an HTML table. The method returns the table as a string of HTML markup.
html_table = pandas_df.to_html()
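to_html() also accepts keyword arguments that control the generated markup. The sketch below uses a small hypothetical DataFrame in place of the one returned by toPandas(); the CSS class name and table id are placeholders you would replace with your own.

```python
import pandas as pd

# Hypothetical stand-in for the DataFrame returned by toPandas()
pandas_df = pd.DataFrame({"name": ["Alice", "Bob"], "age": [34, 45]})

# index=False drops the row-index column,
# classes adds extra CSS classes to the <table> tag,
# table_id sets the id attribute of the <table> tag
html_table = pandas_df.to_html(index=False, classes="data-table", table_id="people")

print(html_table)
```

Cell values are HTML-escaped by default (escape=True); keep that default when the data may contain untrusted text.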
Save or display the HTML
Depending on your needs, you can save the HTML table to a file or display it directly.
# Save the HTML table to a file
with open('filename.html', 'w') as file:
    file.write(html_table)
# Display the HTML table (in a Jupyter or IPython notebook)
from IPython.display import display, HTML
display(HTML(html_table))
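The string produced by to_html() is only a table fragment. If you want a file that opens cleanly on its own in a browser, one option is to wrap the fragment in a minimal HTML page and write it with an explicit UTF-8 encoding. This is a sketch: the file name report.html, the page title, and the sample data are placeholders.

```python
from pathlib import Path

import pandas as pd

# Hypothetical data standing in for the converted PySpark DataFrame
pandas_df = pd.DataFrame({"name": ["Alice", "Bob"], "age": [34, 45]})
html_table = pandas_df.to_html(index=False)

# Wrap the <table> fragment in a minimal standalone HTML document
page = f"""<!DOCTYPE html>
<html>
<head><meta charset="utf-8"><title>Report</title></head>
<body>
{html_table}
</body>
</html>
"""

# Write with an explicit encoding so non-ASCII cell values are preserved
out_path = Path("report.html")
out_path.write_text(page, encoding="utf-8")
```

If the bare table is enough, to_html() can also write it directly when given a file path, for example pandas_df.to_html("table.html").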
This approach copies all of the data into the driver's memory. If the DataFrame is too large to fit, sample it first: create a smaller sample DataFrame in PySpark and repeat the steps above on that sample.