How to save a PySpark DataFrame to a CSV file

The df.write.csv() method writes a DataFrame to a CSV file. Options related to the write operation, such as the header and delimiter, can be specified by chaining calls to the df.write.option() method before csv().

Syntax

df.write.option("option_name", "option_value").csv(file_path)

Parameter

  • file_path: The path where the CSV output is to be created.
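
Other aspects of the write can be configured in the same chained style. For example, the write fails by default if the output path already exists; the writer's mode() method changes that behavior. The snippet below is only a minimal sketch and assumes a DataFrame named df already exists; the output path is illustrative:

# Overwrite any previous output at the same path instead of failing.
# Accepted modes include "overwrite", "append", "ignore", and "error".
df.write \
    .mode("overwrite") \
    .option("header", True) \
    .option("delimiter", "|") \
    .csv("output_dir")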

Example

import pyspark
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('answer').getOrCreate()

data = [("James","Educative","Engg","USA"),
    ("Michael","Google",None,"Asia"),
    ("Robert",None,"Marketing","Russia"),
    ("Maria","Netflix","Finance","Ukraine"),
    (None, None, None, None)
  ]

columns = ["emp name","company","department","country"]
df = spark.createDataFrame(data = data, schema = columns)

csv_file_path = "data.csv"
df.write.option("header", True).option("delimiter",",").csv(csv_file_path)

Follow the steps below to inspect the generated CSV output.

  • Use the ls command to confirm that the data.csv directory has been created.
  • Use the cd data.csv command to move into that directory.
  • Use the ls command again to list the generated .csv part file.
  • Use the cat *.csv command to print the data contained in the file. The * wildcard matches the generated filename, so it can be used instead of typing or pasting the name.
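
The same check can be done without leaving PySpark by reading the output directory back into a DataFrame. This is only a quick sketch and assumes the spark session and data.csv path from the example above:

# Read the CSV output back to confirm the write succeeded.
# Spark reads every part file inside the data.csv directory.
result = spark.read.option("header", True).csv("data.csv")
result.show()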

Explanation

  • Lines 1–2: The pyspark module and the SparkSession class are imported.
  • Line 4: We create a SparkSession with the application name answer.
  • Lines 6–11: We define the dummy data for the DataFrame.
  • Line 13: We define the columns for the dummy data.
  • Line 14: We create a Spark DataFrame from the dummy data and columns defined above.
  • Line 16: We define the path where the CSV output is to be generated.
  • Line 17: The DataFrame is written to a CSV file by chaining the header and delimiter options and csv() on the DataFrame's write interface.
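
Because Spark writes the output in parallel, data.csv is a directory that may contain several part files rather than a single file. If one CSV file is required, a common workaround (sketched below under the assumption that the data is small enough to fit in a single partition; the output path name is illustrative) is to coalesce the DataFrame before writing:

# Collapse the DataFrame to a single partition so that exactly
# one part file is written inside the output directory.
# This funnels all rows through one task, so use it for small data only.
df.coalesce(1).write.option("header", True).csv("single_file_output")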
