How to import a CSV file in PySpark

The spark.read.csv() method reads a single CSV file or a directory of CSV files into a Spark DataFrame. Options such as header and delimiter can be set by chaining calls to spark.read.option() before calling csv().

Syntax

spark.read.option("option_name", "option_value").csv(file_path)

Parameter

  • file_path: The path to the CSV file, or to a directory of CSV files, to be read.

Return value

This method returns a Spark DataFrame.

Code example

Let’s look at the code below:

import pyspark
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("answers").getOrCreate()

path = "data.csv"

df = spark.read.option("header", "True").option("delimiter", ",").csv(path)
df.printSchema()

Code explanation

  • Lines 1–2: We import pyspark and SparkSession.
  • Line 4: We create a SparkSession with the application name answers.
  • Line 6: We define the path to the CSV file.
  • Line 8: We read the CSV file into a DataFrame using the csv() method. The header option tells Spark to treat the first row as column names, and the delimiter option sets the field separator. Multiple options are chained together using the option() method.
  • Line 9: We print the DataFrame schema.

Copyright ©2025 Educative, Inc. All rights reserved