The spark.read.csv() method reads a single CSV file or a directory of CSV files into a Spark DataFrame. Options that control how the file is parsed can be set with the spark.read.option() method.
spark.read.option("option_name", "option_value").csv(file_path)
file_path: The path to the CSV file (or directory of CSV files) to be read.
This method returns a Spark DataFrame.
Let’s look at the code below:
import pyspark
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("answers").getOrCreate()

path = "data.csv"
df = spark.read.option("header", "True").option("delimiter", ",").csv(path)
df.printSchema()
We import pyspark and SparkSession.
We create a SparkSession with the application name answers.
We read the CSV file using the .csv() method. Multiple options are chained together using the option() method.
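As a further illustration of chaining options, here is a minimal sketch that applies a dictionary of options to the reader one by one. The helper names (write_sample_csv, read_csv, main), the READ_OPTIONS dictionary, and the people.csv sample file are hypothetical and only serve the example. The inferSchema option is not used in the code above; it is added here on the assumption that typed columns are wanted.

```python
import csv
import os
import tempfile


def write_sample_csv(directory):
    """Write a small semicolon-delimited CSV file and return its path.

    The file name and contents are made up for this example.
    """
    path = os.path.join(directory, "people.csv")
    with open(path, "w", newline="") as f:
        writer = csv.writer(f, delimiter=";")
        writer.writerow(["name", "age"])
        writer.writerow(["Alice", "30"])
        writer.writerow(["Bob", "25"])
    return path


# Hypothetical set of reader options for the sample file:
# treat the first row as a header, split on ";", and infer column types.
READ_OPTIONS = {"header": "true", "delimiter": ";", "inferSchema": "true"}


def read_csv(spark, path, options):
    """Chain one option() call per entry, then read the CSV at `path`."""
    reader = spark.read
    for key, value in options.items():
        reader = reader.option(key, value)
    return reader.csv(path)


def main():
    # Imported here so the helpers above can be used without Spark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("answers").getOrCreate()
    with tempfile.TemporaryDirectory() as tmp:
        df = read_csv(spark, write_sample_csv(tmp), READ_OPTIONS)
        df.printSchema()
        df.show()
    spark.stop()
```

Calling main() requires a working PySpark installation. Keeping the options in a dictionary makes it easy to reuse the same settings for several files; the loop is equivalent to writing the option() calls out by hand as in the code above.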