The spark.read.csv() method reads a single CSV file or a directory of CSV files into a Spark DataFrame. Read options can be set by chaining the option() method on spark.read before calling csv().
spark.read.option("option_name", "option_value").csv(file_path)
file_path: This is the path to the CSV file (or directory of CSV files) to be read.
This method returns a Spark DataFrame.
Let’s look at the code below:
import pyspark
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("answers").getOrCreate()

path = "data.csv"
df = spark.read.option("header", "True").option("delimiter", ",").csv(path)
df.printSchema()
We import pyspark and SparkSession.
We create a SparkSession with the application name answers.
We read the CSV file using the csv() method. Multiple options are chained together using the option() method.