There are multiple ways to add a prefix to all DataFrame column names in PySpark. Here, we’ll discuss two methods:
- withColumnRenamed() method
- toDF() method

withColumnRenamed() method

The withColumnRenamed() method is used to rename the column names of a DataFrame.
To learn more about this method, refer to How to rename multiple columns in PySpark?
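Before the full example, here is a minimal sketch of renaming a single column with withColumnRenamed(); the sample DataFrame and the renamed column are assumed purely for illustration.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('single-rename').getOrCreate()
# Assumed sample DataFrame for illustration
df = spark.createDataFrame([("James", "USA")], ["firstname", "country"])

# withColumnRenamed() returns a new DataFrame; the original df is left unchanged
renamed_df = df.withColumnRenamed("firstname", "educative-firstname")
print(renamed_df.columns)  # ['educative-firstname', 'country']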
Let’s look at the code below:
import pyspark
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('edpresso').getOrCreate()

data = [("James", "Smith", "USA", "CA"),
        ("Michael", "Rose", "USA", "NY"),
        ("Robert", "Williams", "USA", "CA"),
        ("Maria", "Jones", "USA", "FL")]
columns = ["firstname", "lastname", "country", "state"]
df = spark.createDataFrame(data=data, schema=columns)

print("Original dataframe:")
df.show(truncate=False)

# Rename every column by prepending the prefix to its current name
prefix = "educative-"
for column in df.columns:
    df = df.withColumnRenamed(column, prefix + column)

print("-" * 8)
print("Renamed dataframe:")
df.show(truncate=False)
The column list of the DataFrame is retrieved with df.columns. Every column in the column list is prefixed with the prefix using the withColumnRenamed() method.
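As a side note, the loop above can also be written in one line with Python's functools.reduce. This is only an equivalent sketch of the same withColumnRenamed() approach, with a small assumed DataFrame for illustration.

from functools import reduce
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('reduce-rename').getOrCreate()
# Assumed sample DataFrame for illustration
df = spark.createDataFrame([("James", "USA")], ["firstname", "country"])
prefix = "educative-"

# Fold over the column list, renaming one column per step; df is the initial value
renamed_df = reduce(lambda d, c: d.withColumnRenamed(c, prefix + c), df.columns, df)
print(renamed_df.columns)  # ['educative-firstname', 'educative-country']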
toDF() method

The toDF() method is used to return a new DataFrame with new column names.
DataFrame.toDF(*cols)
cols: These are the new column names.

This method returns a new DataFrame.
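For instance, here is a minimal sketch of toDF() with the new names listed explicitly; the sample DataFrame and names are assumed for illustration.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('toDF-rename').getOrCreate()
# Assumed sample DataFrame for illustration
df = spark.createDataFrame([("James", "USA")], ["firstname", "country"])

# toDF() renames columns positionally and returns a new DataFrame
new_df = df.toDF("educative-firstname", "educative-country")
print(new_df.columns)  # ['educative-firstname', 'educative-country']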
Let’s look at the code below:
import pyspark
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('edpresso').getOrCreate()

data = [("James", "Smith", "USA", "CA"),
        ("Michael", "Rose", "USA", "NY"),
        ("Robert", "Williams", "USA", "CA"),
        ("Maria", "Jones", "USA", "FL")]
columns = ["firstname", "lastname", "country", "state"]
df = spark.createDataFrame(data=data, schema=columns)

print("Original dataframe:")
df.show(truncate=False)

# Build the new column names by prepending the prefix to each existing name
prefix = "educative-"
new_cols = [prefix + column for column in df.columns]

# toDF() returns a new DataFrame with the renamed columns
new_df = df.toDF(*new_cols)

print("-" * 8)
print("Renamed dataframe:")
new_df.show(truncate=False)
A new list of column names, with every name prefixed with prefix, is created. A new DataFrame is then created using the toDF() method, passing the new list of column names.
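To tie the two methods together, the following self-contained sketch (with the same assumed data and prefix as above) checks that both approaches produce the same renamed column list.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('compare-methods').getOrCreate()
df = spark.createDataFrame([("James", "Smith", "USA", "CA")],
                           ["firstname", "lastname", "country", "state"])
prefix = "educative-"

# Method 1: rename column by column with withColumnRenamed()
df1 = df
for column in df.columns:
    df1 = df1.withColumnRenamed(column, prefix + column)

# Method 2: pass the full list of new names to toDF()
df2 = df.toDF(*[prefix + column for column in df.columns])

# Both approaches yield the same renamed column names
print(df1.columns == df2.columns)  # True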