How to add a prefix to all Spark DataFrame column names

There are multiple ways to add a prefix to all DataFrame column names in PySpark. Here, we’ll discuss two methods:

  1. The withColumnRenamed() method
  2. The toDF() method

The withColumnRenamed() method

The withColumnRenamed() method renames a single existing column of a DataFrame and returns a new DataFrame. To rename every column, we call it once per column.

To learn more about this method, refer to “How to rename multiple columns in PySpark?”

Code example

Let’s look at the code below:

import pyspark
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('edpresso').getOrCreate()

data = [("James","Smith","USA","CA"),
        ("Michael","Rose","USA","NY"),
        ("Robert","Williams","USA","CA"),
        ("Maria","Jones","USA","FL")
]

columns = ["firstname","lastname","country","state"]
df = spark.createDataFrame(data = data, schema = columns)
print("Original dataframe:")
df.show(truncate=False)

prefix = "educative-"
for column in df.columns:
    df = df.withColumnRenamed(column, prefix + column)  # returns a new DataFrame each time

print("-" * 8)
print("Renamed dataframe:")
df.show(truncate=False)

Code explanation

  • Line 4: We create a Spark session with the app name edpresso.
  • Lines 6–10: We define data for the DataFrame.
  • Line 12: The columns of the DataFrame are defined.
  • Line 13: A DataFrame is created using the createDataFrame() method.
  • Line 15: The original DataFrame is printed.
  • Line 17: The prefix to be added is defined.
  • Lines 18–19: The list of column names is obtained using df.columns. We loop over it and rename each column to the prefix followed by its original name using the withColumnRenamed() method.
  • Line 23: The new DataFrame with new column names is printed.
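
The loop above reassigns df on every pass because each withColumnRenamed() call returns a new DataFrame. The same idea can be written as a fold with functools.reduce. This is only a sketch: the helper name add_prefix_with_rename is hypothetical (not part of PySpark), and df is assumed to be a PySpark DataFrame.

```python
from functools import reduce


def add_prefix_with_rename(df, prefix):
    """Return a DataFrame whose columns are all renamed to prefix + name.

    `df` is assumed to be a PySpark DataFrame; only `df.columns` and
    `df.withColumnRenamed(old, new)` are used.
    """
    # Each withColumnRenamed call returns a new DataFrame, so the result
    # of one rename is passed on as the input of the next.
    return reduce(
        lambda acc, name: acc.withColumnRenamed(name, prefix + name),
        df.columns,
        df,
    )
```

With the DataFrame from the example, add_prefix_with_rename(df, "educative-") should give the same column names as the loop, without reassigning the original variable.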

The toDF() method

The toDF() method returns a new DataFrame with the specified column names.

Syntax

DataFrame.toDF(*cols)

Parameter

  • cols: These are the new column names.

Return value

This method returns a new DataFrame.
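
Since toDF() takes the names as separate positional arguments, one for each existing column and in the same order, a list of names has to be unpacked with *. A minimal sketch of that pattern follows. The helper names prefixed_names and add_prefix_with_todf are hypothetical, and df is assumed to be a PySpark DataFrame.

```python
def prefixed_names(columns, prefix):
    """Build the new column names, in the same order as `columns`."""
    return [prefix + name for name in columns]


def add_prefix_with_todf(df, prefix):
    """Rename every column in one call; `df` is assumed to be a PySpark DataFrame."""
    new_cols = prefixed_names(df.columns, prefix)
    # toDF takes the names as separate positional arguments, hence the *.
    return df.toDF(*new_cols)
```

Building the names in a separate function keeps the renaming rule easy to check on its own, before any DataFrame is involved.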

Code example

Let’s look at the code below:

import pyspark
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('edpresso').getOrCreate()

data = [("James","Smith","USA","CA"),
        ("Michael","Rose","USA","NY"),
        ("Robert","Williams","USA","CA"),
        ("Maria","Jones","USA","FL")
]

columns = ["firstname","lastname","country","state"]
df = spark.createDataFrame(data = data, schema = columns)
print("Original dataframe:")
df.show(truncate=False)

prefix = "educative-"
new_cols = [prefix + column for column in df.columns]

new_df = df.toDF(*new_cols)
print("-" * 8)
print("Renamed dataframe:")
new_df.show(truncate=False)

Code explanation

  • Line 4: We create a Spark session with the app name edpresso.
  • Lines 6–10: We define data for the DataFrame.
  • Line 12: The columns of the DataFrame are defined.
  • Line 13: A DataFrame is created using the createDataFrame() method.
  • Line 15: The original DataFrame is printed.
  • Line 17: The prefix to be added is defined.
  • Line 18: A new list of column names, each prefixed with the prefix, is created.
  • Line 20: We pass the new column names to the toDF() method to get a new DataFrame in which every column is prefixed.
  • Line 23: The new DataFrame with new column names is printed.
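
One practical note on the prefix used in both examples: "educative-" puts a hyphen into every column name. To the best of our understanding, such names need backticks when they appear inside Spark SQL expression strings (for example, in selectExpr()), because an unquoted hyphen is read as a minus sign. The sketch below builds quoted names in plain Python. The helper quote_identifier is hypothetical and not part of PySpark.

```python
def quote_identifier(name):
    """Wrap a column name in backticks for use inside a SQL expression string."""
    # A backtick inside the name is escaped by doubling it.
    return "`" + name.replace("`", "``") + "`"


prefix = "educative-"
columns = ["firstname", "lastname", "country", "state"]
quoted = [quote_identifier(prefix + name) for name in columns]
print(quoted[0])  # `educative-firstname`
```

A prefix with an underscore, such as "educative_", avoids the need for quoting altogether.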

Copyright ©2025 Educative, Inc. All rights reserved