There are multiple ways to add a prefix to all DataFrame column names in PySpark. Here, we'll discuss two methods:
1. The withColumnRenamed() method
2. The toDF() method

The withColumnRenamed() method

The withColumnRenamed() method is used to rename the columns of a DataFrame.
To learn more about this method, refer to "How to rename multiple columns in PySpark?".
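As a quick illustration before the full example, the sketch below renames a single column with withColumnRenamed(); the session name and sample data are assumed purely for illustration.

import pyspark
from pyspark.sql import SparkSession

# Assumed session and sample data, for illustration only.
spark = SparkSession.builder.appName('example').getOrCreate()
df = spark.createDataFrame([("James", "Smith")], ["firstname", "lastname"])

# withColumnRenamed(existing, new) returns a new DataFrame with the column renamed.
df = df.withColumnRenamed("firstname", "educative-firstname")
print(df.columns)  # ['educative-firstname', 'lastname']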
Let’s look at the code below:
import pyspark
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('edpresso').getOrCreate()

data = [("James", "Smith", "USA", "CA"),
        ("Michael", "Rose", "USA", "NY"),
        ("Robert", "Williams", "USA", "CA"),
        ("Maria", "Jones", "USA", "FL")]
columns = ["firstname", "lastname", "country", "state"]

df = spark.createDataFrame(data=data, schema=columns)

print("Original dataframe:")
df.show(truncate=False)

# Prefix every column name, one rename at a time
prefix = "educative-"
for column in df.columns:
    df = df.withColumnRenamed(column, prefix + column)

print("-" * 8)
print("Renamed dataframe:")
df.show(truncate=False)
The column names are obtained using df.columns. Every column in the column list is prefixed with the prefix using the withColumnRenamed() method.

The toDF() method

The toDF() method is used to return a new DataFrame with new column names. Its syntax is as follows:
DataFrame.toDF(*cols)
cols: These are the new column names.

This method returns a new DataFrame.
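For instance, the new names can be passed to toDF() as positional arguments; this is a minimal sketch, assuming a two-column DataFrame like the one created earlier.

# Minimal sketch of toDF(); the data and names are assumed for illustration.
df = spark.createDataFrame([("James", "Smith")], ["firstname", "lastname"])
renamed = df.toDF("educative-firstname", "educative-lastname")
print(renamed.columns)  # ['educative-firstname', 'educative-lastname']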
Let’s look at the code below:
import pyspark
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('edpresso').getOrCreate()

data = [("James", "Smith", "USA", "CA"),
        ("Michael", "Rose", "USA", "NY"),
        ("Robert", "Williams", "USA", "CA"),
        ("Maria", "Jones", "USA", "FL")]
columns = ["firstname", "lastname", "country", "state"]

df = spark.createDataFrame(data=data, schema=columns)

print("Original dataframe:")
df.show(truncate=False)

# Build the new column names with the prefix, then rename them all via toDF()
prefix = "educative-"
new_cols = [prefix + column for column in df.columns]
new_df = df.toDF(*new_cols)

print("-" * 8)
print("Renamed dataframe:")
new_df.show(truncate=False)
A new list of column names with the prefix is created. The renamed DataFrame is obtained by invoking the toDF() method and passing the new list of column names.
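If this renaming is needed in several places, it can be wrapped in a small helper; the add_prefix function below is a hypothetical sketch, not part of the PySpark API.

def add_prefix(df, prefix):
    # Build the prefixed names from df.columns and rename them all in one toDF() call.
    return df.toDF(*[prefix + c for c in df.columns])

# Usage, assuming the DataFrame df created above:
# prefixed_df = add_prefix(df, "educative-")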