The withColumnRenamed() method is used to rename an existing column. It returns a new DataFrame in which the column carries the new name; the original DataFrame is left unchanged. Multiple columns can be renamed by chaining one withColumnRenamed() call per column.
The syntax is:
DataFrame.withColumnRenamed(existing, new)
It takes two parameters:
- existing: the name of the existing column.
- new: the new name to be given to the existing column.
It returns a new DataFrame with the column renamed.
Let’s look at the code below:
```python
import pyspark
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('edpresso').getOrCreate()

data = [("James", "Smith", "USA", "CA"),
        ("Michael", "Rose", "USA", "NY"),
        ("Robert", "Williams", "USA", "CA"),
        ("Maria", "Jones", "USA", "FL")]
columns = ["firstname", "lastname", "country", "state"]
df = spark.createDataFrame(data=data, schema=columns)

print("Original dataframe:")
df.show(truncate=False)

# Chain one withColumnRenamed() call per column to be renamed
new_df = df.withColumnRenamed("firstname", "First-Name") \
    .withColumnRenamed("lastname", "Last-Name") \
    .withColumnRenamed("country", "Country")

print("Renamed dataframe:")
new_df.show(truncate=False)
```
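In the code above:
- pyspark and SparkSession are imported.
- A SparkSession with the app name edpresso is created.
- A DataFrame is created from the sample data with the createDataFrame() method.
- The firstname, lastname, and country columns are renamed by chaining calls to the withColumnRenamed() method.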