How to add a current timestamp column to a PySpark DataFrame

The current timestamp can be added as a new column to a Spark DataFrame using the current_timestamp() function from the pyspark.sql.functions module.

The resulting column has the timestamp type, and its values are displayed in the yyyy-MM-dd HH:mm:ss.SSS format.

Syntax

pyspark.sql.functions.current_timestamp()

Parameters

This method has no parameters.

Return value

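This method returns a Column of timestamp type that holds the current timestamp at the start of query evaluation. All calls to current_timestamp() within the same query return the same value.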

Code example

Let’s see the code below:

import pyspark
from pyspark.sql import SparkSession
from pyspark.sql.functions import current_timestamp
spark = SparkSession.builder.appName('edpresso').getOrCreate()

data = [("James", "Smith", "USA", "CA"),
        ("Michael", "Rose", "USA", "NY"),
        ("Robert", "Williams", "USA", "CA"),
        ("Maria", "Jones", "USA", "FL")
        ]

columns = ["firstname", "lastname", "country", "state"]
df = spark.createDataFrame(data=data, schema=columns)

df_with_ts = df.withColumn("curr_timestamp", current_timestamp())

df_with_ts.show(truncate=False)

Code explanation

Line 4: We create a Spark session with the app name edpresso.
Lines 6–10: We define the data for the DataFrame.
Line 12: We define the column names of the DataFrame.
Line 13: We create a DataFrame using the createDataFrame() method.
Line 15: We add a new column to the DataFrame using the withColumn() method. We pass the new column name, curr_timestamp, and the column value, which is the timestamp returned by current_timestamp().
Line 17: We print the DataFrame without truncating the column values.
