The current timestamp can be added as a new column to a Spark DataFrame using the current_timestamp() function from the pyspark.sql.functions module.
The resulting column holds timestamp values, which are displayed by default in a form like yyyy-MM-dd HH:mm:ss.SSS.
pyspark.sql.functions.current_timestamp()
This method has no parameters.
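This method returns a column containing the current timestamp at the start of query evaluation. All calls within the same query return the same value.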
Let’s see the code below:
import pyspark
from pyspark.sql import SparkSession
from pyspark.sql.functions import current_timestamp

spark = SparkSession.builder.appName('edpresso').getOrCreate()

data = [("James","Smith","USA","CA"),
    ("Michael","Rose","USA","NY"),
    ("Robert","Williams","USA","CA"),
    ("Maria","Jones","USA","FL")]

columns = ["firstname","lastname","country","state"]

df = spark.createDataFrame(data = data, schema = columns)

df_with_ts = df.withColumn("curr_timestamp", current_timestamp())

df_with_ts.show(truncate=False)
In the code above, the DataFrame is created with the createDataFrame() method.
A new column is then added with the withColumn() method, passing the new column name curr_timestamp and, as the value to assign to the column, the timestamp returned by current_timestamp().