How to select columns based on regex in PySpark

Overview

Given a PySpark DataFrame, we can select the columns whose names match a regex using the DataFrame.colRegex() method.

Syntax

DataFrame.colRegex(colName: str)

The method returns the matching column(s) as a Column object, which can be passed to select().

Parameter

  • colName: This is a string containing the column name specified as a regex. The regex must be enclosed in backticks (`), as shown in the example below.

Example

import pyspark
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('edpresso').getOrCreate()

data = [("James","Smith","USA","CA"),
("Michael","Rose","USA","NY"),
("Robert","Williams","USA","CA"),
("Maria","Jones","USA","FL")
]

columns = ["firstname","lastname","country","state"]
df = spark.createDataFrame(data = data, schema = columns)
df.select(df.colRegex("`.+name$`")).show()

Note: If you see any warnings in the output, please ignore them.
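
Running the example selects only the two columns whose names end in name, so the output should look like this:

+---------+--------+
|firstname|lastname|
+---------+--------+
|    James|   Smith|
|  Michael|    Rose|
|   Robert|Williams|
|    Maria|   Jones|
+---------+--------+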

Explanation

  • Lines 1–2: We import pyspark and SparkSession.

  • Line 4: We create a SparkSession with the application name edpresso.

  • Lines 6–10: We define the dummy data for the DataFrame.

  • Line 12: We define the columns for the dummy data.

  • Line 13: We create a Spark DataFrame with the dummy data defined in lines 6–10 and the columns defined in line 12.

  • Line 14: We select a subset of the columns using a regex. The regex .+name$ matches column names that end with the string name, so colRegex() retrieves the firstname and lastname columns. A variation is sketched below.

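As a variation (a hypothetical pattern, not part of the original example), the same approach can select the two location columns of the DataFrame built above by matching their names with an alternation:

# Select the columns named exactly "country" or "state".
df.select(df.colRegex("`^(country|state)$`")).show()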