The left anti join in PySpark works like the other join operations, but it returns only the columns of the left DataFrame, and only for the rows that have no match in the right DataFrame.
DataFrame.join(<right_Dataframe>, on=None, how="leftanti")
OR
DataFrame.join(<right_Dataframe>, on=None, how="left_anti")
DataFrame - It represents the left side (or left DataFrame) of the join operation.
<right_Dataframe> - It represents the right side (or right DataFrame) of the join operation.
on - The column name, or a list of column names, to join on.
how - It indicates the type of the join operation; for a left anti join, it is "leftanti" or "left_anti".
The code below performs a left anti join on two DataFrames:

import pyspark
from pyspark.sql import SparkSession

# Create a Spark session
spark = SparkSession.builder.appName('edpresso').getOrCreate()

# Create the left DataFrame with the student data
data = [("Sam", "USA", "cs150", 23),
        ("Jolie", "UK", "mech421", 19),
        ("Gaby", "Canada", "botany456", 26),
        ("Celeste", "Australia", "cs150", 22)]
columns = ["student_name", "country", "course_id", "age"]
df_1 = spark.createDataFrame(data=data, schema=columns)

# Create the right DataFrame with the course data
data = [("Computer Science", "cs150"),
        ("Mechanical Engineering", "mech421")]
columns = ["course_name", "course_id"]
df_2 = spark.createDataFrame(data=data, schema=columns)

# Perform the left anti join on the course_id column and display the result
df_left_anti = df_1.join(df_2, on="course_id", how="leftanti")
df_left_anti.show(truncate=False)
In the code above:
- The pyspark and SparkSession modules are imported, and a Spark session named edpresso is created.
- df_1 is created from the dummy student data and its column names.
- df_2 is created from the course data and its column names.
- Finally, the left anti join is performed on the df_1 and df_2 datasets using the course_id column, and the result is displayed. Since the course_id botany456 does not appear in df_2, only Gaby's row is returned, containing only the columns of df_1.
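For intuition, the same rows can be obtained without the anti join by performing an ordinary left join and then keeping only the rows whose right-hand columns remained null. The sketch below is illustrative rather than part of the original example; it assumes the df_1 and df_2 DataFrames defined above are still available in the session.

# A minimal sketch of the same result using a plain left join plus a null filter.
# Assumes df_1 and df_2 from the example above already exist.
from pyspark.sql.functions import col

# The left join keeps every row of df_1; unmatched rows get null values
# in the columns that come from df_2 (here, course_name).
df_left = df_1.join(df_2, on="course_id", how="left")

# Keep only the unmatched rows and restore df_1's original columns.
df_anti_like = df_left.filter(col("course_name").isNull()).select(df_1.columns)
df_anti_like.show(truncate=False)

The built-in how="leftanti" expresses the same intent in a single step and lets Spark plan the query as an anti join directly.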