The left anti join in PySpark works like the other join operations, but it returns only the columns of the left DataFrame, and only for the rows that have no match in the right DataFrame.
DataFrame.join(<right_Dataframe>, on=None, how="leftanti")
OR
DataFrame.join(<right_Dataframe>, on=None, how="left_anti")
DataFrame - It represents the left side (or left DataFrame) of the join operation.
<right_Dataframe> - It represents the right side (or right DataFrame) of the join operation.
on - The column name, or a list of column names, to join on.
how - It indicates the type of the join operation; for a left anti join, it is "leftanti" or "left_anti".
The code below performs a left anti join on two DataFrames:

import pyspark
from pyspark.sql import SparkSession

# Create a Spark session
spark = SparkSession.builder.appName('edpresso').getOrCreate()

# Create the left DataFrame with the student data
data = [("Sam", "USA", "cs150", 23),
        ("Jolie", "UK", "mech421", 19),
        ("Gaby", "Canada", "botany456", 26),
        ("Celeste", "Australia", "cs150", 22)]
columns = ["student_name", "country", "course_id", "age"]
df_1 = spark.createDataFrame(data=data, schema=columns)

# Create the right DataFrame with the course data
data = [("Computer Science", "cs150"),
        ("Mechanical Engineering", "mech421")]
columns = ["course_name", "course_id"]
df_2 = spark.createDataFrame(data=data, schema=columns)

# Perform the left anti join on the course_id column and display the result
df_left_anti = df_1.join(df_2, on="course_id", how="leftanti")
df_left_anti.show(truncate=False)
In the code above:
- The pyspark and SparkSession modules are imported, and a Spark session named edpresso is created.
- df_1 is created from the dummy student data and its column names.
- df_2 is created from the course data and its column names.
- Finally, the left anti join is performed on the df_1 and df_2 datasets using the course_id column, and the result is displayed. Since the course_id botany456 does not appear in df_2, only Gaby's row is returned, containing only the columns of df_1.
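For intuition, the same rows can be obtained without the anti join by performing an ordinary left join and then keeping only the rows whose right-hand columns remained null. The sketch below is illustrative rather than part of the original example; it assumes the df_1 and df_2 DataFrames defined above are still available in the session.

# A minimal sketch of the same result using a plain left join plus a null filter.
# Assumes df_1 and df_2 from the example above already exist.
from pyspark.sql.functions import col

# The left join keeps every row of df_1; unmatched rows get null values
# in the columns that come from df_2 (here, course_name).
df_left = df_1.join(df_2, on="course_id", how="left")

# Keep only the unmatched rows and restore df_1's original columns.
df_anti_like = df_left.filter(col("course_name").isNull()).select(df_1.columns)
df_anti_like.show(truncate=False)

The built-in how="leftanti" expresses the same intent in a single step and lets Spark plan the query as an anti join directly.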