What is PySpark MapType?

MapType is a PySpark data type similar to a dictionary in Python or a HashMap in Java.

It’s used to store key-value pairs. All keys in a MapType column share one data type, and all values share another. The keys are not allowed to be None (NULL).

Syntax

MapType(keyType, valueType, valueContainsNull=True)

Parameters

  • keyType: This is the data type of the keys.
  • valueType: This is the data type of the values.
  • valueContainsNull: This is a boolean value indicating whether the values can be None (NULL). The default value is True.

Code example

from pyspark.sql import SparkSession
from pyspark.sql.types import StructField, StructType, StringType, MapType

spark = SparkSession.builder.appName('answers').getOrCreate()

dfSchema = StructType([
    StructField('Emp Name', StringType(), True),
    StructField('Details', MapType(StringType(), StringType()), True)
])

data = [
    ('John Wick', {'country': 'usa', 'profession': 'Don'}),
    ('Yash', {'country': 'india', 'profession': 'Artist'}),
    ('Novak Djokovic', {'country': 'serbia', 'profession': 'tennis player'}),
    ('Sundar Pichai', {'country': 'usa', 'profession': 'CEO'}),
    ('Kobe Bryant', {'country': 'usa', 'profession': 'Basketball player'})
]

df = spark.createDataFrame(data=data, schema=dfSchema)
df.show(truncate=False)

Code explanation

  • Lines 1–2: The SparkSession and relevant data types are imported.
  • Line 4: A SparkSession with the application name answers is created.
  • Lines 6–9: The schema for the DataFrame to be created is defined. The Details column is a MapType.
  • Lines 11–17: The sample data for the DataFrame is defined. A Python dictionary is defined for the Details column.
  • Line 19: A DataFrame is created from the sample data and the schema defined earlier.
  • Line 20: The created DataFrame is displayed.
