Spark lets you configure a system to fit your needs. One place to do this is Spark Properties, which control most application parameters and can be set through a SparkConf object. One sub-domain of these properties is memory management.
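As a rough illustration of setting properties through a SparkConf object, here is a minimal PySpark sketch. The application name, the helper `build_conf`, and the two property values are assumptions picked for the example (the values shown match the documented defaults in recent Spark releases), not recommendations.

```python
# Hypothetical memory-related properties, chosen only for illustration.
memory_props = {
    "spark.memory.fraction": "0.6",
    "spark.memory.storageFraction": "0.5",
}


def build_conf(props):
    """Return a SparkConf carrying the given properties (requires PySpark)."""
    # Imported lazily so the plain dict above can be inspected without PySpark.
    from pyspark import SparkConf

    conf = SparkConf().setAppName("memory-demo")  # hypothetical app name
    for key, value in props.items():
        conf.set(key, value)
    return conf


# Usage, on a machine with PySpark and a JVM available:
#   from pyspark.sql import SparkSession
#   spark = SparkSession.builder.config(conf=build_conf(memory_props)).getOrCreate()
```

Properties set this way apply to the application being launched; the same keys can also be passed on the command line or in a defaults file.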
Since version 1.6, Spark has used the Unified Memory Manager, which lets Storage Memory and Execution Memory coexist and borrow each other’s free space. This memory management model is built around the JVM and distinguishes two types of memory:
On-Heap Memory has four components, as illustrated on the right:
Storage Memory stores Spark cached data, broadcast variables, and unroll data.
Execution Memory stores temporary objects created while Spark tasks run, for example during sorts and aggregations.
User Memory stores user data needed for RDD transformation operations (e.g., RDD dependency information).
Reserved Memory is reserved for the system and is used to store Spark’s internal objects. Its size is hardcoded.
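To make the four on-heap components concrete, the following sketch estimates their sizes for a given executor heap. It assumes a Reserved Memory of 300 MiB and the defaults `spark.memory.fraction = 0.6` and `spark.memory.storageFraction = 0.5` found in recent Spark releases; the function name `on_heap_regions` is hypothetical. It is a simplification: the heap Spark actually sees is usually slightly smaller than the configured maximum, and the storage/execution boundary can move at runtime.

```python
RESERVED_MB = 300  # assumed hardcoded Reserved Memory size, in MiB


def on_heap_regions(heap_mb, memory_fraction=0.6, storage_fraction=0.5):
    """Estimate the size of each on-heap region, in MiB."""
    usable = heap_mb - RESERVED_MB
    if usable <= 0:
        raise ValueError("heap must be larger than the reserved memory")
    unified = usable * memory_fraction    # Storage + Execution share this pool
    storage = unified * storage_fraction  # initial storage share of the pool
    return {
        "reserved": RESERVED_MB,
        "user": usable - unified,
        "storage": storage,
        "execution": unified - storage,
    }


regions = on_heap_regions(4096)  # a 4 GiB executor heap
for name, size in regions.items():
    print(f"{name:>9}: {size:8.1f} MiB")
```

The storage and execution figures are only starting points, since either side may borrow free space from the other under the unified model.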
Off-Heap Memory has two components, as illustrated on the right: Storage Memory and Execution Memory.
They serve the same purposes described above. Off-heap memory is disabled by default, but you can enable it with the spark.memory.offHeap.enabled parameter and set its size with the spark.memory.offHeap.size parameter.
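The two off-heap parameters are normally set together, because the size must be a positive value once off-heap memory is enabled. The sketch below builds that pair and renders it as `--conf` arguments for spark-submit; the helper names `off_heap_props` and `to_submit_args` and the 2 GiB size are assumptions made for the example.

```python
def off_heap_props(size_bytes):
    """Return the property pair that enables off-heap memory."""
    if size_bytes <= 0:
        raise ValueError("off-heap size must be positive when off-heap is enabled")
    return {
        "spark.memory.offHeap.enabled": "true",
        "spark.memory.offHeap.size": str(size_bytes),  # plain number means bytes
    }


def to_submit_args(props):
    """Render properties as spark-submit '--conf key=value' arguments."""
    args = []
    for key, value in sorted(props.items()):
        args += ["--conf", f"{key}={value}"]
    return args


props = off_heap_props(2 * 1024**3)  # hypothetical 2 GiB of off-heap memory
print(" ".join(to_submit_args(props)))
```

Off-heap memory is allocated outside the JVM heap, so it should be counted on top of the executor heap when sizing containers.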
Here is a list of different properties that can be used to configure Spark.