Spark applications run as independent sets of processes on a cluster, coordinated by the SparkContext object in the driver (main) program.
To run on a cluster, the SparkContext connects to a cluster manager, which allocates resources across applications. This approach has the advantage that each application gets its own executor processes, keeping applications isolated from one another. Spark currently supports the following cluster managers (a sketch of the master URL for each appears after the list):
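To make the driver/cluster-manager relationship concrete, here is a minimal PySpark sketch. The master URL passed to the builder tells Spark which cluster manager to connect to; "local[*]" runs Spark locally on all cores, which is convenient for testing without a cluster. The application name is arbitrary.

```python
from pyspark.sql import SparkSession

# The master URL selects the cluster manager; "local[*]" runs Spark
# locally using all available cores (no cluster manager needed).
spark = SparkSession.builder \
    .appName("cluster-manager-demo") \
    .master("local[*]") \
    .getOrCreate()

# The SparkContext inside the session coordinates the executor processes.
sc = spark.sparkContext
print(sc.master)  # e.g. "local[*]"

# A trivial job to confirm the context is working.
print(sc.parallelize(range(10)).sum())  # 45

spark.stop()
```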
Standalone: A simple cluster manager included with Spark. It can access HDFS and is easy to set up, with plenty of documentation and community support available online. The cluster manager is resilient and can recover from failures, and it can allocate resources according to each application's requirements.
Apache Mesos: A general-purpose cluster manager that can also run Hadoop MapReduce and service applications. It is a distributed cluster manager that manages resources per application, so Spark jobs, Hadoop MapReduce, and other service applications can run side by side. Mesos provides APIs for most programming languages.
Hadoop YARN: The general-purpose resource manager in Hadoop. It acts as a distributed computing framework that handles both job scheduling and resource management, and it comes with readily available executors and pluggable schedulers.
Kubernetes: A system for automating the deployment, scaling, and management of containerized applications. Spark includes a native Kubernetes scheduler; however, this scheduler is currently experimental.
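The choice of cluster manager is expressed through the master URL's scheme. The sketch below shows the documented URL form for each of the four managers; the hostnames and ports are hypothetical placeholders, so substitute your own (for YARN, the address is read from the Hadoop configuration rather than the URL).

```python
from pyspark.sql import SparkSession

# Hypothetical hostnames/ports for illustration; substitute your own.
# The scheme of the master URL selects the cluster manager.
masters = {
    "standalone": "spark://master-host:7077",          # Spark standalone
    "mesos":      "mesos://mesos-master:5050",         # Apache Mesos
    "yarn":       "yarn",                              # Hadoop YARN (resolved via HADOOP_CONF_DIR)
    "kubernetes": "k8s://https://k8s-apiserver:6443",  # Kubernetes API server
}

def build_session(manager: str) -> SparkSession:
    """Create a SparkSession against the chosen cluster manager."""
    return (SparkSession.builder
            .appName(f"demo-on-{manager}")
            .master(masters[manager])
            .getOrCreate())

# Example (requires a running cluster of the matching type):
# spark = build_session("standalone")
```

The same URLs work with spark-submit's --master flag, so an application written this way can move between cluster managers without code changes.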