Apache Lucene, an open-source search library, was released in 1999, and Hadoop later grew out of work built on top of it. Today, Apache Hadoop and Apache Spark are two of the most widely used big data frameworks.
Spark was originally developed in 2009 by Matei Zaharia while he was a graduate student at the University of California, Berkeley. His key contribution was organizing data and computation, keeping intermediate results in memory, so that processing scales efficiently across the nodes of a distributed cluster. Spark processes large amounts of data by dividing the workload across nodes and, for many workloads, does so faster than Hadoop. It is also more general than Hadoop, handling interactive queries and iterative algorithms that MapReduce alone handles poorly.
The following are the main components of Apache Spark:
Spark Core is the foundation of the entire project. It handles scheduling, task distribution, input and output operations, and fault recovery, and it serves as the base on which the other components are built.
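Below is a minimal sketch of the RDD API that Spark Core exposes, assuming a local master (`local[*]`); the application name, partition count, and numbers are purely illustrative. Spark Core splits the collection into partitions, schedules one task per partition, and combines the partial results.

```scala
import org.apache.spark.sql.SparkSession

object CoreSketch {
  def main(args: Array[String]): Unit = {
    // Local session for illustration; a real job would point at a cluster master.
    val spark = SparkSession.builder().appName("core-sketch").master("local[*]").getOrCreate()
    val sc = spark.sparkContext

    // Spark Core distributes this collection as an RDD with 8 partitions,
    // runs one task per partition, and reduces the partial sums into a total.
    val numbers = sc.parallelize(1 to 1000, numSlices = 8)
    val total = numbers.map(_ * 2).reduce(_ + _)
    println(s"total = $total")

    spark.stop()
  }
}
```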
Spark SQL allows users to perform optimized structured data processing by executing SQL queries directly or by using the Spark Dataset API to access the SQL execution engine.
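As a rough sketch of those two routes into the same engine, the snippet below registers a small, made-up DataFrame as a temporary view and aggregates it once with a SQL string and once with the DataFrame API; column names and values are illustrative.

```scala
import org.apache.spark.sql.SparkSession

object SqlSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("sql-sketch").master("local[*]").getOrCreate()
    import spark.implicits._

    // Tiny made-up dataset registered as a temporary view.
    val sales = Seq(("books", 12.0), ("games", 30.0), ("books", 8.5)).toDF("category", "amount")
    sales.createOrReplaceTempView("sales")

    // The same aggregation expressed as raw SQL and as the DataFrame/Dataset API;
    // both are planned and optimized by the same SQL execution engine.
    spark.sql("SELECT category, SUM(amount) AS total FROM sales GROUP BY category").show()
    sales.groupBy("category").sum("amount").show()

    spark.stop()
  }
}
```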
Spark Streaming ingests data from multiple streaming sources and divides it into micro-batches, so applications see a smooth, uninterrupted stream of processed results.
Structured Streaming is the newer streaming API, built on the Spark SQL engine, which reduces latency and simplifies programming.
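A small Structured Streaming sketch, assuming a text socket source on localhost:9999 (for example, fed by `nc -lk 9999`); the source and query are illustrative. Spark treats the incoming lines as an unbounded table and updates the word counts as each micro-batch arrives.

```scala
import org.apache.spark.sql.SparkSession

object StreamingSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("streaming-sketch").master("local[*]").getOrCreate()
    import spark.implicits._

    // Read lines from a local socket as an unbounded, continuously growing table.
    val lines = spark.readStream
      .format("socket")
      .option("host", "localhost")
      .option("port", 9999)
      .load()

    // Running word count, recomputed incrementally for every micro-batch.
    val counts = lines.as[String]
      .flatMap(_.split("\\s+"))
      .filter(_.nonEmpty)
      .groupBy("value")
      .count()

    val query = counts.writeStream
      .outputMode("complete")
      .format("console")
      .start()

    query.awaitTermination()
  }
}
```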
MLlib is Spark's built-in machine learning library. It provides common machine learning algorithms, tools for feature extraction and selection, and utilities for building machine learning pipelines. Its primary API is DataFrame-based and is uniform across programming languages such as Python, Java, and Scala.
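The sketch below strings a feature transformer and a classifier into a single MLlib Pipeline over a DataFrame; the four-row training set and column names are invented purely for illustration.

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.SparkSession

object MLlibSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("mllib-sketch").master("local[*]").getOrCreate()
    import spark.implicits._

    // Invented training data: a binary label and two numeric features.
    val training = Seq(
      (0.0, 1.1, 0.1),
      (1.0, 2.0, 1.0),
      (0.0, 1.3, -0.5),
      (1.0, 2.2, 1.2)
    ).toDF("label", "f1", "f2")

    // Assemble the feature columns into a vector, then fit a logistic regression;
    // both stages are chained into one Pipeline that runs over DataFrames.
    val assembler = new VectorAssembler().setInputCols(Array("f1", "f2")).setOutputCol("features")
    val lr = new LogisticRegression().setMaxIter(10)
    val model = new Pipeline().setStages(Array(assembler, lr)).fit(training)

    model.transform(training).select("label", "prediction").show()

    spark.stop()
  }
}
```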
GraphX is Spark's engine for scalable, graph-structured data: it lets users interactively build, transform, and analyze graphs, and it ships with common graph algorithms.
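GraphX's API is exposed through Scala. The sketch below builds a tiny, made-up follower graph and runs the built-in PageRank algorithm over it; vertex names, edge labels, and the tolerance value are illustrative.

```scala
import org.apache.spark.graphx.{Edge, Graph}
import org.apache.spark.sql.SparkSession

object GraphXSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("graphx-sketch").master("local[*]").getOrCreate()
    val sc = spark.sparkContext

    // A tiny directed graph: vertices carry user names, edges carry a relationship label.
    val vertices = sc.parallelize(Seq((1L, "alice"), (2L, "bob"), (3L, "carol")))
    val edges = sc.parallelize(Seq(
      Edge(1L, 2L, "follows"),
      Edge(2L, 3L, "follows"),
      Edge(3L, 1L, "follows")
    ))
    val graph = Graph(vertices, edges)

    // PageRank is one of the graph algorithms GraphX ships with.
    val ranks = graph.pageRank(tol = 0.001).vertices
    ranks.join(vertices).collect().foreach {
      case (_, (rank, name)) => println(f"$name%-6s $rank%.3f")
    }

    spark.stop()
  }
}
```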
Hadoop was initially developed in 2006 by Doug Cutting and Mike Cafarella to process large volumes of data using the MapReduce model. Hadoop distributes a large computing problem across many machines, performs the calculations locally on each one, and then combines the results.
HDFS (the Hadoop Distributed File System) handles the distribution, storage, and retrieval of data across multiple separate servers, providing high-throughput data access and high fault tolerance.
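A brief sketch of talking to HDFS through Hadoop's FileSystem API, assuming a namenode reachable at the placeholder address `hdfs://namenode:9000`; the file path and contents are illustrative. Writing a file hands it to HDFS, which splits it into blocks and replicates them across datanodes; the last call reports which hosts hold those blocks.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object HdfsSketch {
  def main(args: Array[String]): Unit = {
    // Placeholder namenode address; substitute the cluster's real fs.defaultFS.
    val conf = new Configuration()
    conf.set("fs.defaultFS", "hdfs://namenode:9000")
    val fs = FileSystem.get(conf)

    // Write a small file; HDFS splits it into blocks and replicates them across datanodes.
    val path = new Path("/tmp/hello.txt")
    val out = fs.create(path)
    out.writeBytes("hello hdfs\n")
    out.close()

    // Read back its metadata and print which hosts store each block.
    val status = fs.getFileStatus(path)
    fs.getFileBlockLocations(status, 0, status.getLen).foreach { block =>
      println(block.getHosts.mkString(", "))
    }

    fs.close()
  }
}
```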
MapReduce breaks a large data processing job into smaller tasks, distributes them across separate nodes, and then executes each task close to where its data lives.
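The classic word count job below illustrates that split: the mapper emits (word, 1) pairs from each line, the framework shuffles the pairs by key, and each reducer sums the counts for its words. This is a sketch written in Scala against Hadoop's Java MapReduce API; input and output paths come from the command line, and the class names are our own.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{IntWritable, LongWritable, Text}
import org.apache.hadoop.mapreduce.{Job, Mapper, Reducer}
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat
import scala.jdk.CollectionConverters._

// Map phase: split each input line into words and emit (word, 1).
class TokenMapper extends Mapper[LongWritable, Text, Text, IntWritable] {
  private val one  = new IntWritable(1)
  private val word = new Text()
  override def map(key: LongWritable, value: Text,
                   context: Mapper[LongWritable, Text, Text, IntWritable]#Context): Unit =
    value.toString.split("\\s+").filter(_.nonEmpty).foreach { w =>
      word.set(w)
      context.write(word, one)
    }
}

// Reduce phase: sum the counts emitted for each word.
class SumReducer extends Reducer[Text, IntWritable, Text, IntWritable] {
  override def reduce(key: Text, values: java.lang.Iterable[IntWritable],
                      context: Reducer[Text, IntWritable, Text, IntWritable]#Context): Unit =
    context.write(key, new IntWritable(values.asScala.map(_.get).sum))
}

object WordCount {
  def main(args: Array[String]): Unit = {
    val job = Job.getInstance(new Configuration(), "word-count")
    job.setJarByClass(classOf[TokenMapper])
    job.setMapperClass(classOf[TokenMapper])
    job.setReducerClass(classOf[SumReducer])
    job.setOutputKeyClass(classOf[Text])
    job.setOutputValueClass(classOf[IntWritable])
    FileInputFormat.addInputPath(job, new Path(args(0)))
    FileOutputFormat.setOutputPath(job, new Path(args(1)))
    System.exit(if (job.waitForCompletion(true)) 0 else 1)
  }
}
```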
YARN (Yet Another Resource Negotiator) is the cluster resource manager: it schedules jobs, launches their tasks, and allocates computing resources such as CPU and memory.
Hadoop Common is a collection of utilities and libraries shared by the other Hadoop modules.
Spark is typically faster because it works with data in random access memory (RAM) rather than reading and writing intermediate results to disk.
Hadoop, by contrast, gathers data from various sources and processes it in batches through MapReduce, writing intermediate results to disk.
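The difference shows up most clearly when the same data is reused: the sketch below pins a small, made-up DataFrame in executor memory with `persist`, so the second action reads it from RAM instead of recomputing it from its source. Column names and values are illustrative.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

object CacheSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("cache-sketch").master("local[*]").getOrCreate()
    import spark.implicits._

    // Made-up data; in practice this would come from files, Kafka, a database, etc.
    val events = Seq(("alice", "ok"), ("bob", "error"), ("alice", "ok"))
      .toDF("user", "status")
      .filter($"status" === "ok")

    // Keep the filtered result in memory so later actions avoid recomputing it.
    events.persist(StorageLevel.MEMORY_ONLY)

    println(events.count())               // first action materializes the cached data
    events.groupBy("user").count().show() // reuses the in-memory copy

    events.unpersist()
    spark.stop()
  }
}
```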
On security, Hadoop has the edge: it supports multiple authentication and access control mechanisms, including Kerberos authentication and HDFS file permissions.
Spark improves its security with shared-secret authentication between its processes and with event logging.
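A minimal sketch of turning those two knobs on through configuration; the secret value and event log directory are placeholders, not recommendations.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

object SecureSessionSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .set("spark.authenticate", "true")              // require a shared secret between Spark processes
      .set("spark.authenticate.secret", "change-me")  // placeholder secret
      .set("spark.eventLog.enabled", "true")          // record application events for auditing/replay
      .set("spark.eventLog.dir", "file:///tmp/spark-events") // placeholder; directory must exist

    val spark = SparkSession.builder()
      .appName("secure-sketch")
      .master("local[*]")
      .config(conf)
      .getOrCreate()

    spark.stop()
  }
}
```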
For machine learning, Spark is the better fit because it ships with MLlib, which runs iterative machine learning computations in memory.