Ans:
Hadoop is an open-source software framework for distributed storage and processing of large datasets. The core components of Apache Hadoop are HDFS, MapReduce, and YARN.
- HDFS - The Hadoop Distributed File System (HDFS) is the primary storage layer of Hadoop. It stores very large files across a cluster of commodity hardware and is designed for a small number of large files rather than a huge number of small files. HDFS keeps data available even when hardware fails by replicating each block across multiple nodes, and it provides high-throughput access to application data by serving reads and writes in parallel (a minimal API sketch follows this list).
- MapReduce - MapReduce is the data processing layer of Hadoop. Developers write MapReduce applications to process large volumes of structured and unstructured data stored in HDFS. MapReduce processes data in parallel by dividing the submitted job into a set of independent tasks. Processing happens in two phases: Map and Reduce. The Map phase runs first and carries the heavy transformation logic; the Reduce phase follows and performs lighter-weight processing such as aggregation or summation (a word-count sketch follows this list).
- YARN - YARN (Yet Another Resource Negotiator) is the resource-management layer of Hadoop. It allocates cluster resources and allows multiple data processing engines, for example real-time streaming, data science, and batch processing, to run on the same cluster (a small client example follows this list).
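
To make the HDFS description concrete, here is a minimal sketch of writing and reading a file through the HDFS Java API. The NameNode address `hdfs://namenode:8020` and the path `/user/demo/hello.txt` are assumptions for illustration, not values from the original answer.

```java
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HdfsExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:8020"); // assumed NameNode address
        try (FileSystem fs = FileSystem.get(conf)) {
            Path file = new Path("/user/demo/hello.txt"); // assumed path
            // Write: HDFS splits the file into blocks and replicates each block
            // across DataNodes, which is what tolerates hardware failure.
            try (FSDataOutputStream out = fs.create(file, true)) {
                out.write("hello hdfs".getBytes(StandardCharsets.UTF_8));
            }
            // Read the file back and copy it to stdout.
            try (FSDataInputStream in = fs.open(file)) {
                IOUtils.copyBytes(in, System.out, 4096, false);
            }
        }
    }
}
```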
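
The Map/Reduce split described above is easiest to see in the classic word-count program: the Mapper emits a `(word, 1)` pair for every token, and the Reducer sums the counts per word. This is a standard sketch of that pattern; the class names and input/output paths are illustrative.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
    // Map phase: tokenize each input line and emit (word, 1).
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reduce phase: aggregate the counts for each word.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS input directory
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // HDFS output directory
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Each Map task processes one input split independently, which is how MapReduce achieves the parallelism described above.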
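
Finally, a brief sketch of YARN's resource-management role, using the `YarnClient` API to list the applications currently running on a cluster. This assumes a reachable cluster whose address is picked up from `yarn-site.xml` on the classpath; nothing about the cluster itself is taken from the original answer.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.client.api.YarnClient;

public class YarnInfo {
    public static void main(String[] args) throws Exception {
        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(new Configuration()); // reads yarn-site.xml from the classpath
        yarnClient.start();
        // Because YARN owns the cluster's resources, it tracks every running
        // application regardless of engine (MapReduce, Spark, and so on).
        for (ApplicationReport app : yarnClient.getApplications()) {
            System.out.println(app.getApplicationId() + " " + app.getApplicationType());
        }
        yarnClient.stop();
    }
}
```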