
What is Mapper in Hadoop MapReduce?

1 Answer

A Mapper in Hadoop takes each record produced by the RecordReader as input, processes it, and emits zero or more key-value pairs. The output pairs can differ from the input pair in both type and content. The mapper output is called intermediate output and is written to the local disk of the node running the map task. It is not stored on HDFS because it is temporary data, and HDFS would replicate it across nodes, creating unnecessary copies.
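To make the record-in, pairs-out idea concrete, here is a minimal sketch in plain Java. It does not use the Hadoop API: `MapperSketch` and its `map` method are hypothetical names, and the word-count logic is only an illustrative assumption about what a mapper might do. The input record is modelled as (byte offset, line), which is the kind of pair a line-oriented RecordReader hands to the mapper.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Plain-Java sketch of a mapper; it does not depend on the Hadoop libraries.
class MapperSketch {

    // Hypothetical map function: takes one input record (offset, line)
    // and returns zero or more intermediate (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(long offset, String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(new SimpleEntry<>(word, 1));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Stand-in for the records a RecordReader would produce from a split.
        String[] lines = {"hello hadoop", "hello mapper"};
        long offset = 0;
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(offset, line)) {
                System.out.println(pair.getKey() + "\t" + pair.getValue());
            }
            // Assumes single-byte characters; +1 accounts for the newline.
            offset += line.length() + 1;
        }
    }
}
```

Note that the input pair is (Long, String) while the output pairs are (String, Integer), and that the offset key is ignored entirely, which is common in line-oriented jobs.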

Before the mapper output is written to the local disk, it is partitioned by key and then sorted by key within each partition. Partitioning ensures that all values for a given key go to the same reducer. A Mapper only understands key-value pairs, so the input data must be converted into key-value pairs before it reaches the mapper. The InputSplit defines the chunk of input that a map task processes, and the RecordReader converts that chunk into key-value pairs.
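The partition-then-sort step can be sketched in plain Java as well. This is a simplified model, not Hadoop's actual implementation: `PartitionSortSketch` and its methods are hypothetical names, and the real framework works on serialized bytes in a buffer rather than on a list of objects. The partition formula is the hash-modulo scheme used by Hadoop's default HashPartitioner.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Plain-Java sketch of map-side partitioning and sorting.
class PartitionSortSketch {

    // Hash-modulo partitioning; the mask keeps the hash non-negative.
    static int partitionFor(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Returns a copy of the pairs ordered by partition, then by key,
    // so that equal keys end up next to each other in one partition.
    static List<Map.Entry<String, Integer>> partitionAndSort(
            List<Map.Entry<String, Integer>> pairs, int numReducers) {
        List<Map.Entry<String, Integer>> sorted = new ArrayList<>(pairs);
        sorted.sort((a, b) -> {
            int pa = partitionFor(a.getKey(), numReducers);
            int pb = partitionFor(b.getKey(), numReducers);
            if (pa != pb) {
                return Integer.compare(pa, pb);
            }
            return a.getKey().compareTo(b.getKey());
        });
        return sorted;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        pairs.add(new SimpleEntry<>("hello", 1));
        pairs.add(new SimpleEntry<>("hadoop", 1));
        pairs.add(new SimpleEntry<>("hello", 1));
        for (Map.Entry<String, Integer> pair : partitionAndSort(pairs, 2)) {
            System.out.println(partitionFor(pair.getKey(), 2)
                    + "\t" + pair.getKey() + "\t" + pair.getValue());
        }
    }
}
```

Because the partition depends only on the key, every mapper sends a given key to the same reducer, no matter which node produced it.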

...