What is a key-value pair in MapReduce?
Hadoop MapReduce implements a data model that represents data as key-value pairs. Both the input to and the output from the MapReduce framework must be key-value pairs.
In Hadoop, if the schema is static, we can work directly on the columns instead of on keys and values. But if the schema is not static, we work on keys and values. Keys and values are not intrinsic properties of the data; the user analyzing the data chooses the key-value pair. A key-value pair in Hadoop MapReduce is generated in the following way:
InputSplit- It is the logical representation of data. An InputSplit represents the data that an individual Mapper will process.
RecordReader- It communicates with the InputSplit (created by InputFormat) and converts the split into records. Records are in the form of key-value pairs that are suitable for reading by the Mapper. By default, Hadoop uses TextInputFormat, whose RecordReader (LineRecordReader) converts each line of data into a key-value pair.
Key- It is the byte offset of the beginning of the line within the file, so it is unique when combined with the file name.
Value- It is the contents of the line, excluding line terminators.
For example, if the file content is: on the top of the crumpetty Tree
Key- 0
Value- on the top of the crumpetty Tree
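The key and value rules above can be illustrated with a small sketch in plain Python. This is not Hadoop code and does not use the Hadoop API; the function name read_records is hypothetical, and the second line of the sample input is added here only to show a non-zero offset. It simply mimics how a line-oriented reader would emit (byte offset, line content) pairs, assuming lines are terminated by "\n" or "\r\n".

```python
import io


def read_records(data: bytes):
    """Yield (byte_offset, line_text) pairs, one per line of `data`.

    Hypothetical helper that imitates the key-value pairs described above:
    key = byte offset of the start of the line, value = line without terminator.
    """
    offset = 0
    for raw_line in io.BytesIO(data):
        # The key is the byte offset at which this line begins.
        key = offset
        # The value is the line content with the line terminator removed.
        value = raw_line.rstrip(b"\r\n").decode("utf-8")
        yield key, value
        # Advance by the full raw length, terminator included.
        offset += len(raw_line)


# First line is the example from the text; the second line is an assumption
# added for illustration only.
sample = b"on the top of the crumpetty Tree\nthe Quangle Wangle sat\n"
records = list(read_records(sample))

for key, value in records:
    print(key, value)
```

The first record has key 0, as in the example above. The second record has key 33, because the first line is 32 bytes long plus one byte for the newline.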