
Jan 8 in Big Data | Hadoop
Q: How to compress mapper output in Hadoop?

1 Answer

Jan 8
A mapper task processes each input record (from the RecordReader) and generates key-value pairs, which can be completely different from the input pair. The mapper's output, also known as intermediate output, is written to the local disk.

To compress mapper output, enable it in the job configuration: conf.setBoolean("mapreduce.map.output.compress", true)

Apart from setting this property to enable compression for mapper output, we also need to consider other factors, such as which codec to use and what the compression type should be.

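The following properties configure these:

mapred.map.output.compression.codec (deprecated name; in Hadoop 2 and later it is mapreduce.map.output.compress.codec)

mapred.output.compression.type (deprecated name; in Hadoop 2 and later it is mapreduce.output.fileoutputformat.compress.type, with values NONE, RECORD, or BLOCK; it governs SequenceFile job output)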

Of these two factors, choosing the right codec matters more. Each codec has pros and cons, so you need to figure out which one suits your requirements. Generally, you want fast reads and writes, a good compression factor, and CPU-friendly decompression (since there are typically fewer reducers than mappers). Considering these factors, the Snappy codec feels like the best fit, as it offers fast reads and writes and a compression factor of about 3.
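As a rough illustration of the settings discussed above, here is a minimal sketch that collects the map-output compression properties and renders them as -D options for a job launched through ToolRunner. It uses only the JDK so it runs without a Hadoop installation. The class and method names (MapOutputCompressionSettings, build, toGenericOptions) are hypothetical, and it assumes the current property name mapreduce.map.output.compress.codec (the successor of mapred.map.output.compression.codec) and that the Snappy codec class org.apache.hadoop.io.compress.SnappyCodec is available on your cluster.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Hadoop: gathers the map-output
// compression settings as plain key/value strings.
class MapOutputCompressionSettings {

    // Returns the properties that turn on map-output compression
    // and select the codec class passed in.
    static Map<String, String> build(String codecClass) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("mapreduce.map.output.compress", "true");
        // Assumption: current name of mapred.map.output.compression.codec.
        props.put("mapreduce.map.output.compress.codec", codecClass);
        return props;
    }

    // Renders the properties as "-D key=value" generic options,
    // in insertion order, separated by single spaces.
    static String toGenericOptions(Map<String, String> props) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : props.entrySet()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append("-D ").append(e.getKey()).append('=').append(e.getValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> props =
                build("org.apache.hadoop.io.compress.SnappyCodec");
        System.out.println(toGenericOptions(props));
        // In a job driver, the same pairs could instead be copied onto a
        // Hadoop Configuration object, e.g. props.forEach(conf::set);
    }
}
```

The -D options only take effect if the job driver parses generic options (for example by implementing Tool and running through ToolRunner); otherwise the properties would need to be set on the Configuration in code.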