in Big Data | Hadoop by
Q:
Data Compression in MapReducer

1 Answer

0 votes
by
Hadoop MapReduce provides facilities for the application-writer to specify compression for both intermediate map-outputs and the job-outputs i.e. output of the reduces. It also comes bundled with CompressionCodec implementation for the zlib compression algorithm. The gzip, bzip2, snappy, and lz4 file format are also supported.

Hadoop also provides native implementations of the above compression codecs for reasons of both performance (zlib) and non-availability of Java libraries. More details on their usage and availability are available here.

Intermediate Outputs

Applications can control compression of intermediate map-outputs via the Configuration.set(MRJobConfig.MAP_OUTPUT_COMPRESS, boolean) api and the CompressionCodec to be used via the Configuration.set(MRJobConfig.MAP_OUTPUT_COMPRESS_CODEC, Class) api.

Job Outputs

Applications can control compression of job-outputs via the FileOutputFormat.setCompressOutput(Job, boolean) api and the CompressionCodec to be used can be specified via the FileOutputFormat.setOutputCompressorClass(Job, Class) api.

If the job outputs are to be stored in the SequenceFileOutputFormat, the required SequenceFile.CompressionType (i.e. RECORD / BLOCK - defaults to RECORD) can be specified via the SequenceFileOutputFormat.setOutputCompressionType(Job, SequenceFile.CompressionType) api

Related questions

0 votes
0 votes
0 votes
asked Jan 8, 2020 in Big Data | Hadoop by GeorgeBell
0 votes
0 votes
asked Jan 8, 2020 in Big Data | Hadoop by GeorgeBell
...