in Hadoop by
What is the distributed cache, and what are its benefits?

1 Answer


A job often needs additional files, JARs, and archives at run time. Distributed Cache is a mechanism provided by the MapReduce framework that copies these files to the slave nodes before any of the job's tasks start executing there. The required files are copied only once per job.
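The copy-once behaviour can be illustrated with a toy model in plain Java. This is only a sketch, not Hadoop code: `LocalizationModel`, `prepareNode`, and `copiesFor` are hypothetical names, and it assumes "once per job" means one copy per node that runs the job's tasks, no matter how many tasks land on that node.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model (not the Hadoop API): tracks which nodes already hold
// a local copy of a cached file for one job.
class LocalizationModel {
    private final Set<String> localizedNodes = new HashSet<>();
    private int copies = 0;

    // Called before a task starts on `node`. The file is copied only
    // if this node does not have a local copy yet.
    void prepareNode(String node) {
        if (localizedNodes.add(node)) {
            copies++;
        }
    }

    int copies() {
        return copies;
    }

    // Schedules one task per entry in `taskNodes` and returns how many
    // copies were made in total.
    static int copiesFor(String... taskNodes) {
        LocalizationModel model = new LocalizationModel();
        for (String node : taskNodes) {
            model.prepareNode(node);
        }
        return model.copies();
    }
}
```

Under this assumption, four tasks spread over two nodes, for example `LocalizationModel.copiesFor("node1", "node1", "node2", "node1")`, trigger two copies rather than four.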

The efficiency gain from Distributed Cache is that the necessary files are already present on a node before any task of the job starts executing there. In addition, tasks on every machine in the cluster can read the cached files as part of the local file system. In rare circumstances, however, it is better for the tasks to read the files with standard HDFS I/O instead of relying on Distributed Cache. For instance, if an application has very few reduces and needs huge artifacts (larger than 512 MB) in the distributed cache, HDFS I/O is the better option.
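The rule of thumb above could be written down as a small check. This is a hypothetical helper, not part of Hadoop: the 512 MB limit comes from the answer, while the cutoff for "very few reduces" (`FEW_REDUCES = 2`) is an assumption picked purely for illustration.

```java
// Hypothetical helper (not part of Hadoop): encodes the rule of thumb
// "few reduces + artifact larger than 512 MB => prefer plain HDFS I/O".
class CacheChoice {
    // 512 MB threshold, taken from the text.
    static final long LARGE_ARTIFACT_BYTES = 512L * 1024 * 1024;

    // Assumed cutoff; the text only says "very few reduces".
    static final int FEW_REDUCES = 2;

    // Returns true when reading the artifact directly from HDFS is
    // likely the better choice than the distributed cache.
    static boolean preferHdfsIo(int numReduces, long artifactBytes) {
        return numReduces <= FEW_REDUCES && artifactBytes > LARGE_ARTIFACT_BYTES;
    }
}
```

With these assumed values, a job with one reduce and a 600 MB artifact would be steered to HDFS I/O, while the same artifact with 50 reduces, or a 10 MB artifact with one reduce, would stay with the distributed cache.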

...