What is the distributed cache in MapReduce?

Question

What is the distributed cache in MapReduce?

1 Answer

sharadyadav1986 · Answer 1 · 2020-11-24T06:26:44+0000

The distributed cache is a Hadoop mechanism that copies the read-only files a job needs (text files, archives, JARs) to the local disk of every worker node that runs its tasks. While the MapReduce program is running, each task reads its node-local copy instead of fetching the data from HDFS every time, which speeds up processing.
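The access pattern a task follows with a cached file can be sketched in plain Java, without Hadoop: load the local copy into memory once, then look up every record against it. This is only an illustration under assumptions. The class and method names (CachedLookupSketch, loadLookup, enrich) and the sample data are hypothetical, and in a real job the loading step would typically run once per task, for example in the mapper's setup method.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: no Hadoop classes are used here.
public class CachedLookupSketch {

    // Parse "key,value" lines into a map. In a real task these lines would
    // come from the node-local copy of the cached file, read once per task.
    public static Map<String, String> loadLookup(Iterable<String> lines) {
        Map<String, String> lookup = new HashMap<>();
        for (String line : lines) {
            String[] parts = line.split(",", 2);
            if (parts.length == 2) {
                lookup.put(parts[0].trim(), parts[1].trim());
            }
        }
        return lookup;
    }

    // Per-record work: append the looked-up value, or UNKNOWN if the key
    // (the first comma-separated field) is not in the cached data.
    public static String enrich(String record, Map<String, String> lookup) {
        String key = record.split(",", 2)[0].trim();
        return record + "," + lookup.getOrDefault(key, "UNKNOWN");
    }

    public static void main(String[] args) {
        // Stand-in for the contents of the cached file (made-up sample data).
        List<String> cachedLines = Arrays.asList("IN,India", "US,United States");
        Map<String, String> lookup = loadLookup(cachedLines);

        // Stand-in for the input records a mapper would receive one by one.
        String[] records = {"IN,42", "US,7", "FR,3"};
        for (String record : records) {
            System.out.println(enrich(record, lookup));
        }
    }
}
```

The point of the sketch is that the lookup data is read a single time and then reused for every record, which is the saving the distributed cache is meant to provide.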

To copy the file to HDFS, you can use the command below (this assumes jar_file.jar is in your current local directory):

hdfs dfs -put jar_file.jar /user/Simplilearn/lib/jar_file.jar

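To register the file with the job, add the following call to the driver class, where conf is the application's JobConf:

DistributedCache.addFileToClassPath(new Path("/user/Simplilearn/lib/jar_file.jar"), conf);

On Hadoop 2 and later, DistributedCache is deprecated; Job.addFileToClassPath(Path) is the replacement call.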
