What is the distributed cache in MapReduce?

1 Answer


The distributed cache is a facility the MapReduce framework provides for caching read-only files (such as jars, archives, and small lookup files) that a job needs, and making them available on every worker node. When a MapReduce job runs, each task reads these files from the local copy on its node instead of fetching them from the disk over and over, which speeds up MapReduce processing.

To copy the file to HDFS, you can use the command:

hdfs dfs -put /user/Simplilearn/lib/jar_file.jar

To add the file to the classpath in the application's JobConf, use:

DistributedCache.addFileToClassPath(new Path("/user/Simplilearn/lib/jar_file.jar"), conf);

Then, place this call in the job's driver class before the job is submitted.
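Putting the pieces together, a minimal driver sketch might look like the following. This is illustrative only: the class name CacheDriver, the job name, and the lookup file path are hypothetical, and it uses the classic org.apache.hadoop.filecache.DistributedCache API (deprecated in newer Hadoop releases in favor of Job.addCacheFile / Job.addFileToClassPath).

```java
import java.net.URI;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

// Hypothetical driver class showing where the DistributedCache calls go.
public class CacheDriver {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(CacheDriver.class);
        conf.setJobName("distributed-cache-demo"); // assumed job name

        // Ship a jar that tasks need on their classpath (path from the answer above).
        DistributedCache.addFileToClassPath(
                new Path("/user/Simplilearn/lib/jar_file.jar"), conf);

        // Ship a read-only lookup file (hypothetical path); tasks can later
        // locate it with DistributedCache.getLocalCacheFiles(conf).
        DistributedCache.addCacheFile(
                new URI("/user/Simplilearn/data/lookup.txt"), conf);

        // ... set input/output paths, mapper, and reducer here, then submit:
        JobClient.runJob(conf);
    }
}
```

Files are distributed to worker nodes once per job, before any task starts, so mappers and reducers read them at local-disk speed.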
