A distributed cache is a mechanism by which data read from disk is cached and made available to all worker nodes. When a MapReduce program runs, it picks up the data from the distributed cache instead of reading it from disk every time, which speeds up MapReduce processing.
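The read-once, reuse-many idea can be sketched in plain Java without any Hadoop dependency. This is only an illustration under stated assumptions: the class CachedLookupSketch, the methods readLookupFromDisk and runTask, and the sample country codes are all hypothetical stand-ins, not part of the Hadoop API. The "disk read" is simulated by a counter so the effect of caching is visible.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CachedLookupSketch {
    // Counts simulated disk reads so the benefit of caching can be observed.
    static int diskReads = 0;

    // Hypothetical stand-in for loading a small lookup file from disk.
    static Map<String, String> readLookupFromDisk() {
        diskReads++;
        Map<String, String> lookup = new HashMap<>();
        lookup.put("IN", "India");
        lookup.put("US", "United States");
        return lookup;
    }

    // Hypothetical stand-in for one map task: the lookup data is loaded once
    // up front and then reused for every record, instead of being re-read
    // from disk per record.
    static List<String> runTask(List<String> records) {
        Map<String, String> cache = readLookupFromDisk();
        List<String> output = new ArrayList<>();
        for (String code : records) {
            output.add(cache.getOrDefault(code, "unknown"));
        }
        return output;
    }

    public static void main(String[] args) {
        List<String> result = runTask(Arrays.asList("IN", "US", "IN", "FR"));
        System.out.println(result + " diskReads=" + diskReads);
    }
}
```

In a real job, the framework copies the cached file to each worker node before the tasks start, and a task would typically load it once during its setup phase in the same way.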
To copy the file to HDFS, you can use the command below (it assumes jar_file.jar is in the current local directory):
hdfs dfs -put jar_file.jar /user/Simplilearn/lib/jar_file.jar
To set up the application’s JobConf, call:
DistributedCache.addFileToClassPath(new Path("/user/Simplilearn/lib/jar_file.jar"), conf);
Then add this call to the driver class, before the job is submitted.