How to submit extra files (jars, static files) for a MapReduce job at runtime?
The MapReduce framework provides the Distributed Cache to cache files needed by applications. It can cache read-only text files, archives, jar files, etc.
An application that wants to distribute a file via the distributed cache must first make the file available at a URL, which can be either an hdfs:// or http:// URL. The user then registers that URL as a cache file when submitting the job. Before any tasks start on a node, the framework copies the cache files to that node; the files are copied only once per job, and applications should not modify them.
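As a minimal sketch, the registration step can be done through the `Job` API at submit time. The HDFS paths, host name, and class names below are placeholders for illustration, not values from the original text:

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;

public class CacheFileDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "distributed cache example");
        job.setJarByClass(CacheFileDriver.class);

        // Register a read-only file already present at an hdfs:// URL;
        // the framework copies it to every node before tasks start there.
        job.addCacheFile(new URI("hdfs://namenode:8020/user/hadoop/lookup.txt"));

        // Extra jars can also be shipped and put on the task classpath.
        job.addFileToClassPath(new Path("/user/hadoop/lib/extra-lib.jar"));

        // ... set mapper, reducer, input/output paths, then submit:
        // System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Equivalently, if the driver uses `GenericOptionsParser`, the same files can be attached from the command line with the `-files` and `-libjars` generic options at submit time, without changing the driver code.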
By default, the size of the distributed cache is 10 GB. This limit can be adjusted using the local.cache.size configuration property.
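A minimal sketch of adjusting this limit in the site configuration (the value is expressed in bytes; 10 GB shown here matches the stated default):

```xml
<!-- mapred-site.xml -->
<property>
  <name>local.cache.size</name>
  <!-- Maximum size of the distributed cache, in bytes (10 GB). -->
  <value>10737418240</value>
</property>
```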