Is it possible to have Hadoop job output in multiple directories? If yes, how?
Yes, it is possible using the following approaches:
a. Using the MultipleOutputs class-
This class simplifies writing output data to multiple outputs.
The API provides overloaded write methods to achieve this. The basic form takes a named output, a key, and a value:
[php]multipleOutputs.write("OutputFileName", new Text(key), new Text(value));[/php]
Next, use the overloaded write method that takes an extra parameter for the base output path. The base output path is resolved relative to the job's output directory, so this allows the output files to be written to separate output directories.
[php]multipleOutputs.write("OutputFileName", new Text(key), new Text(value), baseOutputPath);[/php]
Finally, pass a different baseOutputPath for each category of output, so that each category is written to its own directory.
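One way to choose a different baseOutputPath per category is a small helper that turns a category label into a subdirectory-style path. The sketch below is plain Java and only builds the string. The class name `OutputPaths`, the method `baseOutputPathFor`, and the normalization rule are hypothetical choices for illustration, not part of the Hadoop API.

```java
import java.util.Locale;

final class OutputPaths {
    private OutputPaths() {}

    // Builds a baseOutputPath such as "fruit/part" for a category label.
    // The "/" makes the output land in a subdirectory of the job output directory.
    static String baseOutputPathFor(String category) {
        // Assumed normalization: lower-case the label and replace every run of
        // characters outside [a-z0-9_-] with "_" to get a single directory name.
        String dir = category.trim()
                .toLowerCase(Locale.ROOT)
                .replaceAll("[^a-z0-9_-]+", "_");
        if (dir.isEmpty()) {
            throw new IllegalArgumentException("category must not be blank");
        }
        return dir + "/part";
    }
}

// Intended use inside a reducer that holds a MultipleOutputs<Text, Text>
// field named multipleOutputs (the field name is an assumption):
//   multipleOutputs.write(new Text(key), new Text(value),
//                         OutputPaths.baseOutputPathFor(category));
// Hadoop appends a suffix such as "-r-00000" to the base path, so records
// for "Fruit" would end up under <job output dir>/fruit/.
```

Keeping the path logic in one helper means the mapper or reducer only decides which category a record belongs to, and the directory layout can be changed in a single place.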
b. Rename/Move the files in the driver class-
This is the easiest hack to write output to multiple directories. We can use MultipleOutputs to write all the output files to a single directory, giving each category a different file name. After the job completes, the driver class renames or moves each file into the directory for its category.
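The move step can be sketched as follows. To keep the example runnable without a cluster, it uses `java.nio.file` on the local filesystem as a stand-in. In a real driver the same logic would run after the job completes and would use Hadoop's `FileSystem.mkdirs` and `FileSystem.rename` on HDFS paths instead. The class name `OutputMover` and the assumed file name pattern (`<category>-r-00000` or `<category>-m-00000`, no compression extension) are illustrative assumptions.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class OutputMover {
    // Assumed naming: "<category>-r-00000" or "<category>-m-00000".
    private static final Pattern NAMED_OUTPUT =
            Pattern.compile("^([A-Za-z0-9]+)-([mr]-\\d+)$");

    private OutputMover() {}

    // Moves each "<category>-r-NNNNN" file in outputDir to
    // "outputDir/<category>/part-r-NNNNN" and returns the number of files moved.
    static int moveByCategory(Path outputDir) throws IOException {
        // List first, move afterwards, so the directory is not modified
        // while it is being iterated.
        List<Path> files = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(outputDir)) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)) {
                    files.add(p);
                }
            }
        }
        int moved = 0;
        for (Path file : files) {
            Matcher m = NAMED_OUTPUT.matcher(file.getFileName().toString());
            // Skip files that do not match, and the default "part-*" output.
            if (!m.matches() || m.group(1).equals("part")) {
                continue;
            }
            Path targetDir = outputDir.resolve(m.group(1));
            Files.createDirectories(targetDir);
            Files.move(file, targetDir.resolve("part-" + m.group(2)));
            moved++;
        }
        return moved;
    }
}
```

For example, an output directory containing `fruit-r-00000` and `color-r-00000` would end up with `fruit/part-r-00000` and `color/part-r-00000`, while files such as `_SUCCESS` stay where they are.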
We have categorized the above Hadoop scenario-based interview questions for Hadoop developers, Hadoop administrators, Hadoop architects, etc., for freshers as well as experienced candidates.