Q:
Is it possible to have hadoop job output in multiple directories? If yes, how?

1 Answer

Yes, it is possible using either of the following approaches:

a. Using the MultipleOutputs class:

This class simplifies writing output data to multiple outputs.

[php]MultipleOutputs.addNamedOutput(job, "OutputFileName", OutputFormatClass, keyClass, valueClass);[/php]

For named outputs, the API provides two overloaded write methods, called on a MultipleOutputs instance (multipleOutputs below). The first writes to the job's output directory:

[php]multipleOutputs.write("OutputFileName", new Text(key), new Text(value));[/php]

To write to separate output directories, use the second overload, which takes an extra parameter for the base output path:

[php]multipleOutputs.write("OutputFileName", new Text(key), new Text(value), baseOutputPath);[/php]

The base output path is interpreted relative to the job's output directory and may contain "/" to create subdirectories, so passing a different baseOutputPath for each category of record sends each category to its own directory.
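
As a rough sketch of how the base output path might be varied per record, the helper below derives a path from a category string. The class name OutputPaths, the method baseOutputPath, and the example categories are hypothetical and not part of Hadoop. The code is plain Java with no Hadoop dependency, and a comment shows where the result would be passed to the four-argument write call. For reducer output, Hadoop appends a suffix such as -r-00000 to the base path, so "loan/part" would be expected to end up as loan/part-r-00000 under the job's output directory.

```java
import java.util.Locale;

// Hypothetical helper (not part of Hadoop): builds the baseOutputPath argument.
final class OutputPaths {
    private OutputPaths() {}

    // Returns "<category>/part", where <category> is normalized into a
    // lowercase directory name containing only letters, digits and underscores.
    static String baseOutputPath(String category) {
        if (category == null || category.trim().isEmpty()) {
            throw new IllegalArgumentException("category must not be blank");
        }
        String dir = category.trim()
                .toLowerCase(Locale.ROOT)
                .replaceAll("[^a-z0-9_]+", "_");
        return dir + "/part";
    }

    public static void main(String[] args) {
        // In a reducer, the value would be the last argument of write, e.g.:
        // multipleOutputs.write("OutputFileName", key, value,
        //                       OutputPaths.baseOutputPath("loan"));
        System.out.println(baseOutputPath("loan"));
        System.out.println(baseOutputPath("Insurance"));
    }
}
```

Normalizing the category is a design choice made here to keep directory names predictable; any scheme works as long as each category maps to a distinct path.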

b. Rename/move the files in the driver class:

This is the easiest hack for writing output to multiple directories. We use MultipleOutputs to write all the output files to a single directory, giving the files a different name for each category. Once the job has finished, the driver class renames or moves each file into the directory for its category.
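
The move step could look roughly like the sketch below. It is a local-filesystem analogue written with java.nio.file so that it runs without Hadoop; in a real driver the same loop would more likely use Hadoop's FileSystem API (for example FileSystem.rename) against HDFS. The class name OutputMover, the method moveByCategory, and the assumption that each file is named "<category>-..." (such as loan-r-00000) are illustrative, not taken from the original answer.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical local-filesystem analogue of the driver-side move step.
final class OutputMover {
    private OutputMover() {}

    // Moves every regular file named "<category>-..." in outputDir into
    // outputDir/<category>/ and returns the number of files moved.
    // Files without a "-" after a non-empty prefix (e.g. _SUCCESS) are left alone.
    // Note: a default file such as part-r-00000 would be moved into "part/".
    static int moveByCategory(Path outputDir) throws IOException {
        List<Path> files;
        // Snapshot the listing first so the directory is not modified while iterating.
        try (Stream<Path> entries = Files.list(outputDir)) {
            files = entries.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        int moved = 0;
        for (Path file : files) {
            String name = file.getFileName().toString();
            int dash = name.indexOf('-');
            if (dash <= 0) {
                continue; // no category prefix
            }
            Path targetDir = outputDir.resolve(name.substring(0, dash));
            Files.createDirectories(targetDir);
            Files.move(file, targetDir.resolve(name), StandardCopyOption.REPLACE_EXISTING);
            moved++;
        }
        return moved;
    }

    public static void main(String[] args) throws IOException {
        // Simulate a finished job's single output directory.
        Path out = Files.createTempDirectory("job-output");
        Files.createFile(out.resolve("loan-r-00000"));
        Files.createFile(out.resolve("insurance-r-00000"));
        Files.createFile(out.resolve("_SUCCESS"));
        System.out.println("moved " + moveByCategory(out) + " files under " + out);
    }
}
```

The driver would call such a routine only after the job has completed successfully, since the output files are not final before then.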
