Is it possible to provide multiple inputs to Hadoop? If yes, then how?

1 Answer

Yes, it is possible by using the MultipleInputs class (in the org.apache.hadoop.mapreduce.lib.input package).

For example, suppose we have weather data from the UK Met Office and want to combine it with the NCDC data for our maximum temperature analysis. We can then set up the input as follows:

[php]
MultipleInputs.addInputPath(job, ncdcInputPath, TextInputFormat.class, MaxTemperatureMapper.class);
MultipleInputs.addInputPath(job, metofficeInputPath, TextInputFormat.class, MetofficeMaxTemperatureMapper.class);
[/php]

The code above replaces the usual calls to FileInputFormat.addInputPath() and job.setMapperClass(). Both the Met Office and NCDC data are text based, so we use TextInputFormat for each. We use two different mappers because the two data sources have different line formats: MaxTemperatureMapper reads the NCDC input data, and MetofficeMaxTemperatureMapper reads the Met Office input data. Each mapper extracts the year and temperature fields from its own input.
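
To make the idea concrete, here is a minimal plain-Java sketch of what per-input mappers achieve. Each source is parsed by its own function, both emit the same (year, temperature) pair, and a single shared reduce step then finds the maximum per year. It is an illustration only and does not use the Hadoop API. The class name, the method names, and both line formats ("year,temperature" and "station year temperature") are made up for this example and are not the real NCDC or Met Office formats.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration only: none of this is Hadoop API.
class MultipleInputsSketch {

    /** The (year, temperature) pair that both mappers emit. */
    static final class YearTemp {
        final String year;
        final int temperature;

        YearTemp(String year, int temperature) {
            this.year = year;
            this.temperature = temperature;
        }
    }

    /** Stand-in for MaxTemperatureMapper; assumes a made-up "year,temperature" line. */
    static YearTemp mapNcdc(String line) {
        String[] fields = line.split(",");
        return new YearTemp(fields[0].trim(), Integer.parseInt(fields[1].trim()));
    }

    /** Stand-in for MetofficeMaxTemperatureMapper; assumes a made-up "station year temperature" line. */
    static YearTemp mapMetOffice(String line) {
        String[] fields = line.trim().split("\\s+");
        return new YearTemp(fields[1], Integer.parseInt(fields[2]));
    }

    /** Shared reduce step: maximum temperature per year across both sources. */
    static Map<String, Integer> maxPerYear(List<YearTemp> records) {
        Map<String, Integer> max = new TreeMap<>();
        for (YearTemp r : records) {
            max.merge(r.year, r.temperature, Integer::max);
        }
        return max;
    }

    /** Routes each input to its own mapper, then reduces the combined output. */
    static Map<String, Integer> run(List<String> ncdcLines, List<String> metOfficeLines) {
        List<YearTemp> mapped = new ArrayList<>();
        for (String line : ncdcLines) {
            mapped.add(mapNcdc(line));
        }
        for (String line : metOfficeLines) {
            mapped.add(mapMetOffice(line));
        }
        return maxPerYear(mapped);
    }

    public static void main(String[] args) {
        Map<String, Integer> result = run(
                List.of("1950,22", "1951,-11"),
                List.of("heathrow 1950 31", "heathrow 1951 5"));
        System.out.println(result); // prints {1950=31, 1951=5}
    }
}
```

The point to carry back to the real job is that both mappers must emit the same intermediate key and value types, because a single reducer consumes the combined output of every mapper.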
