What are side data distribution techniques in Hadoop?

1 Answer

The extra read-only data that a Hadoop job needs in order to process its main dataset is referred to as side data. Hadoop has two side data distribution techniques -

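i) Using the job configuration - Small pieces of side data are set as key-value pairs on the job configuration and read back inside each task. This technique should not be used to transfer more than a few kilobytes of data, because it can put pressure on the memory usage of the Hadoop daemons, particularly if your system is running several Hadoop jobs.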

ii) Distributed Cache - Rather than serializing side data in the job configuration, it is better to distribute larger side data using Hadoop's distributed cache mechanism, which copies the files to the task nodes so that tasks can read them locally.
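With either technique the task-side pattern is the same: load the side data once during task setup, then look it up for each record. The sketch below shows only that pattern and uses only the JDK, so it runs without a cluster. The class name, the tab-separated file format and the sample records are all hypothetical. In a real job the file would typically be registered in the driver with Job.addCacheFile(URI) and located in the mapper's setup() through context.getCacheFiles(), while very small values would go through Configuration.set() and Configuration.get() instead.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

public class SideDataSketch {

    // Assumed side data format (hypothetical): one "id<TAB>name" pair per line.
    // In a Hadoop mapper this would be called once from setup(), on the local
    // copy of a file that the distributed cache placed on the task node.
    public static Map<String, String> loadLookup(Path localFile) throws IOException {
        Map<String, String> lookup = new HashMap<>();
        try (BufferedReader reader =
                 Files.newBufferedReader(localFile, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] parts = line.split("\t", 2);
                if (parts.length == 2) {
                    lookup.put(parts[0], parts[1]);
                }
            }
        }
        return lookup;
    }

    // Per-record work, as it would happen inside map(): replace the leading
    // id of an "id,value" record with the name found in the side data.
    public static String enrich(String record, Map<String, String> lookup) {
        String[] fields = record.split(",", 2);
        String name = lookup.getOrDefault(fields[0], "UNKNOWN");
        String rest = fields.length > 1 ? fields[1] : "";
        return name + "," + rest;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the cached side data file (sample contents are made up).
        Path side = Files.createTempFile("stations", ".tsv");
        Files.write(side, "s1\tOslo\ns2\tBergen\n".getBytes(StandardCharsets.UTF_8));

        Map<String, String> lookup = loadLookup(side);
        System.out.println(enrich("s1,12.5", lookup)); // prints "Oslo,12.5"
        System.out.println(enrich("s9,3.0", lookup));  // prints "UNKNOWN,3.0"

        Files.delete(side);
    }
}
```

Loading the lookup once per task, rather than once per record, is what keeps this approach cheap. The in-memory map still has to fit in the task's heap, so it suits side data that is small relative to the main dataset.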
