What is Disk Balancer in Hadoop?

1 Answer


Hadoop Interview Questions for Experienced - Disk Balancer


HDFS provides a command-line tool called Disk Balancer, invoked as hdfs diskbalancer. It distributes data evenly across all disks of a datanode. The tool operates against a given datanode and moves blocks from one disk to another.

Disk balancer works by creating a plan (a set of statements) and executing that plan on the datanode. The plan describes how much data should move between two disks. A plan is composed of multiple move steps; each move step has a source disk, a destination disk, and the number of bytes to move. The plan is executed against an operational datanode.
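
The plan-and-execute idea can be sketched in a few lines of Python. This is only an illustration, not Hadoop's implementation or its plan file format: the names MoveStep, make_plan and execute_plan are hypothetical, and the sketch assumes all disks have the same capacity, so balancing by bytes used is good enough.

```python
from dataclasses import dataclass


@dataclass
class MoveStep:
    # Hypothetical model of one step in a plan (not Hadoop's real plan format).
    source: str         # disk to move data from
    destination: str    # disk to move data to
    bytes_to_move: int  # how much data this step moves


def make_plan(used):
    """Build move steps that bring every disk to the average usage.

    `used` maps a disk name to the bytes stored on it. Equal disk
    capacity is assumed, which is a simplification.
    """
    target = sum(used.values()) // len(used)
    surplus = {d: u - target for d, u in used.items() if u > target}
    deficit = {d: target - u for d, u in used.items() if u < target}
    steps = []
    for src in sorted(surplus):
        for dst in sorted(deficit):
            amount = min(surplus[src], deficit[dst])
            if amount > 0:
                steps.append(MoveStep(src, dst, amount))
                surplus[src] -= amount
                deficit[dst] -= amount
    return steps


def execute_plan(used, plan):
    """Apply every step of the plan and return the resulting usage."""
    result = dict(used)
    for step in plan:
        result[step.source] -= step.bytes_to_move
        result[step.destination] += step.bytes_to_move
    return result


used = {"disk0": 900, "disk1": 300, "disk2": 300}
plan = make_plan(used)
balanced = execute_plan(used, plan)
```

Here the plan contains two steps, each moving 200 bytes off disk0, and after execution every disk holds 500 bytes.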

By default, disk balancer is not enabled. To enable it, set dfs.disk.balancer.enabled to true in hdfs-site.xml.
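
As a sketch, the entry in hdfs-site.xml would look roughly like this, placed inside the file's existing configuration element. Check the default for your Hadoop release before relying on it.

```xml
<property>
  <name>dfs.disk.balancer.enabled</name>
  <value>true</value>
</property>
```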

When a new block is written to HDFS, the datanode uses a volume-choosing policy to pick the disk for the block. Each data directory is a volume in HDFS terminology. There are two such policies:

Round-robin: It distributes the new blocks evenly across the available disks.

Available space: It writes data to the disk that has maximum free space (by percentage).
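
The difference between the two policies can be shown with a small Python sketch. It is a simplified illustration and not the datanode's actual Java implementation: the function names and the /data/N volume paths are made up for the example.

```python
from itertools import cycle


def round_robin_chooser(volumes):
    """Return a function that hands out volumes in rotating order."""
    rotation = cycle(volumes)
    return lambda: next(rotation)


def choose_by_available_space(free_fraction):
    """Pick the volume with the highest fraction of free space."""
    return max(free_fraction, key=free_fraction.get)


# Hypothetical volumes (data directories) on one datanode.
volumes = ["/data/1", "/data/2", "/data/3"]

# Round-robin: each new block goes to the next volume in turn.
next_volume = round_robin_chooser(volumes)
round_robin_picks = [next_volume() for _ in range(4)]

# Available space: the new block goes to the emptiest volume by percentage.
free_fraction = {"/data/1": 0.10, "/data/2": 0.65, "/data/3": 0.40}
space_pick = choose_by_available_space(free_fraction)
```

Round-robin cycles through /data/1, /data/2, /data/3 and then returns to /data/1, regardless of how full each disk is. The available-space choice picks /data/2, the volume with 65% free.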
