What is the difference between HDFS Federation and HDFS High Availability?
HDFS Federation
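There is no fixed limit on the number of NameNodes, and the NameNodes are independent of each other (they do not need to coordinate).
The NameNodes do not share metadata: each NameNode manages its own namespace and its own dedicated block pool, while all DataNodes act as common storage for the blocks of every NameNode.
Provides isolation between namespaces, i.e., if one NameNode goes down, only the namespace it manages becomes unavailable; the data managed by the other NameNodes is not affected.
HDFS High Availability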
There are two NameNodes that are related to each other because they serve the same namespace: an active NameNode and a standby NameNode (Hadoop 3 also supports more than one standby). Both run all the time.
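Only one NameNode is active at a time and serves all client requests. The standby is not idle: it keeps its namespace up to date by reading the edit log written by the active NameNode (e.g., from a quorum of JournalNodes) and by receiving block reports from the DataNodes, so it can take over quickly if the active NameNode fails.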
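It requires separate machines for the NameNodes, ideally with equivalent hardware: one machine runs the active NameNode and another runs the standby NameNode. The standby is not the Secondary NameNode; in an HA cluster the standby also performs the checkpoints that a Secondary NameNode would otherwise do.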
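In practice the difference shows up directly in hdfs-site.xml. The Python sketch below is a minimal, assumed illustration rather than a complete configuration: it builds the core properties for a federated cluster (several nameservices, one independent NameNode each) and for an HA nameservice (one nameservice served by an active/standby pair that shares edits through JournalNodes). The property names follow the Apache Hadoop HDFS Federation and HA-with-QJM guides; the helper functions, host names, nameservice IDs (`ns1`, `mycluster`), NameNode IDs (`nn1`, `nn2`), and port values are hypothetical, and fencing, HTTP addresses, JournalNode edit directories, and ZooKeeper-based automatic failover are deliberately left out.

```python
"""Minimal sketch (assumed, not a complete config): core hdfs-site.xml
properties for HDFS Federation versus HDFS High Availability.
All helper names, host names, and nameservice/NameNode IDs are hypothetical."""
from xml.etree import ElementTree as ET

RPC_PORT = 8020          # commonly used NameNode RPC port (assumption; set per cluster)
JOURNALNODE_PORT = 8485  # commonly used JournalNode RPC port (assumption)


def federation_props(namenodes: dict[str, str]) -> dict[str, str]:
    """Federation: several nameservices, each served by its own independent NameNode.

    `namenodes` maps a hypothetical nameservice ID to the host of its NameNode.
    """
    props = {"dfs.nameservices": ",".join(namenodes)}
    for ns, host in namenodes.items():
        # One RPC address per nameservice: no NameNode IDs and no shared edit log.
        props[f"dfs.namenode.rpc-address.{ns}"] = f"{host}:{RPC_PORT}"
    return props


def ha_props(ns: str, namenodes: dict[str, str], journalnodes: list[str]) -> dict[str, str]:
    """HA: one nameservice served by several NameNodes, only one of them active.

    `namenodes` maps a hypothetical NameNode ID to its host.
    """
    props = {
        "dfs.nameservices": ns,
        # Lists the NameNodes that all serve this single namespace.
        f"dfs.ha.namenodes.{ns}": ",".join(namenodes),
    }
    for nn_id, host in namenodes.items():
        props[f"dfs.namenode.rpc-address.{ns}.{nn_id}"] = f"{host}:{RPC_PORT}"
    # The active NameNode writes its edit log to a JournalNode quorum;
    # the standby reads from the same quorum to keep its namespace in sync.
    quorum = ";".join(f"{jn}:{JOURNALNODE_PORT}" for jn in journalnodes)
    props["dfs.namenode.shared.edits.dir"] = f"qjournal://{quorum}/{ns}"
    # Clients use this proxy provider to locate whichever NameNode is currently active.
    props[f"dfs.client.failover.proxy.provider.{ns}"] = (
        "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider"
    )
    return props


def to_xml(props: dict[str, str]) -> str:
    """Render properties in the <configuration>/<property> layout of hdfs-site.xml."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Federation: two independent namespaces, ns1 and ns2.
    print(to_xml(federation_props({"ns1": "nn-a.example.com", "ns2": "nn-b.example.com"})))
    # HA: one namespace ("mycluster") served by nn1 and nn2.
    print(to_xml(ha_props(
        "mycluster",
        {"nn1": "nn-a.example.com", "nn2": "nn-b.example.com"},
        ["jn1.example.com", "jn2.example.com", "jn3.example.com"],
    )))
```

The federation output contains no `dfs.ha.*` keys at all, which reflects that the two features solve different problems (scaling the namespace versus removing the NameNode as a single point of failure). They are not mutually exclusive: each federated nameservice can itself be made highly available.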