How is data or a file read in HDFS?
To read from HDFS, the client first communicates with the NameNode to obtain metadata. The client gives the NameNode the name of the file and its location, and the NameNode responds with the details of the file's blocks: how many there are, their replication factor, and which DataNodes hold them. The client then communicates directly with the DataNodes where the blocks are present and reads the data from them, using the information it received from the NameNode. Because the blocks are spread across several DataNodes, different blocks can be read in parallel.
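The flow above can be sketched as a small in-memory simulation. This is only a toy model, not the Hadoop client API: the dictionaries standing in for the NameNode and DataNodes, the function names (get_block_locations, read_block, read_file), the file path, and the block IDs are all hypothetical and chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical "NameNode" metadata: file path -> ordered list of
# (block id, DataNodes holding a replica of that block).
NAMENODE_METADATA = {
    "/logs/app.log": [
        ("blk_1", ["dn1", "dn2", "dn3"]),
        ("blk_2", ["dn2", "dn3", "dn4"]),
    ],
}

# Hypothetical "DataNode" storage: DataNode name -> {block id: block bytes}.
DATANODE_STORAGE = {
    "dn1": {"blk_1": b"hello "},
    "dn2": {"blk_1": b"hello ", "blk_2": b"world"},
    "dn3": {"blk_1": b"hello ", "blk_2": b"world"},
    "dn4": {"blk_2": b"world"},
}


def get_block_locations(path):
    """Step 1: ask the (toy) NameNode for block metadata only, no file data."""
    return NAMENODE_METADATA[path]


def read_block(block_id, datanodes):
    """Step 2: read one block from the first listed DataNode that has it."""
    for datanode in datanodes:
        data = DATANODE_STORAGE.get(datanode, {}).get(block_id)
        if data is not None:
            return data
    raise IOError(f"no replica of {block_id} could be read")


def read_file(path):
    """Fetch all blocks in parallel and join them in their original order."""
    blocks = get_block_locations(path)
    with ThreadPoolExecutor() as pool:
        # pool.map returns results in input order, so the blocks line up.
        parts = list(pool.map(lambda block: read_block(*block), blocks))
    return b"".join(parts)


if __name__ == "__main__":
    print(read_file("/logs/app.log"))  # prints b'hello world'
```

The point of the sketch is the split of responsibilities: the NameNode lookup returns only metadata, while the block bytes come from the DataNodes.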
Once the client or application has received all the blocks of the file, it combines them to form the file. To improve read performance, the locations of each block are ordered by their distance from the client, and HDFS selects the replica that is closest to the client. This reduces read latency and bandwidth consumption. The client first tries to read the block from the same node, then from another node in the same rack, and finally from a DataNode in another rack.
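The replica ordering described above can be illustrated with a short sketch. It assumes each node is identified by a hypothetical (rack, host) pair and uses the distance values commonly quoted for Hadoop's network topology (0 for the same node, 2 for the same rack, 4 for a different rack); the function names and node names are made up for the example and are not part of any Hadoop API.

```python
def distance(client, replica):
    """Toy network distance between two (rack, host) pairs."""
    if client == replica:
        return 0  # same node
    if client[0] == replica[0]:
        return 2  # different node, same rack
    return 4      # different rack


def order_replicas(client, replicas):
    """Return replica locations sorted from closest to farthest."""
    return sorted(replicas, key=lambda replica: distance(client, replica))


if __name__ == "__main__":
    client = ("rack1", "host1")
    replicas = [("rack2", "host5"), ("rack1", "host2"), ("rack1", "host1")]
    # The local replica comes first, then the same-rack one, then the remote rack.
    print(order_replicas(client, replicas))
```

A reader would then try the first location in the returned list and fall back to the next one only if that read fails.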