1 Answer

HBase Read

A read against HBase must reconcile data from the HFiles, the MemStore, and the BlockCache. The BlockCache keeps frequently accessed data from the HFiles in memory to avoid disk reads. A RegionServer has a single BlockCache shared by its column families (it is the MemStore that exists per column family), and block caching can be switched on or off per column family. The BlockCache holds data in the form of blocks, where a block is the unit of data that HBase reads from disk in a single pass. An HFile is physically laid out as a sequence of blocks plus an index over those blocks, so reading a block from an HFile requires only looking up that block's location in the index and retrieving it from disk.
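To make the index lookup concrete, here is a minimal sketch of the idea in plain Python. It is a toy model, not HBase's actual HFile format: the index contents, the row keys, and the function name `locate_block` are all made up for illustration. It assumes the index stores the first row key of each block together with the block's byte offset.

```python
from bisect import bisect_right

# Hypothetical toy index: (first row key in block, byte offset in file),
# kept sorted by row key. Offsets assume 64 KB blocks.
BLOCK_INDEX = [
    ("apple", 0),
    ("grape", 65536),
    ("mango", 131072),
    ("peach", 196608),
]

def locate_block(row_key, index=BLOCK_INDEX):
    """Return the byte offset of the only block that could hold row_key, or None."""
    first_keys = [key for key, _ in index]
    # Find the last block whose first key is <= row_key.
    pos = bisect_right(first_keys, row_key) - 1
    if pos < 0:
        return None  # row_key sorts before the first block in the file
    return index[pos][1]

print(locate_block("kiwi"))      # 65536
print(locate_block("aardvark"))  # None
```

The point is that one binary search over the in-memory index is enough to know which single block to fetch, so the rest of the file never has to be read.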

Block: The smallest indexed unit of data and the smallest unit of data that can be read from disk. The default size is 64 KB.

Scenario where a smaller block size is preferred: workloads dominated by random lookups, because each lookup pulls less unneeded data from disk. The trade-off is that smaller blocks create a larger index, which consumes more memory.

Scenario where a larger block size is preferred: workloads that perform sequential scans frequently. Larger blocks mean fewer index entries and thus a smaller index, which saves memory.
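A rough back-of-the-envelope calculation shows how block size drives the number of index entries. This is only a sketch: the 1 GiB file size is an arbitrary example, `index_entries` is a made-up helper, and it assumes exactly one index entry per block while ignoring compression and multi-level indexes.

```python
KIB = 1024
GIB = 1024 ** 3

def index_entries(hfile_bytes, block_bytes):
    """Number of blocks (and so index entries) needed, rounding up."""
    return -(-hfile_bytes // block_bytes)  # ceiling division

for block_kib in (16, 64, 256):
    entries = index_entries(GIB, block_kib * KIB)
    print(f"{block_kib:>3} KiB blocks -> {entries} index entries")
# 16 KiB blocks  -> 65536 index entries
# 64 KiB blocks  -> 16384 index entries
# 256 KiB blocks -> 4096 index entries
```

Going from the 64 KB default down to 16 KB quadruples the index for the same file, while going up to 256 KB cuts it to a quarter.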

Reading a row from HBase requires first checking the MemStore, then the BlockCache, and finally the HFiles on disk.
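The lookup order above can be sketched as a small in-memory model. This is a simplified illustration, not how HBase is implemented: the class `ToyReader`, its fields, and the sample rows are hypothetical, there is a single "HFile" represented as a list of dicts, and versions, timestamps, and deletes are ignored.

```python
from bisect import bisect_right

class ToyReader:
    def __init__(self, memstore, blocks):
        # blocks: list of dicts standing in for HFile blocks on disk, ordered
        # so that every key in blocks[i] sorts before every key in blocks[i+1].
        self.memstore = memstore
        self.blocks = blocks
        self.first_keys = [min(block) for block in blocks]  # toy block index
        self.block_cache = {}   # block position -> block contents
        self.disk_reads = 0     # counts simulated disk accesses

    def get(self, row_key):
        # 1. MemStore: most recent writes, not yet flushed to an HFile.
        if row_key in self.memstore:
            return self.memstore[row_key]
        # Use the index to find the one block that could hold the row.
        pos = bisect_right(self.first_keys, row_key) - 1
        if pos < 0:
            return None
        # 2. BlockCache: reuse the block if it was read before.
        block = self.block_cache.get(pos)
        if block is None:
            # 3. HFile on disk: read the block and cache it for next time.
            self.disk_reads += 1
            block = self.blocks[pos]
            self.block_cache[pos] = block
        return block.get(row_key)

reader = ToyReader(
    memstore={"row9": "new"},
    blocks=[{"row1": "a", "row2": "b"}, {"row5": "c", "row9": "old"}],
)
print(reader.get("row9"), reader.disk_reads)  # new 0  (served from MemStore)
print(reader.get("row1"), reader.disk_reads)  # a 1    (block read from disk)
print(reader.get("row2"), reader.disk_reads)  # b 1    (same block, now cached)
```

The second and third calls show why the BlockCache helps: a neighbouring row in an already-read block costs no further disk access.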
