
What is a partition in Hive, and why is partitioning required?

1 Answer


Partitioning in Hive is a way of grouping similar data together based on the values of one or more columns, called partition keys. A table can have one or more partition keys, and each distinct combination of key values identifies a particular partition.

Partitioning lets Hive work with a table at a finer granularity. It reduces query latency because a query that filters on the partition key scans only the relevant partitions instead of the entire data set. For example, a bank's transaction data can be partitioned by month (January, February, and so on). Any operation on a particular month, say February, then only has to scan the February partition rather than the whole table.
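To make the saving concrete, here is a minimal Python sketch of the idea, not Hive's actual implementation. It assumes a handful of made-up bank transactions and hypothetical helper names (`build_partitions`, `total_full_scan`, `total_pruned`), and compares how many rows are read for a February query with and without partitioning by month. In Hive itself, the equivalent would roughly be a table declared with `PARTITIONED BY (month STRING)` and a query that filters on `month`.

```python
from collections import defaultdict

# Hypothetical bank transactions: (month, account, amount).
TRANSACTIONS = [
    ("January", "acc-1", 120.0),
    ("January", "acc-2", 75.5),
    ("February", "acc-1", 40.0),
    ("February", "acc-3", 310.25),
    ("March", "acc-2", 15.0),
]


def build_partitions(rows):
    """Group rows by the partition key (month), one bucket per month."""
    partitions = defaultdict(list)
    for month, account, amount in rows:
        partitions[month].append((account, amount))
    return dict(partitions)


def total_full_scan(rows, month):
    """Unpartitioned layout: every row is read, then filtered by month."""
    scanned = 0
    total = 0.0
    for row_month, _account, amount in rows:
        scanned += 1
        if row_month == month:
            total += amount
    return total, scanned


def total_pruned(partitions, month):
    """Partitioned layout: only rows in the matching partition are read."""
    rows = partitions.get(month, [])
    return sum(amount for _account, amount in rows), len(rows)


partitions = build_partitions(TRANSACTIONS)
full_total, full_scanned = total_full_scan(TRANSACTIONS, "February")
pruned_total, pruned_scanned = total_pruned(partitions, "February")

print("full scan:", full_total, "rows read:", full_scanned)
print("pruned:   ", pruned_total, "rows read:", pruned_scanned)
```

Both approaches return the same February total, but the partitioned lookup reads only the rows stored under the February key, which is the effect the answer describes.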
