Hive organizes data into four units:

  • Databases - A database is a namespace that prevents naming conflicts for tables, views, partitions, columns, etc. Hive supports multiple databases.

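  • Tables - A table is a unit of data with a common schema, living inside a database. Tables can be internal (managed), where Hive manages the lifecycle of the data, or external, where the underlying files are also used outside of Hive and dropping the table removes only its metadata.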

    • Partition

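      Tables can be partitioned on one or more partition columns. Each partition is stored in its own subdirectory, so a query that filters on a partition column reads only the matching partitions.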

    • Buckets (cluster)

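      Partitions (or unpartitioned tables) can be divided further into a fixed number of buckets, based on the hash of a chosen column. Bucketing supports efficient data sampling.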
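
The four units above can be sketched in HiveQL roughly as follows. This is only an illustration: the database, table, and column names, the bucket count, and the HDFS path are all made up for the example and do not come from any real schema.

```sql
-- All names below (sales_db, orders, raw_events, '/data/raw_events') are hypothetical.

-- Database: a namespace for tables.
CREATE DATABASE IF NOT EXISTS sales_db;

-- Internal (managed) table, partitioned by order_date and
-- bucketed into 8 buckets by the hash of customer_id.
CREATE TABLE IF NOT EXISTS sales_db.orders (
  order_id    BIGINT,
  customer_id BIGINT,
  amount      DOUBLE
)
PARTITIONED BY (order_date STRING)
CLUSTERED BY (customer_id) INTO 8 BUCKETS
STORED AS ORC;

-- External table: Hive keeps only the metadata; the files stay at LOCATION
-- and are not deleted when the table is dropped.
CREATE EXTERNAL TABLE IF NOT EXISTS sales_db.raw_events (
  event_id BIGINT,
  payload  STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/data/raw_events';

-- Partition pruning: the filter on the partition column limits the scan
-- to the matching partition.
SELECT SUM(amount) FROM sales_db.orders WHERE order_date = '2024-01-01';

-- Bucket sampling: read one bucket out of the eight.
SELECT * FROM sales_db.orders TABLESAMPLE (BUCKET 1 OUT OF 8 ON customer_id);
```

The partition column (order_date) is declared in PARTITIONED BY rather than in the column list, while the bucketing column (customer_id) must be one of the regular columns.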

...