Apache Drill’s architecture consists of four main components: Drillbit, ZooKeeper, Client APIs, and Storage Plugins.
1. Drillbit: The Drill daemon, which runs on each node in the cluster and accepts, plans, and executes queries. The Drillbit coordinating a query breaks it into smaller fragments and distributes them across the cluster's Drillbits.
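2. ZooKeeper: Maintains cluster membership and health-check information, so each Drillbit knows which other Drillbits are active. Drill does not elect a leader Drillbit: there is no master node, and any Drillbit can coordinate a query, so the cluster has no single coordinator whose failure stops all queries.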
3. Client APIs: Provide interfaces for submitting queries (e.g., JDBC, ODBC, the REST API, and the Drill shell). A client can connect to any Drillbit, which then acts as the Foreman for that query.
4. Storage Plugins: Connect Drill to data sources such as HDFS, Amazon S3, and relational databases. For file-based sources, format plugins handle file formats such as Parquet, JSON, and CSV.
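To make the client side concrete, here is a minimal sketch that submits SQL to a Drillbit over Drill's REST interface using only the Python standard library. It assumes the Drillbit's HTTP server is enabled on Drill's default port 8047 and that the `/query.json` endpoint accepts a JSON body with `queryType` and `query` fields; `DRILLBIT_URL`, `build_query_request`, and `run_query` are hypothetical names chosen for illustration. Whichever Drillbit receives the POST acts as the Foreman for that query.

```python
import json
import urllib.request

# Hypothetical host; 8047 is Drill's default HTTP (Web UI / REST) port.
DRILLBIT_URL = "http://drillbit-1.example.com:8047"


def build_query_request(base_url: str, sql: str) -> urllib.request.Request:
    """Build a POST request for the /query.json REST endpoint (assumed shape).

    Any Drillbit can receive this request; it becomes the Foreman for the query.
    """
    payload = {"queryType": "SQL", "query": sql}
    return urllib.request.Request(
        url=f"{base_url}/query.json",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(base_url: str, sql: str) -> dict:
    """Send the query and decode the JSON response (needs a running Drillbit)."""
    req = build_query_request(base_url, sql)
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Usage against a live cluster; cp.`employee.json` is sample data exposed
# through Drill's classpath (cp) storage plugin:
# result = run_query(DRILLBIT_URL, "SELECT * FROM cp.`employee.json` LIMIT 3")
# print(result.get("rows", []))
```

Separating request construction from sending keeps the payload easy to inspect without a cluster; JDBC or ODBC clients would follow the same "connect to any Drillbit" pattern.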
Drillbits interact with each other during query execution, exchanging data and coordinating tasks. The Foreman Drillbit receives the client’s query, parses it into a logical plan, optimizes that into a physical plan, and splits the physical plan into fragments. It distributes the fragments among the participating Drillbits, which execute them and pass intermediate results to one another and, ultimately, back to the Foreman. Finally, the Foreman returns the final result to the client.
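The scatter/gather pattern in this paragraph can be modeled with a toy example. This is not Drill's planner or execution engine: it assumes a hypothetical "table" stored as a list of row batches and a query equivalent to `SELECT COUNT(*), SUM(amount)`. The Foreman assigns batches to Drillbits round-robin, each Drillbit computes a partial aggregate for its fragment, and the Foreman merges the partials into the final answer.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    """A unit of work assigned to one Drillbit (hypothetical model)."""
    drillbit: str
    batches: list[list[int]]


def plan_fragments(batches: list[list[int]], drillbits: list[str]) -> list[Fragment]:
    """Foreman step: assign scan batches to Drillbits round-robin."""
    assignments: dict[str, list[list[int]]] = {bit: [] for bit in drillbits}
    for i, batch in enumerate(batches):
        assignments[drillbits[i % len(drillbits)]].append(batch)
    # Drillbits that received no batches get no fragment.
    return [Fragment(bit, bs) for bit, bs in assignments.items() if bs]


def execute_fragment(fragment: Fragment) -> tuple[int, int]:
    """Worker step: a Drillbit computes (row count, sum) over its batches."""
    count = sum(len(b) for b in fragment.batches)
    total = sum(sum(b) for b in fragment.batches)
    return count, total


def foreman(batches: list[list[int]], drillbits: list[str]) -> tuple[int, int]:
    """Plan, distribute, and merge partial results into the final answer."""
    partials = [execute_fragment(f) for f in plan_fragments(batches, drillbits)]
    return sum(c for c, _ in partials), sum(t for _, t in partials)


# foreman([[1, 2], [3], [4, 5, 6]], ["bit-a", "bit-b"]) returns (6, 21)
```

Computing partial aggregates close to the data and merging only small summaries at the Foreman keeps network traffic low; in a real cluster the fragments would run concurrently on separate nodes rather than in a list comprehension.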
ZooKeeper does not carry query data; Drillbits exchange data directly with each other. Its role is discovery: it keeps each Drillbit aware of which other Drillbits are active and where they are running. Storage plugins let Drillbits read from and write to different data sources while hiding storage-specific details from the rest of the query engine.
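The abstraction that storage plugins provide can be illustrated with a small interface sketch. Drill's real plugins are Java classes, and in Drill, JSON and CSV are handled by format plugins under the file system storage plugin; the `StoragePlugin` protocol, the two reader classes, and `count_where` here are hypothetical names that only show the idea. Query logic calls `scan()` and receives records as dicts, without knowing whether they came from newline-delimited JSON or CSV.

```python
import csv
import io
import json
from typing import Iterator, Protocol


class StoragePlugin(Protocol):
    """Hypothetical interface: yield records as dicts, whatever the source format."""

    def scan(self, source: str) -> Iterator[dict]:
        ...


class JsonLinesPlugin:
    """Reads newline-delimited JSON from an in-memory string."""

    def scan(self, source: str) -> Iterator[dict]:
        for line in io.StringIO(source):
            if line.strip():  # skip blank lines
                yield json.loads(line)


class CsvPlugin:
    """Reads CSV with a header row; all values come back as strings."""

    def scan(self, source: str) -> Iterator[dict]:
        yield from csv.DictReader(io.StringIO(source))


def count_where(plugin: StoragePlugin, source: str, column: str, value: str) -> int:
    """Engine-side logic: identical for every plugin, since it only sees dicts."""
    return sum(1 for row in plugin.scan(source) if str(row.get(column)) == value)


# count_where(CsvPlugin(), "city\nOslo\nLima\nOslo\n", "city", "Oslo") returns 2
```

Adding a new source under this design means writing one more class with a `scan()` method; `count_where` and any other engine-side code stay unchanged.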