You can share details on how you deployed Hadoop distributions like Cloudera and Hortonworks in your organization either in a standalone environment or on the cloud. Mention how you configured the number of required nodes , tools, services, security features such as SSL, SASL, Kerberos, etc. Having set up the Hadoop cluster, talk about how you initially extracted the data from data sources like APIs, SQL based databases, etc and stored it in HDFS( storage layer) , how you performed data cleaning and validation, and the series of ETLs you performed to extract the data in the given format to extract KPIs.
Some of the ETLs tasks include :
- Date format parsing
- The casting of data type values
- Deriving calculated fields