An online retail company stores a large amount of customer data (terabytes to petabytes) in Amazon S3.

Question

asked Sep 1, 2022 in AWS by sharadyadav1986

An online retail company stores a large amount of customer data (terabytes to petabytes) in Amazon S3. The company wants to derive business insights from this data. They plan to securely run complex SQL-based analytical queries directly on the S3 data, process the results to generate business insights, and build a data visualization dashboard for business and management review and decision-making.

You are hired as a Solutions Architect to provide a cost-effective and quick solution to this. Which of the following AWS services would you recommend?

A. Use Amazon Redshift Spectrum to run SQL-based queries on the data stored in Amazon S3 and then process it to Amazon Kinesis Data Analytics for creating a dashboard

B. Use Amazon Redshift to run SQL-based queries on the data stored in Amazon S3 and then process it on a custom web-based dashboard for data visualization

C. Use Amazon EMR to run SQL-based queries on the data stored in Amazon S3 and then process it to Amazon QuickSight for data visualization

D. Use Amazon Athena to run SQL-based queries on the data stored in Amazon S3 and then process it to Amazon QuickSight for dashboard view

1 Answer

sharadyadav1986 · Answer 1 · 2022-09-01T02:30:05+0000

D. Use Amazon Athena to run SQL-based queries on the data stored in Amazon S3 and then process it to Amazon QuickSight for dashboard view
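To make the Athena half of this answer concrete, here is a minimal Python sketch of submitting a SQL query against S3 data and waiting for it to finish. It assumes a boto3 Athena client is passed in; the helper name `run_athena_query`, the polling parameters, and the database, table, and bucket names in the usage comment are hypothetical placeholders, not part of the original question.

```python
import time

# Athena reports one of these states once a query has stopped running.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def run_athena_query(client, sql, database, output_location,
                     poll_seconds=1.0, max_polls=300):
    """Submit a SQL query to Athena and poll until it reaches a terminal state.

    `client` is expected to behave like boto3.client("athena").
    Returns a (query_execution_id, final_state) tuple.
    """
    started = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        # Athena writes result files to this S3 location.
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    for _ in range(max_polls):
        execution = client.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in TERMINAL_STATES:
            return query_id, state
        time.sleep(poll_seconds)

    raise TimeoutError(f"Query {query_id} did not finish after {max_polls} polls")


# Hypothetical usage (requires boto3 and AWS credentials):
#   import boto3
#   athena = boto3.client("athena")
#   run_athena_query(
#       athena,
#       "SELECT customer_id, SUM(total) FROM orders GROUP BY customer_id",
#       database="retail_db",
#       output_location="s3://example-athena-results/",
#   )
```

Amazon QuickSight can then use Athena as a data source for the dashboard, so no cluster has to be provisioned for either the querying or the visualization step.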

Explanation

Option A is incorrect because Amazon Kinesis Data Analytics, a service for analyzing streaming data, cannot generate the business insights described in the requirement, nor can it be used for data visualization.

A separate BI tool would still be needed after processing the data in Amazon Kinesis Data Analytics, so this is not a cost-optimized solution.

Option B is incorrect primarily because of cost. Querying S3 data with Amazon Redshift requires transferring and loading the data into a Redshift cluster. Building a custom web-based dashboard or data visualization tool also takes additional time and cost.

Option C is incorrect because Amazon EMR is a cloud big data platform for running large-scale distributed data processing jobs, interactive SQL queries, and machine learning (ML) applications using open-source analytics frameworks such as Apache Spark, Apache Hive, and Presto. It is mainly used to perform big data analytics, process real-time data streams, and accelerate data science and ML adoption. The requirement here is not to build any such solution on a big data platform, so this option is not suitable. It is neither as quick nor as cost-effective as option D.

Option D is CORRECT because Amazon Athena is a serverless service that runs SQL queries directly on data in S3, making it the most cost-effective way to run SQL-based analytical queries here; the results can then be published to Amazon QuickSight for dashboard view.
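The cost argument can be illustrated with some rough arithmetic. The sketch below assumes Athena's pay-per-query model with a price of 5 USD per TB scanned, rounding up to the nearest megabyte, and a 10 MB minimum per query; these figures are assumptions that vary by region and over time, so check the current pricing page. The function name `estimate_athena_cost` is hypothetical.

```python
import math

MB = 1024 ** 2
TB = 1024 ** 4


def estimate_athena_cost(bytes_scanned, price_per_tb=5.0, min_bytes=10 * MB):
    """Estimate the charge in USD for one Athena query.

    Assumed billing rules (verify against current pricing): bytes scanned
    are rounded up to the next megabyte, with a per-query minimum.
    """
    if bytes_scanned < 0:
        raise ValueError("bytes_scanned must be non-negative")
    # Round up to a whole number of megabytes, then apply the minimum.
    billed_bytes = max(math.ceil(bytes_scanned / MB) * MB, min_bytes)
    return billed_bytes / TB * price_per_tb


# Under these assumptions, a query that scans a full terabyte costs
# about 5 USD, and nothing is charged while no queries are running.
full_scan_cost = estimate_athena_cost(1 * TB)
```

Because the charge follows the bytes scanned rather than cluster uptime, there is no idle cost between queries, which is the contrast with the Redshift and EMR options.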
