in Big Data | Hadoop
Q:
What are the steps involved in deploying a big data solution?

1 Answer

i) Data Ingestion – The first step in deploying a big data solution is to extract data from its sources: an Enterprise Resource Planning system such as SAP, a CRM such as Salesforce or Siebel, an RDBMS such as MySQL or Oracle, or log files, flat files, documents, images, and social media feeds. This data needs to be stored in HDFS. It can be ingested either through batch jobs that run on a schedule (every 15 minutes, once a night, and so on) or through streaming in real time, at latencies from 100 ms to 120 seconds.
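
As a rough illustration of the batch side of ingestion, the sketch below groups timestamped records into fixed-interval batches, much as a job scheduled every 15 minutes would pick up whatever arrived since its last run. It is plain Python with no Hadoop dependency; `batch_by_interval`, the record layout, and the sample data are hypothetical and only there for illustration.

```python
from collections import defaultdict

def batch_by_interval(records, interval_seconds):
    """Group (timestamp_seconds, payload) records into fixed-size time windows.

    Returns a dict mapping each window's start time to the payloads that
    arrived in that window, in arrival order.
    """
    batches = defaultdict(list)
    for timestamp, payload in records:
        # Integer division finds the window the record falls in.
        window_start = (timestamp // interval_seconds) * interval_seconds
        batches[window_start].append(payload)
    return dict(batches)

# Hypothetical log records: (seconds since start, log line).
records = [
    (10, "login user=a"),
    (620, "click user=a"),
    (905, "login user=b"),
    (1790, "logout user=a"),
]

# 15-minute batches, one of the batch intervals mentioned in the answer.
batches = batch_by_interval(records, interval_seconds=15 * 60)
for window_start in sorted(batches):
    print(window_start, batches[window_start])
```

A streaming pipeline would use the same grouping idea with a much smaller window, or would handle each record as it arrives instead of waiting for a window to close.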

 

ii) Data Storage – After ingestion, the data is stored either in HDFS or in a NoSQL database such as HBase. HBase works well for random read/write access, whereas HDFS is optimized for sequential access.
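
The difference in access patterns can be sketched without either system. The toy example below is plain Python, not HDFS or HBase code: an append-only list stands in for a file that is read by scanning from start to end, and a dictionary keyed by row key stands in for a store that supports direct lookups and updates. All names and data are hypothetical.

```python
# Sequential-style storage: records are appended and later read by a full scan.
append_only_log = []

def append_record(row_key, value):
    append_only_log.append((row_key, value))

def scan_for(row_key):
    """Find the latest value for a key by scanning every record in order."""
    latest = None
    for key, value in append_only_log:
        if key == row_key:
            latest = value
    return latest

# Random-access-style storage: records are addressed directly by row key.
keyed_store = {}

def put(row_key, value):
    keyed_store[row_key] = value

def get(row_key):
    return keyed_store.get(row_key)

for key, value in [("user1", "NY"), ("user2", "LA"), ("user1", "SF")]:
    append_record(key, value)
    put(key, value)

print(scan_for("user1"))  # walks the whole log to find the latest value
print(get("user1"))       # single lookup by key
```

Both calls return the same value; what differs is the work done to find it, which is why the choice between the two stores depends on whether the workload is mostly full scans or mostly lookups and updates by key.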

 

iii) Data Processing – The final step is to process the data using a processing framework such as MapReduce, Spark, Pig, or Hive.
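
To give a feel for what such a framework does, here is a minimal single-process sketch of the MapReduce model applied to a word count. It is plain Python and does not use the Hadoop API; the function names and input lines are hypothetical, and a real job would run the map and reduce phases in parallel across a cluster.

```python
from collections import defaultdict

def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle_phase(pairs):
    """Group emitted values by key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Combine all values for one key into a single count."""
    return key, sum(values)

def word_count(lines):
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())

lines = ["big data needs big storage", "data needs processing"]
counts = word_count(lines)
print(counts)
```

Higher-level tools such as Pig and Hive let you express this kind of aggregation as a script or a SQL-like query instead of writing the map and reduce functions by hand.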

 
