Why do we need Hadoop?
Hadoop came into existence to deal with Big Data challenges. The main challenges of Big Data are:
Storage – Since the data is very large, storing such a huge amount of data is difficult.
Security – Since the data is huge in size, keeping it secure is another challenge.
Analytics – In Big Data, we often do not know in advance what kind of data we are dealing with, which makes analyzing it even more difficult.
Data Quality – In the case of Big Data, data is very messy, inconsistent and incomplete.
Discovery – Using powerful algorithms to find patterns and insights in such data is very difficult.
Hadoop is an open-source software framework that supports the storage and processing of large data sets. Apache Hadoop is a strong solution for storing and processing Big Data because:
Flexibility – Apache Hadoop stores huge files as they are (raw), without requiring a schema to be defined up front (schema-on-read).
High scalability – Hadoop scales horizontally: nodes can be added to the cluster to increase storage and processing capacity.
Reliable – It stores data reliably on the cluster by replicating each block across multiple machines, so data survives machine failure.
High availability – In Hadoop, data remains available despite hardware failure. If a machine crashes, the data can be read from another node that holds a replica.
Economic – Hadoop runs on clusters of commodity hardware, which is inexpensive.
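The reliability and high-availability points above both come from block replication: HDFS splits a file into blocks and stores multiple copies of each block on different machines (three copies by default). The following is a minimal Python sketch of that idea, not real HDFS code; the node and block names are hypothetical.

```python
# Illustrative sketch of HDFS-style block replication (not the HDFS API).
import random

REPLICATION_FACTOR = 3  # HDFS default replication factor

def place_blocks(blocks, nodes, replication=REPLICATION_FACTOR):
    """Assign each block to `replication` distinct nodes."""
    return {block: random.sample(nodes, replication) for block in blocks}

def readable_blocks(placement, live_nodes):
    """Return the blocks still readable when only `live_nodes` are up."""
    return {block for block, replicas in placement.items()
            if any(node in live_nodes for node in replicas)}

nodes = [f"node{i}" for i in range(1, 6)]     # 5 commodity machines
blocks = [f"block{i}" for i in range(1, 11)]  # a file split into 10 blocks
placement = place_blocks(blocks, nodes)

# Simulate one machine crashing: every block is still readable,
# because each block has replicas on other nodes.
live = set(nodes) - {"node3"}
print(len(readable_blocks(placement, live)))  # 10 - no data lost
```

With a replication factor of 3, the cluster in this sketch can even lose two machines at once without losing any block, which is why Hadoop can run reliably on cheap hardware that fails often.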