In this blog we are going to talk about Hadoop. The questions that come to mind are:
- Why do we need Hadoop?
- Do we need it when we already have data-processing tools, query languages like SQL, integration platforms, and BI tools like Informatica?
Companies like Facebook, Google, and Yahoo are data-centric companies. According to statistics, around 1,800 exabytes of data are stored every day worldwide, and only 20% of it is structured data; the other 80% is unstructured, in other words unorganized.
Traditional software struggles to process unstructured data. The data keeps increasing every day, so we require an inexpensive and reliable storage mechanism to hold bulk amounts of data and to process it.
The ideal solution for these requirements is Hadoop.
Use Cases:
These requirements exist not only for big companies; small and medium-sized companies will need this as well. For example, consider some use cases where we want to:
- Analyse daily logs generated over a period of a month or a year
- Process image data uploaded by different users
- Process unstructured data
- Do image processing
- Run search operations on text, video, and log data
- Analyse all the login and logout operations that happened over a year
So what is Hadoop? In short, it is:
- An open source MapReduce implementation (a minimal word-count sketch follows this list)
- Able to work on a cluster of machines
- Fault tolerant: even if one node out of 1,000 fails, Hadoop makes a note of this and hands the work over to the available nodes
- Highly scalable: you can add any number of nodes at any point in time
- Very flexible in adapting to changes
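
To give a feel for what a MapReduce program looks like, here is the classic word-count example written against Hadoop's Java MapReduce API. This is a minimal sketch: the input and output paths are supplied on the command line, and the cluster configuration is assumed to come from the environment.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: for every input line, emit a (word, 1) pair per word.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reducer: sum up the counts collected for each word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // input directory in HDFS
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // output directory in HDFS
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

The mapper emits a (word, 1) pair for every word it sees; Hadoop groups the pairs by word across the whole cluster, and the reducer sums them up. The same map/group/reduce pattern fits the log-analysis use cases listed above.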
Please note that Hadoop understands only two things:
- HDFS (the Hadoop Distributed File System; see the sketch after this list)
- MapReduce
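
Here is a small sketch of talking to HDFS directly from Java using Hadoop's FileSystem API, for instance to load log files into the cluster before running a job. The NameNode address hdfs://namenode:9000 and the file paths are hypothetical placeholders; substitute your own cluster's values.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsPutExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Hypothetical NameNode address; replace with your cluster's.
    conf.set("fs.defaultFS", "hdfs://namenode:9000");
    FileSystem fs = FileSystem.get(conf);

    // Copy a local log file into HDFS so MapReduce jobs can process it.
    fs.copyFromLocalFile(new Path("/tmp/app.log"), new Path("/logs/app.log"));

    // List what is now stored under /logs.
    for (FileStatus status : fs.listStatus(new Path("/logs"))) {
      System.out.println(status.getPath() + "  " + status.getLen() + " bytes");
    }
    fs.close();
  }
}
```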
We will discuss in more detail how all of this works in the next blog, Part II on Hadoop.