Machine Learning Notepad: Hadoop Architecture

Tuesday, January 26, 2016

Hadoop Architecture

Hadoop 1.0 Architecture :

[Figure: Hadoop 1.0 Architecture]

Limitations of Hadoop 1.0 :

No horizontal scalability of NameNode

Metadata is stored in NameNode memory (RAM)
Bottleneck after ~4000 nodes
Results in cascading failure
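
To see why keeping all metadata in NameNode RAM limits scale, a rough back-of-the-envelope estimate can help. The sketch below assumes the commonly quoted rule of thumb of roughly 150 bytes of heap per namespace object (file, directory or block). That figure, the function name and the sample file counts are illustrative assumptions, not numbers from this post.

```python
# Rough estimate of the NameNode heap needed for namespace metadata.
# ASSUMPTION: ~150 bytes of heap per namespace object (rule of thumb only).
BYTES_PER_OBJECT = 150


def estimate_namenode_heap_gb(num_files, blocks_per_file=1,
                              bytes_per_object=BYTES_PER_OBJECT):
    """Return the approximate heap (in GiB) for the given number of files."""
    # Each file contributes one file object plus one object per block.
    objects = num_files * (1 + blocks_per_file)
    return objects * bytes_per_object / (1024 ** 3)


if __name__ == "__main__":
    for files in (10_000_000, 100_000_000, 1_000_000_000):
        heap = estimate_namenode_heap_gb(files)
        print(f"{files:>13,} files -> ~{heap:.1f} GiB heap")
```

Under these assumptions the heap grows linearly with the number of files, and because a single NameNode holds the whole namespace, adding DataNodes does not relieve it.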

Does not support NameNode High Availability

The Secondary NameNode is not a hot standby for the NameNode
It connects to the NameNode regularly
It keeps a housekeeping backup of the NameNode metadata
The saved metadata can be used to rebuild a failed NameNode
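
The housekeeping backup above can be pictured as a checkpoint: the namespace image and the edit log are fetched periodically, the edits are replayed, and a merged image is saved. The sketch below is a toy model of that idea, with a dict standing in for the image and tuples for edit-log entries. All names are hypothetical and this is not the HDFS on-disk format.

```python
def checkpoint(fsimage, edit_log):
    """Replay edit-log entries on a copy of the image and return the merged image."""
    merged = dict(fsimage)  # leave the original image untouched
    for op, path, value in edit_log:
        if op == "create":
            merged[path] = value              # value: list of block ids
        elif op == "delete":
            merged.pop(path, None)
        elif op == "rename":
            if path in merged:
                merged[value] = merged.pop(path)  # value: new path
        else:
            raise ValueError(f"unknown op: {op}")
    return merged


if __name__ == "__main__":
    image = {"/data/a.txt": ["blk_1"]}
    edits = [
        ("create", "/data/b.txt", ["blk_2", "blk_3"]),
        ("rename", "/data/a.txt", "/data/a_old.txt"),
        ("delete", "/data/b.txt", None),
    ]
    print(checkpoint(image, edits))
```

A merged image only reflects edits up to the last checkpoint, so rebuilding from it can lose recent changes and takes manual effort, which is why this is not a hot standby.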

Overburdened JobTracker

CPU : Spends a very significant portion of its time and effort managing the life cycle of applications
Network : A single listener thread communicates with thousands of Map and Reduce jobs

Not possible to run non-MapReduce Big Data applications on HDFS

Only MapReduce processing can be achieved
Alternate data storage is needed for other processing such as real-time or graph analysis

Does not support Multi-tenancy

Hadoop 2.0 Architecture :

[Figure: Hadoop Architecture]

Hadoop 2.0 Features :

HDFS Federation

Multiple NameNodes and Namespaces
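
With several NameNodes each owning part of the namespace, a client needs some way to route a path to the right NameNode. One way to picture this is a client-side mount table that maps path prefixes to NameNodes and resolves by longest prefix. The sketch below is illustrative only; the host names and mount points are made up.

```python
# Hypothetical mount table: namespace prefix -> NameNode address (made-up hosts).
MOUNT_TABLE = {
    "/user": "nn1.example.com:8020",
    "/data": "nn2.example.com:8020",
    "/data/logs": "nn3.example.com:8020",
}


def resolve_namenode(path, mount_table=MOUNT_TABLE):
    """Return the NameNode that owns `path`, using the longest matching prefix."""
    best = None
    for prefix in mount_table:
        # Match whole path components only, so "/database" does not match "/data".
        if path == prefix or path.startswith(prefix.rstrip("/") + "/"):
            if best is None or len(prefix) > len(best):
                best = prefix
    if best is None:
        raise KeyError(f"no NameNode mounted for {path}")
    return mount_table[best]


if __name__ == "__main__":
    for p in ("/user/alice/notes.txt", "/data/logs/2016/01.log", "/data/raw.csv"):
        print(p, "->", resolve_namenode(p))
```

Each NameNode in this picture keeps only its own slice of the metadata in memory, which is how federation spreads the namespace load horizontally.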

Support for NameNode High Availability
YARN (Yet Another Resource Negotiator)

Better processing control
Support for non-MapReduce types of processing
Support for multi-tenancy
Resource Manager, Node Manager, App Master, Capacity Scheduler

Multi Tenancy :

Different types of jobs are organized in different queues (Batch, Streaming, Interactive)
Queue shares are expressed as percentages of the cluster
Each queue has an associated priority
FIFO scheduling within each queue
Security is ensured between applications
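
The queue model above can be sketched as a toy scheduler: each queue gets a percentage share of the cluster's containers, and jobs inside a queue are served in FIFO order. This is a simplified illustration and not the YARN Capacity Scheduler itself. The queue names, shares and job sizes are assumptions, and priorities, elasticity and security are left out.

```python
from collections import deque


def schedule(total_containers, shares, queues):
    """Allocate containers per queue by percentage share, FIFO within each queue.

    shares: queue name -> percent of the cluster (must sum to at most 100)
    queues: queue name -> list of (job_name, containers_needed) in arrival order
    Returns queue name -> list of job names that fit in this scheduling round.
    """
    if sum(shares.values()) > 100:
        raise ValueError("queue shares exceed 100% of the cluster")
    started = {}
    for name, percent in shares.items():
        budget = total_containers * percent // 100  # containers for this queue
        pending = deque(queues.get(name, []))
        started[name] = []
        # Strict FIFO: stop at the first job that does not fit in the budget.
        while pending and pending[0][1] <= budget:
            job, need = pending.popleft()
            budget -= need
            started[name].append(job)
    return started


if __name__ == "__main__":
    shares = {"batch": 50, "streaming": 30, "interactive": 20}
    queues = {
        "batch": [("etl", 30), ("report", 30), ("small", 5)],
        "streaming": [("clicks", 20)],
        "interactive": [("query", 10), ("adhoc", 10)],
    }
    print(schedule(100, shares, queues))
```

In this sketch a large job at the head of a queue blocks the smaller jobs behind it, which is the usual trade-off of strict FIFO within a queue; jobs in other queues are unaffected because each queue has its own share.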

HDFS Snapshots
NFSv3 access to data in HDFS
Support for running Hadoop on MS Windows
Binary Compatibility for MapReduce applications built on Hadoop 1.0
Substantial amount of integration testing with the rest of the projects (such as Pig and Hive) in the Hadoop ecosystem.

Hadoop Application :

[Figure: Hadoop Application]
