Hadoop 1.0 Architecture :
Limitations of Hadoop 1.0 :
- No horizontal scalability of NameNode
- Metadata is stored in NameNode memory (RAM)
- Bottleneck after ~4000 nodes
- Results in cascading failure
- Does not support NameNode High Availability
- The Secondary NameNode is not a hot standby for the NameNode
- It connects to the NameNode regularly
- It performs housekeeping and backup of NameNode metadata
- Its saved metadata can be used to rebuild a failed NameNode
- Overburdened JobTracker
- CPU : Spends a very significant portion of its time and effort managing the life cycle of applications
- Network : A single listener thread communicates with thousands of Map and Reduce jobs
- Not possible to run non-MapReduce Big Data applications on HDFS
- Only MapReduce processing can be achieved
- Alternate data storage is needed for other processing, such as real-time or graph analysis
- Does not support Multi-tenancy
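Because all namespace metadata lives in the RAM of a single NameNode, the size of the namespace is capped by one machine's heap. The back-of-the-envelope sketch below makes that concrete. It assumes the commonly quoted rule of thumb of roughly 150 bytes of heap per file, directory, or block object; that figure is an approximation, and the function name `estimate_namenode_heap_bytes` is hypothetical.

```python
# Rough estimate of the NameNode heap needed to hold HDFS namespace metadata.
# ASSUMPTION: ~150 bytes of heap per namespace object (file, directory, or block).
# This is a commonly quoted rule of thumb, not an exact figure.
BYTES_PER_OBJECT = 150


def estimate_namenode_heap_bytes(num_files: int, blocks_per_file: float,
                                 num_dirs: int = 0,
                                 bytes_per_object: int = BYTES_PER_OBJECT) -> int:
    """Return the approximate heap (in bytes) one NameNode needs for metadata."""
    num_blocks = round(num_files * blocks_per_file)
    # Every file, directory, and block is tracked as an in-memory object.
    objects = num_files + num_dirs + num_blocks
    return objects * bytes_per_object


if __name__ == "__main__":
    # Hypothetical cluster: 100 million files, one block each.
    heap = estimate_namenode_heap_bytes(100_000_000, 1.0)
    print(f"~{heap / 1e9:.1f} GB of NameNode heap")
```

Under that assumption, 100 million single-block files already need about 30 GB of heap on one machine, and the only way to grow further in Hadoop 1.0 is a bigger NameNode host, not more of them.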
Hadoop 2.0 Architecture :
- HDFS Federation
- Multiple NameNodes and namespaces
- Support for NameNode High Availability
- YARN – Yet Another Resource Negotiator
- Better processing control
- Support for non-MapReduce types of processing
- Support for multi-tenancy
- Resource Manager, Node Manager, App Master, Capacity Scheduler
- Multi-tenancy :
- Different types of jobs are organized in different queues (batch, streaming, interactive)
- Each queue is assigned a share of the cluster, expressed as a percentage
- Each queue has an associated priority
- FIFO scheduling within each queue
- Security is enforced between applications
- HDFS Snapshots
- NFSv3 access to data in HDFS
- Support for running Hadoop on MS Windows
- Binary Compatibility for MapReduce applications built on Hadoop 1.0
- Substantial integration testing with the rest of the Hadoop ecosystem projects (such as Pig and Hive)
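With HDFS Federation, each NameNode owns part of the overall namespace, so a client has to work out which NameNode serves a given path. Hadoop 2.0 provides ViewFs, a client-side mount table, for this purpose. The Python sketch below is not ViewFs; it is a hypothetical longest-prefix lookup over an assumed mount table, and the host names and the `resolve` helper are made up for illustration.

```python
import posixpath

# Hypothetical mount table: namespace path prefix -> NameNode that owns it.
MOUNT_TABLE = {
    "/user": "hdfs://nn1.example.com:8020",
    "/data": "hdfs://nn2.example.com:8020",
    "/data/logs": "hdfs://nn3.example.com:8020",
}


def resolve(path: str, mount_table=MOUNT_TABLE) -> tuple:
    """Return (namenode_uri, normalized_path) using the longest matching mount point."""
    path = posixpath.normpath(path)
    best = None
    for prefix in mount_table:
        # Match whole path components only, so "/database" does not match "/data".
        if path == prefix or path.startswith(prefix.rstrip("/") + "/"):
            if best is None or len(prefix) > len(best):
                best = prefix
    if best is None:
        raise KeyError(f"no mount point for {path}")
    return mount_table[best], path


if __name__ == "__main__":
    for p in ["/user/alice/report.csv", "/data/raw/part-0", "/data/logs/2014/app.log"]:
        print(p, "->", resolve(p)[0])
```

Because each NameNode only holds the metadata for its own mount points, adding a NameNode adds metadata capacity, which is the horizontal scalability Hadoop 1.0 lacked.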
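The multi-tenancy points (queues with percentage shares, a priority per queue, FIFO within each queue) can be illustrated with a toy model. In a real cluster the shares are configured for the Capacity Scheduler through properties such as `yarn.scheduler.capacity.root.queues` and `yarn.scheduler.capacity.root.<queue>.capacity`; the Python below does not reproduce the scheduler's actual algorithm. It assumes each job needs exactly one container, ignores elasticity and preemption, and treats a higher `priority` value as "served first". The `Queue`, `queue_shares`, and `schedule` names are hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Queue:
    name: str
    capacity_pct: float  # guaranteed share of the cluster, as a percentage
    priority: int        # ASSUMPTION: higher value is served first
    jobs: deque = field(default_factory=deque)  # FIFO order within the queue


def queue_shares(queues, cluster_containers: int) -> dict:
    """Split cluster containers across queues by percentage (rounded down)."""
    total = sum(q.capacity_pct for q in queues)
    if abs(total - 100.0) > 1e-9:
        raise ValueError(f"queue capacities must sum to 100%, got {total}")
    return {q.name: int(cluster_containers * q.capacity_pct / 100) for q in queues}


def schedule(queues, cluster_containers: int) -> list:
    """Start jobs queue by queue (highest priority first), FIFO inside each queue.

    Simplification: one container per job, and a queue never exceeds its share.
    """
    shares = queue_shares(queues, cluster_containers)
    started = []
    for q in sorted(queues, key=lambda q: q.priority, reverse=True):
        for _ in range(min(shares[q.name], len(q.jobs))):
            started.append((q.name, q.jobs.popleft()))  # oldest job first
    return started


if __name__ == "__main__":
    queues = [
        Queue("batch", 60, priority=1, jobs=deque(["etl-1", "etl-2", "etl-3", "etl-4"])),
        Queue("streaming", 30, priority=3, jobs=deque(["clicks"])),
        Queue("interactive", 10, priority=2, jobs=deque(["adhoc-1", "adhoc-2"])),
    ]
    print(schedule(queues, cluster_containers=10))
```

With 10 containers, the interactive queue's 10% share allows only one of its two jobs to start, while the batch queue's 60% share covers all four of its jobs; the capacity guarantee, not arrival order across queues, decides who gets resources.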
Hadoop Application :