Map -> Combiner -> Partitioner -> Sort -> Shuffle -> Sort -> Reduce
The map phase is done by mappers. Mappers run on unsorted input key/value pairs. Each mapper emits zero, one, or multiple output key/value pairs for each input key/value pair.
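As an illustration, here is a minimal sketch of a word-count mapper in Python. This simulates the mapper contract described above rather than using the real Hadoop API; the function name and word-count scenario are assumptions for the example.

```python
# Hypothetical word-count mapper: for each input (key, line) pair it
# emits zero or more (word, 1) output pairs -- one per word in the line.
def mapper(key, value):
    for word in value.split():
        yield (word, 1)

pairs = list(mapper(0, "the quick the"))
# pairs == [("the", 1), ("quick", 1), ("the", 1)]
```

Note that a line with no words yields nothing, matching the "zero, one, or multiple" rule.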
The combine phase is done by combiners. A combiner combines key/value pairs that share the same key. Each combiner may run zero, one, or multiple times, so the job must produce correct results whether or not the combiner runs.
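A combiner for word count can be sketched as a local pre-aggregation that sums counts for each key before the shuffle. This is a simulation of the idea, not Hadoop code:

```python
from collections import defaultdict

# Sketch of a word-count combiner: locally sums the values of pairs
# with the same key, reducing the data sent across the network.
def combiner(pairs):
    sums = defaultdict(int)
    for key, value in pairs:
        sums[key] += value
    for key, total in sums.items():
        yield (key, total)

combined = sorted(combiner([("the", 1), ("quick", 1), ("the", 1)]))
# combined == [("quick", 1), ("the", 2)]
```

Because the combiner here has the same shape as the reducer (sum of values), the result is the same whether it runs zero, one, or many times.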
The shuffle and sort phase is done by the framework. Data from all mappers is grouped by key, split among reducers, and sorted by key. Each reducer obtains all values associated with the same key. The programmer may supply a custom comparison function for sorting and a Partitioner for the data split.
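The framework's grouping step can be sketched as follows (a simulation of the behavior, assuming the default sort order; the function name is made up):

```python
from collections import defaultdict

# Sketch of shuffle-and-sort: group all mapper outputs by key and
# present the keys to the reducer in sorted order.
def shuffle_and_sort(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())

grouped = shuffle_and_sort([("b", 1), ("a", 2), ("b", 3)])
# grouped == [("a", [2]), ("b", [1, 3])]
```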
The Partitioner decides which reducer will receive a particular key/value pair.
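Hadoop's default partitioner assigns a key to a reducer by hashing it modulo the number of reduce tasks. A minimal Python sketch of that idea (using Python's built-in `hash`, not Hadoop's `hashCode`):

```python
# Sketch of a hash partitioner: maps a key to a reducer index in
# [0, num_reducers). Pairs with the same key always go to the
# same reducer, which is what makes the grouping guarantee work.
def partition(key, num_reducers):
    return hash(key) % num_reducers
```

The essential property is determinism per key: every pair with key "the" lands on the same reducer.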
Each reducer obtains key/[value list] pairs sorted by key. The value list contains all values with that key produced by the mappers. Each reducer emits zero, one, or multiple output key/value pairs for each input pair.
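Continuing the word-count illustration, a reducer that sums the value list for each key can be sketched as (again a simulation, not the Hadoop API):

```python
# Sketch of a word-count reducer: receives one key and the list of
# all values emitted for it, and outputs the key with the total count.
def reducer(key, values):
    yield (key, sum(values))

result = list(reducer("the", [1, 1, 2]))
# result == [("the", 4)]
```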
The diagram below is copied from the Hadoop book.