Map-Reduce API

DRAFT

MapReduce flow on the P2P network:

  1. An edge peer P receives a Job, consisting of Job1(JobId, ResourcesNeeded, MapFunction, ReduceFunction).
  2. Peer P adds a MapFile M1(JobId, ResourcesNeeded, MapFunction, InitiatorPeer) to the index, with all chunks empty. The content of each chunk of the MapFile can be obtained by applying the Map function of Job1 to the Resources: each chunk corresponds to the mapping of one chunk of the resources. These chunks are then further split into several parts, one per key the mapper discovers (KeyChunks).
  3. Peer P advertises that the index has been modified with a new MapFile M1.
  4. Neighbors receive the update. They check whether they can already get chunks of the MapFile from their neighbors. Otherwise, they check which Resources they have and pick some at random to create the chunks (Map).
  5. Each time a mapper finishes mapping a chunk, it sends a ChunkMapped(JobId, ChunkId, List(keys)) message, listing all the keys discovered in the chunk, to the initiator. The initiator keeps track of the keys, the mappers, and the finished mappings to facilitate the work of the reducers.
  6. For each new key the initiator receives, it creates a ReduceFile R1(JobId, Key, ResourcesNeeded, ReduceFunction, List(Mapper Peers), InitiatorPeer) and signals an index update. The ResourcesNeeded are all the KeyChunks corresponding to the Key, so a reducer must obtain the right KeyChunk of every chunk created during the Map phase before starting.
  7. Neighbors that ask for a ReduceFile of a particular key become responsible for the reduction of that key. They obtain the ReduceFile chunks either by grabbing the KeyChunks and reducing them, or by copying them from another source, such as another file (the preferred way).
  8. When a reducer has all the KeyChunks of its key, it applies the Reduce function to them to fill the ReduceFile. When the reduction is complete, the reducer sends a ChunkReduced(JobId, Key) message to the initiator so that it can track progress.
  9. The operation is known to be finished when all ReduceFiles are stabilized, or when the initiator has received a ChunkReduced message for every key in its list from live peers.
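The records exchanged in the flow above could be sketched as plain data types. The names follow the draft; the exact fields and their types are assumptions, not a fixed wire format:

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Job:
    """Step 1: the job received by the edge peer."""
    job_id: str
    resources_needed: List[str]          # ids of the resource chunks
    map_function: Callable
    reduce_function: Callable

@dataclass
class ChunkMapped:
    """Step 5: sent by a mapper to the initiator when a chunk is done."""
    job_id: str
    chunk_id: str
    keys: List[str]                      # keys discovered in the chunk

@dataclass
class ReduceFile:
    """Step 6: one per key, advertised by the initiator via an index update."""
    job_id: str
    key: str
    resources_needed: List[str]          # the KeyChunks for this key
    mapper_peers: List[str]

@dataclass
class ChunkReduced:
    """Step 8: sent by a reducer to the initiator when its key is reduced."""
    job_id: str
    key: str
```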
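The initiator's bookkeeping in steps 5, 6 and 9 can be sketched as follows. It records which mapper peers hold KeyChunks for which keys, creates one ReduceFile description per new key, and declares the job finished once a ChunkReduced has arrived for every known key. The class and method names are assumptions for illustration:

```python
class Initiator:
    """Sketch of the initiator's state: keys seen, mapper peers per key,
    ReduceFiles created, and keys already reduced."""

    def __init__(self, job_id):
        self.job_id = job_id
        self.mappers_by_key = {}   # key -> set of mapper peers holding its KeyChunks
        self.reduce_files = {}     # key -> ReduceFile-like description
        self.reduced_keys = set()

    def on_chunk_mapped(self, mapper_peer, keys):
        """Step 5: record the keys a mapper discovered in a chunk."""
        for key in keys:
            if key not in self.mappers_by_key:
                self.mappers_by_key[key] = set()
                # step 6: a new key means a new ReduceFile and an index update
                self.reduce_files[key] = {"job_id": self.job_id, "key": key}
            self.mappers_by_key[key].add(mapper_peer)

    def on_chunk_reduced(self, key):
        """Step 8: a reducer reports that this key is fully reduced."""
        self.reduced_keys.add(key)

    def finished(self):
        """Step 9: ChunkReduced received for every key in the list."""
        return self.reduced_keys >= set(self.mappers_by_key)
```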
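Steps 7 and 8, on the reducer side, amount to collecting the KeyChunks of one key from every mapped chunk and folding them with the Reduce function. A minimal sketch, assuming a `(key, values)` signature for the reduce function (the summing reduce function is a hypothetical example):

```python
def reduce_key(reduce_function, key, key_chunks):
    """Step 8: once the reducer holds the KeyChunk of its key from every
    chunk created during the Map phase, flatten them and reduce. The
    result becomes the content of the ReduceFile; the reducer would then
    send ChunkReduced(JobId, Key) to the initiator."""
    values = [v for chunk in key_chunks for v in chunk]
    return reduce_function(key, values)

def sum_reduce(key, values):
    """Hypothetical example reduce function: sum the values of a key."""
    return key, sum(values)
```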

Sources:

http://hadoop.apache.org/docs/r1.2.1/hdfs_design.html

http://research.google.com/archive/mapreduce.html