Map/Reduce P2P Framework

P2P Distributed File System with an API for communicating with a Map/Reduce Framework.

 

 

Project Members

The team members are:

Jérémy Gotteland (Team Leader), Nikita Grishin, Valerian Pittet, David Froelicher, Alban Marguet, Sven Reber and Pascal Cudré.

 

Abstract

Datacenters require huge investments: first to build them, and then to power and cool the computers, which consumes a lot of energy, incurs significant costs and harms the environment.

Yet with the rise of Big Data, demand is growing quickly, and big companies like IBM are investing massively to build the most performant and scalable datacenters.

Nowadays, almost everyone owns a personal computer, and companies typically provide a computer on each employee's desk. However, the computing power and storage of those machines are never fully used, which is a regrettable waste.

Project Goal

We intend to design a P2P protocol that we will implement in the form of a desktop application (and possibly a mobile one in the future). This protocol will make it possible to perform Map/Reduce operations over the computers connected together in the P2P network.
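
To make the Map/Reduce idea concrete, here is a minimal single-process sketch of a word count in which each text chunk stands for the data held by one peer. The function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) and the sample data are purely illustrative assumptions; the sketch says nothing about how the real protocol would ship tasks or intermediate results between nodes.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map step a peer would run locally: emit (word, 1) for each word."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs_per_peer):
    """Group intermediate values by key across the output of all peers."""
    grouped = defaultdict(list)
    for pairs in pairs_per_peer:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce step: sum the counts collected for each word."""
    return {key: sum(values) for key, values in grouped.items()}


def run_job(chunks):
    """Run map, shuffle and reduce in sequence (here in one process)."""
    return reduce_phase(shuffle(map_phase(chunk) for chunk in chunks))


# Hypothetical data: one text chunk per peer.
peer_chunks = ["the quick brown fox", "the lazy dog", "The fox"]
word_counts = run_job(peer_chunks)
```

In a real deployment the map step would run on the peers that store each chunk, and only the grouped intermediate pairs would travel over the network.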
 

One possible outcome of this project would be to convince enterprises to deploy our application on their pool of computers, so that the idle computing power and storage of those computers can be used to perform our computations.

 

Methods

We intend to implement our prototype using a framework like JXTA, which has a Java implementation and is therefore suitable for developing our application in Scala, for example.
JXTA no longer seems to be an active open-source project, though, so we might have to look for a more performant framework developed since.

 

On top of that P2P framework we will implement: 


We may use an existing framework to perform the Map and Reduce phases on each machine, as this is not the main challenge we set ourselves for this project.

 

Resources needed

We need as many machines as possible on which to deploy our application, in order to test the performance of our project on a large network of computers.

Classmates may be asked to run the software on their machines; otherwise we will use machines from EPFL.

Another approach would be to run a simulation to evaluate the performance of the system with thousands of nodes.
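
As a rough idea of what such a simulation could look like, the sketch below estimates by Monte Carlo how often a file remains readable when nodes are randomly offline. Everything in it is an assumption made for illustration: the metric (availability rather than throughput), the failure model (each replica sits on a different node that is online independently with probability `p_online`), and the names `file_available` and `estimate_availability`.

```python
import random


def file_available(num_chunks, replicas, p_online, rng):
    """A file is readable if every chunk has at least one online replica."""
    for _ in range(num_chunks):
        if not any(rng.random() < p_online for _ in range(replicas)):
            return False
    return True


def estimate_availability(num_chunks, replicas, p_online, trials=10_000, seed=0):
    """Fraction of simulated trials in which the whole file is readable."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    hits = sum(
        file_available(num_chunks, replicas, p_online, rng)
        for _ in range(trials)
    )
    return hits / trials


# Hypothetical scenario: a 10-chunk file, nodes online 70% of the time.
single_copy = estimate_availability(10, replicas=1, p_online=0.7)
three_copies = estimate_availability(10, replicas=3, p_online=0.7)
```

Comparing `single_copy` with `three_copies` gives a first feel for how much replication is needed on a network of desktop machines that are frequently switched off.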

Risks to the success

1. The JXTA framework seems adequate for the network communication of our DFS, but it is really hard to understand and there is not much documentation about it.

2. The objectives we have set may be a little optimistic for the time we have to finish the project.

 

Milestones

The first part of the project is about building a robust and scalable distributed file system: RAIDFS, for Randomized Aggregation of Interconnected Distributed File System.

The second part of the project will use that basis to build the Map/Reduce over P2P protocol we are aiming for.
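
To illustrate the kind of chunking and replication a file system like this involves, here is a minimal sketch that splits a file into fixed-size chunks and assigns each chunk to several randomly chosen nodes. It is not the RAIDFS specification: the uniform random placement, the SHA-256 chunk identifiers, the node names and the functions `split_into_chunks` and `place_chunks` are all assumptions made for this example.

```python
import hashlib
import random


def split_into_chunks(data, chunk_size):
    """Cut a byte string into consecutive chunks of at most chunk_size bytes."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def place_chunks(chunks, node_ids, replicas, seed=None):
    """Map each chunk id to `replicas` distinct, randomly chosen nodes.

    Chunk ids are content hashes, so identical chunks share one entry.
    Requires replicas <= len(node_ids).
    """
    rng = random.Random(seed)
    placement = {}
    for chunk in chunks:
        chunk_id = hashlib.sha256(chunk).hexdigest()
        placement[chunk_id] = rng.sample(node_ids, replicas)
    return placement


# Hypothetical example: a 10-byte file, 4-byte chunks, 5 nodes, 3 replicas.
data = bytes(range(10))
nodes = ["n1", "n2", "n3", "n4", "n5"]
chunks = split_into_chunks(data, 4)
placement = place_chunks(chunks, nodes, replicas=3, seed=42)
```

The resulting `placement` dictionary plays the role of a very small file index: given a chunk id, it tells a node which peers to ask for a copy.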


 

  1. March 25:
    - Protocol Specification v0.5 & P2P node state diagram.
    - Work Breakdown for implementation. 
    - Implementation of command line interpreter. 
     
  2. April 3rd:
    - Protocol Specification v0.8.
    - Full message design & implementation of the network structure (i.e. an empty distributed file system, but with nodes ready to communicate with each other).
    - Hopefully, full message design for updating the distributed file system index when a file is added to or removed from it.
    - Partial message design for file chunking, spreading and replicating over the network.
     
  3. April 29 (right after Easter Break):
    - Complete P2P Distributed File System with command line interpreter.
    - Implementation of Core Node features (state machine for the node, behaviour on receiving messages) without networking.
    - Implementation of CustomAdvertisements to broadcast the Index and GlobalChunkfield for files, but we still have trouble receiving them.

  4. May 6th-13th:
    - Integration of Shell and Core Node Features with JXTA Networking to have a working Distributed File System. 
     
  5. May 18th:
    - Add API for communicating with the Map/Reduce Framework.
    - Final debugging.

 

 

Workpackages

Deadline 1 : March 25th

Protocol Specification v0.5 (P2P Distributed File System):

Node State Diagram : 

Straightforward from Protocol Specification. 
 

Implementation of command line interpreter : 

David & Alban

Investigating the potential of the JXSE library and coding features if necessary:

Jérémy

Work Breakdown for implementation step: 

All team.

 

Deadline 2 : April 3rd

Protocol Specification v0.8

Deadline 3 : April 29th

Separate implementations of features.

Deadline 4 : May 6th - May 13th

Integration of different parts to have a functioning DFS

Creation of the Map-Reduce API: Pascal

 

Deadline 5 : May 19th

FINAL REPORT : https://wiki.epfl.ch/private/file/list.go?id=3408&key=77lplsxmcxrz0bfcjl6hh8q4gak6jd8f