Map/Reduce P2P - Home - OLD

Project Members

The team members are:

Jérémy Gotteland (Team Leader), Nikita Grishin, Valerian Pittet, David Froelicher, Alban Marguet, Sven Reber and Pascal Cudré.

 

Abstract

Datacenters require huge investments: first to build them, and then to power and cool the computers, which takes a lot of energy, incurs significant costs and harms the environment.

Yet with the rise of the Big Data hype, demand is growing quickly, and big companies like IBM are investing massively to build the best-performing and most scalable datacenters.

Nowadays, almost everyone owns a personal computer, and companies provide a computer on every employee's desk. However, the computing power and storage of those machines are never fully used, which is a regrettable waste.

Project Goal

We intend to design a P2P protocol that we will implement as a desktop application (and possibly a mobile one in the future). This protocol will make it possible to perform Map/Reduce operations over the computers connected together in the P2P network.
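To make the Map/Reduce idea concrete, here is a minimal single-process sketch in Python of a word count spread over peers. It only illustrates the programming model, not our protocol: the peer names, the `run_map_reduce` helper and the in-memory "shuffle" are assumptions made for this example, and no networking is involved.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple


def map_word_count(chunk: str) -> List[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in a text chunk."""
    return [(word.lower(), 1) for word in chunk.split()]


def reduce_sum(key: str, values: List[int]) -> int:
    """Reduce step: add up all the counts emitted for one word."""
    return sum(values)


def run_map_reduce(
    chunks_by_peer: Dict[str, List[str]],
    map_fn: Callable[[str], List[Tuple[str, int]]],
    reduce_fn: Callable[[str, List[int]], int],
) -> Dict[str, int]:
    """Hypothetical driver: simulates peers mapping their local chunks.

    In a real P2P deployment each peer would run map_fn on the chunks it
    stores and send intermediate pairs over the network; here everything
    happens in one process.
    """
    # "Shuffle": group intermediate values by key.
    intermediate: Dict[str, List[int]] = defaultdict(list)
    for _peer, chunks in chunks_by_peer.items():
        for chunk in chunks:
            for key, value in map_fn(chunk):
                intermediate[key].append(value)
    # Reduce phase: one call per distinct key.
    return {key: reduce_fn(key, values) for key, values in intermediate.items()}


# Example data: two imaginary peers holding different text chunks.
chunks_by_peer = {
    "peer-a": ["the quick brown fox"],
    "peer-b": ["the lazy dog", "the end"],
}
result = run_map_reduce(chunks_by_peer, map_word_count, reduce_sum)
print(result["the"])
```

The map step only touches data a peer already holds, which is what makes the model attractive on a network of idle desktop machines.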
 

One possible outcome of this project would be to convince enterprises to deploy our application on their pool of computers, so that the idle computing power and storage of those computers can be used to perform our computations.

 

Methods

We intend to implement our prototype using a framework like JXTA, which has a Java implementation and is therefore suitable for developing our application in Scala, for example.
JXTA no longer seems to be an active open-source project, though, so we might have to look for a better-performing alternative developed since.

 

On top of that P2P framework we will implement: 


We may use an existing framework to perform the Map and Reduce phases on each machine, as this is not the main challenge we set ourselves for this project.

 

Resources needed

As many machines as possible on which to deploy our application and test its performance on a large network of computers.

Classmates may be asked to run the software on their machines; otherwise we will use machines from EPFL.

Another approach would be to run a simulation to evaluate the performance of the system with thousands of nodes.

Risks to the success

 

 

Milestones

The first part of the project is about building a robust and scalable distributed file system: RAIDFS, for Randomized Aggregation of Interconnected Distributed File System.
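As an illustration of the file chunking, spreading and replicating that such a file system needs, here is a minimal Python sketch. It rests on assumptions that are not part of any specification yet: fixed-size chunks, SHA-256 digests as chunk identifiers, and uniformly random replica placement. The function names and node names are hypothetical.

```python
import hashlib
import random
from typing import Dict, List, Optional


def split_into_chunks(data: bytes, chunk_size: int) -> List[bytes]:
    """Cut a file's bytes into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def place_replicas(
    chunks: List[bytes],
    nodes: List[str],
    replicas: int,
    seed: Optional[int] = None,
) -> Dict[str, List[str]]:
    """Assign each chunk to `replicas` distinct nodes chosen at random.

    Assumption: chunks are identified by the SHA-256 digest of their content,
    so identical chunks share one entry in the returned mapping.
    """
    if replicas > len(nodes):
        raise ValueError("cannot place more replicas than there are nodes")
    rng = random.Random(seed)  # seeded only to make the example repeatable
    placement: Dict[str, List[str]] = {}
    for chunk in chunks:
        chunk_id = hashlib.sha256(chunk).hexdigest()
        placement[chunk_id] = rng.sample(nodes, replicas)
    return placement


# Example: a 100-byte "file" cut into 32-byte chunks, 3 replicas per chunk.
data = b"0123456789" * 10
chunks = split_into_chunks(data, chunk_size=32)
nodes = ["node-1", "node-2", "node-3", "node-4", "node-5"]
placement = place_replicas(chunks, nodes, replicas=3, seed=42)
print(len(chunks), "chunks placed on", len(nodes), "nodes")
```

Keeping several replicas of each chunk on distinct nodes is what lets the file survive individual desktop machines going offline.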

The second part of the project will build on that basis to create the Map/Reduce over P2P protocol we are aiming for.

Color code:  Past deadlines | Next deadline | Future deadlines

 

  1. March 25:
    - Protocol Specification v0.5 & P2P node state diagram.
    - Work Breakdown for implementation. 
    - Implementation of command line interpreter. 
     
  2. April 3rd: 
    - Protocol Specification v0.8
    - Full messages design & implementation of network structuring (i.e. an empty distributed file system, but with nodes ready to communicate with each other).
    - Hopefully, full messages design for updating the distributed file system index when a file is added to or removed from it.
    - Partial messages design for file chunking, spreading and replicating over the network.
     
  3. April 29 (right after Easter Break):
    - Complete P2P Distributed File System with command line interpreter.
     
  4. May 6th:
    - Protocol Specification v2.0 to allow performing Map/Reduce over the previously built P2P network: adding new messages and node states.
     
  5. May 13th:
    - Implementation of Protocol Specification 2.0?

 

 

Workpackages

Deadline 1 : March 25th

Protocol Specification 0.5 (P2P Distributed File System) :

Node State Diagram : 

Straightforward from Protocol Specification.
 

Implementation of command line interpreter : 

David & Alban

Investigating the potential of the JXSE library and coding features if necessary :

Jérémy & Nikita

Work Breakdown for implementation step: 

All team.

 

Deadline 2 : April 3rd

Protocol Specification v0.8

Deadline 3

Implementation of Protocol Specification v1.0.

 

Deadline 4

Protocol Specification 2.0 (P2P Map/Reduce) :

 

Deadline 5

Implementation of Protocol Specification 2.0: