Map/Reduce P2P - Home - OLD
Project Members
The team members are:
Jérémy Gotteland (Team Leader), Nikita Grishin, Valerian Pittet, David Froelicher, Alban Marguet, Sven Reber and Pascal Cudré.
Abstract
Datacenters require huge investments: they are expensive to build, and powering and cooling the computers consumes a lot of energy, which is both costly and bad for the environment.
But with the rise of the Big Data hype, demand is growing quickly, and big companies like IBM are investing massively to build the most efficient and scalable datacenters.
Nowadays, almost everyone owns a personal computer, and companies provide a computer on every employee's desk. Yet the computing power and storage of those machines are never fully used, which is a regrettable waste.
Project Goal
We intend to design a P2P protocol and implement it as a desktop application (possibly a mobile one in the future). This protocol will make it possible to perform Map/Reduce operations over the computers connected together in the P2P network.
One possible outcome of this project would be to convince enterprises to deploy our application on their pool of computers, so that the idle computing power and storage of those computers can be used to perform our computations.
Methods
We intend to implement our prototype using a framework like JXTA, which has a Java implementation and is therefore suitable for developing our application in Scala, for example.
JXTA no longer seems to be an active open-source project, though, so we might have to look for a better-performing alternative developed since.
On top of that P2P framework we will implement:
- A distributed file system that would suit this kind of network where peers can connect/disconnect at any time.
- The Sort & Shuffle part of Map/Reduce within the network.
- Maybe a scoring system for each node depending on its bandwidth, computing power, etc.
We may use an existing framework to perform the Map and Reduce phases on each machine, as this is not the main challenge we set ourselves for this project.
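The Sort & Shuffle step listed above could look roughly like the following. This is only a minimal sketch, not our protocol: it assumes that map outputs are (key, value) pairs with string keys and that each reducer peer is identified by an index. The names `partition_for` and `shuffle` are hypothetical.

```python
import hashlib
from collections import defaultdict


def partition_for(key: str, num_reducers: int) -> int:
    """Map a key to a reducer index using a hash that is stable across machines."""
    # Python's built-in hash() is salted per process, so a cryptographic digest is
    # used instead: every peer must agree on which reducer owns a key.
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_reducers


def shuffle(map_outputs, num_reducers):
    """Group (key, value) pairs per reducer, then sort each reducer's keys."""
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in map_outputs:
        buckets[partition_for(key, num_reducers)][key].append(value)
    # Each reducer receives its keys in sorted order, with all values grouped.
    return [sorted(bucket.items()) for bucket in buckets]


if __name__ == "__main__":
    pairs = [("banana", 1), ("apple", 1), ("banana", 1)]
    for reducer, groups in enumerate(shuffle(pairs, 2)):
        print(reducer, groups)
```

In a real P2P setting the reducer index would have to be mapped to a live peer, and that mapping would change as peers join and leave; the sketch ignores this.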
Resources needed
As many machines as possible, to deploy our application on and to test the performance of our project on a large network of computers.
We may ask classmates to run the software on their machines; otherwise we will use machines from EPFL.
Another approach would be to run a simulation to evaluate the performance of the system with thousands of nodes.
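As an illustration of what such a simulation might measure, here is a minimal Monte Carlo sketch that estimates how often a chunk is reachable when peers come and go. It rests on strong assumptions that are ours for the example only: each replica holder is online independently with the same probability, and a chunk is available as soon as one holder is online. The function name and parameters are hypothetical.

```python
import random


def chunk_availability(online_prob, replicas, trials=10_000, seed=0):
    """Estimate the probability that at least one replica holder is online."""
    rng = random.Random(seed)  # fixed seed so that runs are reproducible
    available = 0
    for _ in range(trials):
        # One trial: draw the online/offline state of every replica holder.
        if any(rng.random() < online_prob for _ in range(replicas)):
            available += 1
    return available / trials


if __name__ == "__main__":
    for r in (1, 2, 3, 5):
        print(r, chunk_availability(online_prob=0.5, replicas=r))
```

Under these assumptions the exact value is 1 - (1 - p)^r, so the simulation is only useful as a starting point before adding correlated disconnections, bandwidth limits or re-replication delays.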
Risks to the success
Milestones
The first part of the project is about building a robust and scalable distributed file system: RAIDFS, for Randomized Aggregation of Interconnected Distributed File System.
The second part of the project will use that basis to build the Map/Reduce over P2P protocol we are aiming for.
Color code: Past deadlines | Next deadline | Future deadlines
- March 25:
- Protocol Specification v0.5 & P2P node state diagram.
- Work Breakdown for implementation. ✔
- Implementation of command line interpreter. ✔
- April 3rd:
- Protocol Specification v0.8
- Full messages design & implementation of network structuring (i.e. an empty distributed file system, but with nodes ready to communicate with each other)
- Hopefully, full messages design for updating the distributed file system index when a file is added to or removed from it.
- Partial messages design for file chunking, spreading and replicating over the network.
- April 29 (right after Easter Break):
- Complete P2P Distributed File System with command line interpreter.
- May 6th:
- Protocol Specification v2.0 to allow performing Map/Reduce over the previously built P2P network: adding new messages and node states.
- May 13th:
- Implementation of Protocol Specification 2.0?
Workpackages
Deadline 1 : March 25th
Protocol Specification 0.5 (P2P Distributed File System) :
- Node Discovery : Jérémy & Nikita
- Indexing namespace metadata (-ls command) : David & Alban
- Uploading data (-put command) | Deleting data (-rm command) | Getting the data (-get command) and Dynamic replication of data: Valerian, Pascal and Sven.
Node State Diagram : ✘
Implementation of command line interpreter : ✔
David & Alban
Investigating potential of JXSE library and coding features if necessary : ✔
Jérémy & Nikita
Work Breakdown for implementation step: ✔
All team.
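The command line interpreter from Deadline 1 has to recognize the `-ls`, `-put`, `-rm` and `-get` commands. A minimal sketch of the parsing step is given below. The argument counts are assumptions made for the example (this page names the commands but not their syntax), and `parse_command` is a hypothetical name.

```python
import shlex

# Assumed (min, max) argument counts per command; the real syntax is not
# specified on this page.
ARITY = {
    "-ls": (0, 1),   # optional remote directory
    "-put": (2, 2),  # local path, remote path
    "-rm": (1, 1),   # remote path
    "-get": (2, 2),  # remote path, local path
}


def parse_command(line):
    """Split a command line into (command, arguments) and validate it."""
    tokens = shlex.split(line)  # honours quotes, e.g. 'my file.txt'
    if not tokens:
        raise ValueError("empty command")
    name, args = tokens[0], tokens[1:]
    if name not in ARITY:
        raise ValueError(f"unknown command: {name}")
    low, high = ARITY[name]
    if not low <= len(args) <= high:
        raise ValueError(
            f"{name} expects between {low} and {high} argument(s), got {len(args)}"
        )
    return name, args


if __name__ == "__main__":
    print(parse_command("-put 'my file.txt' /docs/a.txt"))
```

Keeping parsing separate from execution would let the same commands be sent later as protocol messages without going through the terminal.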
Deadline 2 : April 3rd
Protocol Specification v0.8
- Full specification of node discovery and network structuring: Jérémy & Nikita
- (Hopefully) full specification of messages for updating the index of the distributed file system when a file is added/removed: David & Alban
This must handle the case where a user disconnects and then reconnects: the returning node must be brought up to date with the new directory tree of the file system.
- (Partial) messages design for spreading and replicating chunks of files across the network: Valérian | Sven
- (Partial) messages design for dropping chunks when space is insufficient or the optimal number of replicas has been reached: Pascal | Sven
- Implementation of network structuring: Jérémy
Deadline 3
Implementation of Protocol Specification v1.0.
- Making nodes contact each other and organize into groups to share files : Jérémy
- Maintaining a global index of the distributed file system : David & Alban
- Organizing the duplication/dropping of chunks of files by the peers: Sven, Valérian and Pascal.
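To make the chunk duplication task more concrete, here is a minimal sketch of splitting a file into chunks and choosing which peers hold the replicas of each chunk. It is not the specified protocol: the placement rule used here is rendezvous (highest random weight) hashing, picked for the example because removing a peer only affects the chunks that peer was holding. Peer identifiers are assumed to be strings, and all function names are hypothetical.

```python
import hashlib


def split_into_chunks(data: bytes, chunk_size: int):
    """Cut the file content into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def chunk_id(chunk: bytes) -> str:
    """Identify a chunk by the SHA-256 digest of its content."""
    return hashlib.sha256(chunk).hexdigest()


def pick_replica_holders(cid, peers, replicas):
    """Rendezvous hashing: rank peers by hash(peer, chunk) and keep the top ones."""
    def weight(peer):
        return hashlib.sha256(f"{peer}:{cid}".encode("utf-8")).digest()

    return sorted(peers, key=weight, reverse=True)[:replicas]


if __name__ == "__main__":
    peers = ["peer-a", "peer-b", "peer-c", "peer-d"]
    for chunk in split_into_chunks(b"some file content to spread", 8):
        cid = chunk_id(chunk)
        print(cid[:8], pick_replica_holders(cid, peers, replicas=2))
```

When a holder disconnects, recomputing the ranking over the remaining peers gives the peer that should receive the missing replica. Deciding when to drop a replica (insufficient space, enough copies already) would need the node state and messages from the specification.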
Deadline 4
Protocol Specification 2.0 (P2P Map/Reduce) :