Map/Reduce P2P Framework
P2P Distributed File System with an API for communicating with a Map/Reduce Framework.
Project Members
The team members are:
Jérémy Gotteland (Team Leader), Nikita Grishin, Valerian Pittet, David Froelicher, Alban Marguet, Sven Reber and Pascal Cudré.
Abstract
Datacenters require huge investments: they are expensive to build, and powering and cooling the machines consumes a lot of energy, which is both costly and bad for the environment.
Yet with the rise of Big Data, demand is growing quickly, and large companies such as IBM are investing massively to build the fastest and most scalable datacenters.
Meanwhile, almost everyone owns a personal computer, and most companies provide one for each employee. The computing power and storage of those machines are never fully used, which is a regrettable waste.
Project Goal
We intend to design a P2P protocol and implement it as a desktop application (and possibly a mobile one in the future). This protocol will make it possible to run Map/Reduce operations across the computers connected together in the P2P network.
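To make the target computation concrete, here is a minimal single-machine sketch of a Map/Reduce job (a word count), written in Python for brevity. It is illustrative only: the names `map_words`, `reduce_counts` and `run_job` are hypothetical, and in the P2P setting each input split would live on a different peer rather than in one list.

```python
from collections import defaultdict


def map_words(split):
    """Map phase: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]


def reduce_counts(word, counts):
    """Reduce phase: sum all counts emitted for one word."""
    return word, sum(counts)


def run_job(splits):
    """Run map on every split, group the pairs by key, then reduce each group."""
    groups = defaultdict(list)
    for split in splits:  # in the P2P setting: one split per peer
        for key, value in map_words(split):
            groups[key].append(value)
    return dict(reduce_counts(key, values) for key, values in groups.items())
```

The grouping loop in `run_job` is the step that becomes hard once the splits are spread over many machines, which is why the project concentrates on the file system and on the Sort & Shuffle part.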
One possible outcome of this project would be to convince enterprises to deploy our application on their pool of computers, so that the idle computing power and storage of those machines can be used to perform computations.
Methods
We intend to implement our prototype on top of a P2P framework such as JXTA, which has a Java implementation and is therefore suitable for developing our application in a JVM language such as Scala.
JXTA no longer seems to be an active open-source project, though, so we might have to look for a more recent and better-performing alternative.
On top of that P2P framework we will implement:
- A distributed file system that would suit this kind of network where peers can connect/disconnect at any time.
- The Sort & Shuffle part of Map/Reduce within the network.
- Possibly a scoring system for each node, based on its bandwidth, computing power, etc.
We may use an existing framework to perform the Map and Reduce phases on each machine, as this is not the main challenge we set ourselves for this project.
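The Sort & Shuffle step listed above can be sketched as follows. This is an assumption about one common way to do it, not the project's protocol: every mapped key is routed to a reducer chosen by a stable hash, and each reducer then sorts and groups what it received. The function names and the choice of SHA-1 are hypothetical.

```python
import hashlib
from collections import defaultdict


def partition_for(key, num_reducers):
    """Pick a reducer index from a stable hash of the key.

    Python's built-in hash() of a string changes between processes, so a
    cryptographic digest is used to make every peer agree on the result.
    """
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_reducers


def shuffle_and_sort(mapped_pairs, num_reducers):
    """Route each (key, value) pair to its reducer, then sort and group by key."""
    buckets = defaultdict(list)
    for key, value in mapped_pairs:
        buckets[partition_for(key, num_reducers)].append((key, value))
    result = {}
    for reducer, pairs in buckets.items():
        grouped = defaultdict(list)
        for key, value in sorted(pairs, key=lambda pair: pair[0]):
            grouped[key].append(value)
        result[reducer] = dict(grouped)  # keys appear in sorted order
    return result
```

Because the reducer is derived from the key alone, all values of a given key end up on the same reducer regardless of which peer produced them.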
Resources needed
As many machines as possible, so that we can deploy our application and test its performance on a large network of computers.
Classmates may be asked to run the software on their machines; otherwise we will use machines from EPFL.
Another approach would be to run a simulation to evaluate the performance of the system with thousands of nodes.
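As a rough idea of what such a simulation could measure, the sketch below estimates how many file chunks stay reachable when peers are online only part of the time. All of it is an assumption made for illustration: the function name, the parameters, and the model itself (random replica placement, peers online independently with the same probability).

```python
import random


def simulate_availability(num_nodes, num_chunks, replicas, online_prob, seed=0):
    """Return the fraction of chunks with at least one replica on an online node."""
    rng = random.Random(seed)
    # Place each chunk on `replicas` distinct, randomly chosen nodes.
    placement = [rng.sample(range(num_nodes), replicas) for _ in range(num_chunks)]
    # Draw one snapshot of the network: each node is online independently.
    online = [rng.random() < online_prob for _ in range(num_nodes)]
    reachable = sum(any(online[node] for node in holders) for holders in placement)
    return reachable / num_chunks
```

Under this model a chunk with r replicas on peers that are each online with probability p is expected to be reachable with probability 1 - (1 - p)^r, which gives a value to compare the simulated fraction against.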
Risks to the success
1. The JXTA framework seems adequate for the network communication of our DFS, but it is hard to understand and has little documentation.
2. The objectives we have set may be a little optimistic given the time we have to finish the project.
Milestones
The first part of the project is about building a robust and scalable distributed file system: RAIDFS, for Randomized Aggregation of Interconnected Distributed File System.
The second part of the project will build on that basis to implement the targeted Map/Reduce over P2P protocol.
- March 25:
  - Protocol Specification v0.5 & P2P node state diagram.
  - Work Breakdown for implementation. ✔
  - Implementation of command line interpreter. ✔
- April 3rd:
  - Protocol Specification v0.8 ✔
  - Full messages design & implementation of network structuring (i.e. an empty distributed file system, but with nodes ready to communicate with each other) ✔
  - Hopefully full messages design for updating the distributed file system index when a file is added to or removed from it. ✔
  - Partial messages design for file chunking, spreading and replicating over the network. ✔
- April 29 (right after Easter Break):
  - Complete P2P Distributed File System with command line interpreter.
  - Implementation of Core Node features (State Machine for Node, behaviour on receiving messages) without Networking ✔
  - Implementation of CustomAdvertisements to broadcast Index and GlobalChunkfield for files, but still having trouble receiving them. ✔
- May 6th-13th:
  - Integration of Shell and Core Node Features with JXTA Networking to have a working Distributed File System. ✔
- May 18th:
  - Add API for communicating with Map/Reduce Framework ✔
  - Final debugging.
Workpackages
Deadline 1 : March 25th
Protocol Specification 0.5 (P2P Distributed File System) : ✔
- Node Discovery : Jérémy ✔
- Indexing namespace metadata (-ls command) : David & Alban ✔
- Uploading data (-put command) | Deleting data (-rm command) | Getting the data (-get command) and Dynamic replication of data: Valerian, Pascal and Sven. ✔
Node State Diagram : ✔
Implementation of command line interpreter : David & Alban ✔
Investigating potential of JXSE library and coding features if necessary : Jérémy ✔
Work Breakdown for implementation step : All team ✔
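A command line interpreter for the `-ls`, `-put`, `-get` and `-rm` commands named in this deadline could look like the sketch below. It is a hypothetical stand-in, not the team's implementation: the `Shell` class, its messages, and the in-memory dictionary that replaces the distributed store are all assumptions, and `put` takes the file content directly instead of a local path.

```python
import shlex


class Shell:
    """Tiny command interpreter over an in-memory stand-in for the DFS."""

    def __init__(self):
        self.files = {}  # file name -> content (stand-in for the distributed store)

    def execute(self, line):
        """Parse one command line and return the text to display."""
        parts = shlex.split(line)
        if not parts:
            return ""
        name, args = parts[0].lstrip("-"), parts[1:]
        commands = {"ls": (self.ls, 0), "put": (self.put, 2),
                    "get": (self.get, 1), "rm": (self.rm, 1)}
        if name not in commands:
            return "unknown command: " + name
        handler, arity = commands[name]
        if len(args) != arity:
            return "usage error: %s takes %d argument(s)" % (name, arity)
        return handler(*args)

    def ls(self):
        return "\n".join(sorted(self.files))

    def put(self, name, content):
        self.files[name] = content
        return "stored " + name

    def get(self, name):
        return self.files.get(name, "no such file: " + name)

    def rm(self, name):
        if self.files.pop(name, None) is None:
            return "no such file: " + name
        return "removed " + name
```

Keeping the command table separate from the handlers means the handlers can later be swapped for ones that send messages over the network without touching the parsing code.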
Deadline 2 : April 3rd
Protocol Specification v0.8
- Full specification of node discovery and network structuring: Jérémy ✔
- (Hopefully) full specification of the messages for updating the index of the distributed file system when a file is added or removed. This must cover the case where a user disconnects and then reconnects, in which case the node must be brought up to date with the new directory tree of the file system: David & Alban ✔
- (Partial) messages design for spreading and replicating chunks of files across the network: Valérian | Sven ✔
- (Partial) messages design for dropping chunks when space is insufficient or the optimal number of replicas has been reached: Pascal | Sven ✔
- Implementation of network structuring: Jérémy ✔
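The chunk handling covered by these message designs can be illustrated with a short sketch. Everything in it is an assumption rather than the specified protocol: fixed-size chunks, SHA-1 content identifiers, round-robin placement, and the function names are placeholders chosen for the example.

```python
import hashlib


def split_into_chunks(data, chunk_size):
    """Cut a byte string into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def chunk_id(chunk):
    """Identify a chunk by the SHA-1 digest of its content."""
    return hashlib.sha1(chunk).hexdigest()


def place_replicas(chunks, peers, replicas):
    """Assign each chunk to `replicas` distinct peers, round-robin style.

    Returns one (chunk id, list of holder peers) entry per chunk, in order.
    """
    if replicas > len(peers):
        raise ValueError("not enough peers for the requested replica count")
    plan = []
    for index, chunk in enumerate(chunks):
        holders = [peers[(index + r) % len(peers)] for r in range(replicas)]
        plan.append((chunk_id(chunk), holders))
    return plan


def peers_that_may_drop(holders, target_replicas):
    """Holders beyond the target replica count are allowed to drop their copy."""
    return holders[target_replicas:]
```

A real placement would also have to react to peers joining and leaving, which is where the replication and dropping messages come in.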
Deadline 3 : April 29th
Separate implementations of features.
- Making nodes able to broadcast the Index, discover the Index, create a group for peers to share a file, discover groups of peers that share a file, get the GlobalChunkField of this file, join the group and ask for chunks: Jérémy & Sven ✔
- Implement shell with basic functions (ls, put, rm) : David & Alban ✔
- Organizing the duplication/dropping of chunks of files by the peers and designing State Machine for a peer: Valérian & Pascal. ✔
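A peer state machine of the kind mentioned in this deadline can be expressed as a transition table. The states and events below are hypothetical, picked only to show the pattern; they are not taken from the project's node state diagram.

```python
from enum import Enum


class State(Enum):
    OFFLINE = "offline"
    DISCOVERING = "discovering"
    SYNCING_INDEX = "syncing_index"
    READY = "ready"


# (current state, event) -> next state; any pair not listed is ignored.
TRANSITIONS = {
    (State.OFFLINE, "connect"): State.DISCOVERING,
    (State.DISCOVERING, "peers_found"): State.SYNCING_INDEX,
    (State.SYNCING_INDEX, "index_received"): State.READY,
}


class Node:
    """Peer whose behaviour is driven by the transition table above."""

    def __init__(self):
        self.state = State.OFFLINE

    def on_event(self, event):
        """Apply one event and return the resulting state."""
        if event == "disconnect":  # a peer can drop out from any state
            self.state = State.OFFLINE
        else:
            self.state = TRANSITIONS.get((self.state, event), self.state)
        return self.state
```

Since the table is plain data, this logic can be exercised without any networking, in line with the milestone that implements the core node features before the JXTA integration.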
Deadline 4 : May 6th - May 13th
Integration of different parts to have a functioning DFS
- Making the shell compatible with the other classes and adapting all the messages so they can be sent with the JXTA protocol: David & Alban ✔
- Making the different peers communicate with each other and exchange chunks: Jérémy & Sven ✔
- Completing the missing parts of the protocols, etc.: Valérian ✔
- Creation of the Map/Reduce API: Pascal ✔
Deadline 5 : May 19th
- Final Debugging: Everybody
FINAL REPORT : https://wiki.epfl.ch/private/file/list.go?id=3408&key=77lplsxmcxrz0bfcjl6hh8q4gak6jd8f