DFS Protocol
- Definitions:
Chunkfield of node N for file F (local chunkfield):
- Bit array whose size equals the chunk count of the associated file
- Chunkfield[i] == 1 iff N possesses chunk i of F
Global chunkfield of node N with neighbors Ns for file F:
- Int array whose size equals the chunk count of the associated file
- GlobalChunkfield = sum of the local chunkfields of N and Ns for file F
- GlobalChunkfield[i] == k iff exactly k nodes in (N :: Ns) possess chunk i of F
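A minimal sketch of these two definitions in Python (the function name and layout are illustrative, not part of the protocol):

```python
from typing import List

def global_chunkfield(local: List[int], neighbor_cfs: List[List[int]]) -> List[int]:
    """Sum the local chunkfield of N with the chunkfields of its neighbors Ns."""
    result = list(local)
    for cf in neighbor_cfs:
        for i, bit in enumerate(cf):
            result[i] += bit
    return result

# 4-chunk file: N holds chunks 0 and 2, and has two neighbors
print(global_chunkfield([1, 0, 1, 0], [[0, 1, 1, 0], [1, 1, 0, 0]]))
# -> [2, 2, 2, 0]; chunk 3 is not replicated anywhere in the neighborhood
```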
DFS Index:
Metadata over the file system, including:
- directory tree
- file data: (fullPathName, id, chunkCount, stabilized) for each file
- chunk data: (fileId, chunkId, chunk_hash) for each chunk of each file (for consistency checks)
(- file size)
(- file chunk size)
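As a sketch, the index could be laid out as follows (the dataclass names are illustrative; the two optional fields above are included as defaults):

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class FileEntry:
    fullPathName: str
    id: int
    chunkCount: int
    stabilized: bool
    fileSize: int = 0   # optional field
    chunkSize: int = 0  # optional field

@dataclass
class DFSIndex:
    tree: Dict[str, List[str]] = field(default_factory=dict)   # directory -> entries
    files: Dict[int, FileEntry] = field(default_factory=dict)  # fileId -> file data
    chunks: Dict[Tuple[int, int], str] = field(default_factory=dict)  # (fileId, chunkId) -> chunk_hash
```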
Network constants:
- minDegree, optDegree, maxDegree => number of connections kept by each node
- minReplication, optReplication, maxReplication => number of replications of each file over the network
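The values below are purely illustrative placeholders (the spec does not fix them), shown only to make the later sketches concrete:

```python
MIN_DEGREE, OPT_DEGREE, MAX_DEGREE = 4, 8, 16                 # connections per node
MIN_REPLICATION, OPT_REPLICATION, MAX_REPLICATION = 2, 3, 5   # copies per chunk
```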
- Network stabilization (assuming all files of the index are marked as stabilized):
- Main idea: a BitTorrent-like architecture where a peer relies on its close neighbors to store some pieces of data.
- Node data:
(DFS Index)
chunkfield of stored data for each file
neighbor addresses (count in [minDegree, maxDegree])
chunkfields of neighbors
- Node behavior:
connections:
- each node keeps a list of n neighbors with good connections (~close neighbors), with n in [minDegree, maxDegree], aiming for optDegree
- to keep track of neighbors, connected nodes should send keep-alive messages to each other periodically
- each node should regularly refresh the chunkfields of its neighbors
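A sketch of the keep-alive bookkeeping, assuming a monotonic clock and illustrative timing constants:

```python
import time
from typing import Dict, List

KEEPALIVE_PERIOD = 5.0    # seconds between keep-alive messages (illustrative)
KEEPALIVE_TIMEOUT = 15.0  # silence after which a neighbor is dropped (illustrative)

class NeighborTable:
    def __init__(self) -> None:
        self.last_seen: Dict[str, float] = {}

    def on_keepalive(self, addr: str) -> None:
        self.last_seen[addr] = time.monotonic()

    def expired(self) -> List[str]:
        # Neighbors that missed their keep-alives and should be dropped.
        now = time.monotonic()
        return [a for a, t in self.last_seen.items() if now - t > KEEPALIVE_TIMEOUT]
```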
data storage:
- when all values of its global chunkfield are at or above minReplication, the node will consider itself responsible for the file
- if a value of the global chunkfield of a node responsible for the file drops below minReplication, the node will start gathering random missing pieces in order to raise its global chunkfield back up to optReplication.
- based on the assumption that close nodes are heavily interconnected, its neighbors will act the same way, and the global chunkfield will stabilize again.
This allows new responsible nodes to appear in the network when old ones disconnect.
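A sketch of the responsibility test and the random repair selection (helper names are hypothetical):

```python
import random
from typing import List

def is_responsible(global_cf: List[int], min_replication: int) -> bool:
    # Responsible once every chunk reaches minReplication in the neighborhood view.
    return all(count >= min_replication for count in global_cf)

def chunks_to_pull(local_cf: List[int], global_cf: List[int],
                   opt_replication: int) -> List[int]:
    # Missing chunks whose neighborhood replication is still below optReplication,
    # in random order so that interconnected neighbors repair different pieces.
    candidates = [i for i, (mine, total) in enumerate(zip(local_cf, global_cf))
                  if mine == 0 and total < opt_replication]
    random.shuffle(candidates)
    return candidates
```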
- Put:
- (locally) split the file into chunks
- update the DFS Index (with the file marked as unstabilized)
- Node behavior (unstabilized file):
if a file in the index is unstabilized, each node will consider itself responsible for the file.
when a node reaches a global chunkfield with optimal values, it will update the status of the file in the index to stabilized
This stabilized/unstabilized file state allows early wide replication over the network but stops it once the file is sufficiently replicated.
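Building on the DFSIndex sketch above, the stabilization check could look like this (the hook name is hypothetical):

```python
from typing import List

def on_chunkfield_update(index: "DFSIndex", file_id: int,
                         global_cf: List[int], opt_replication: int) -> None:
    # Once every chunk of an unstabilized file reaches optReplication in the
    # local neighborhood view, mark the file stabilized in the index.
    entry = index.files[file_id]
    if not entry.stabilized and all(c >= opt_replication for c in global_cf):
        entry.stabilized = True
```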
- Get:
- ask neighbors for the missing chunks; works like BitTorrent
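A minimal source-selection sketch driven by the neighbors' chunkfields (no piece-picking strategy implied beyond "any holder"):

```python
from typing import Dict, List, Optional

def pick_source(chunk_id: int, neighbor_cfs: Dict[str, List[int]]) -> Optional[str]:
    # neighbor_cfs: neighbor address -> local chunkfield for the file.
    for addr, cf in neighbor_cfs.items():
        if cf[chunk_id] == 1:
            return addr  # any neighbor holding the chunk can serve it
    return None          # no neighbor has it yet
```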
- Rm:
- update the DFS Index (remove the file's reference from the index)
- when any node updates its index, it compares the new and old file sets and drops the chunks of files that no longer exist
// needs to be better defined
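The cleanup step above, as a sketch (local_chunks is a hypothetical per-node store):

```python
from typing import Dict, Set

def drop_orphan_chunks(old_files: Set[int], new_files: Set[int],
                       local_chunks: Dict[int, Set[int]]) -> None:
    # local_chunks: fileId -> set of chunk ids stored locally.
    for file_id in old_files - new_files:
        local_chunks.pop(file_id, None)  # file was removed from the index
```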
- Node integration:
contact a set of nodes
keep the optDegree best connections as neighbors
get a fresh DFS Index
set the available space (?)
build a global chunkfield => the node may become responsible for some files
pull some pieces of the files for which the node feels responsible (the selection policy for pulling pieces is still to be defined)
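The join sequence, sketched with injected transport functions (ping and fetch_index are assumptions, not part of the spec):

```python
from typing import Callable, List, Tuple

def integrate(bootstrap: List[str],
              ping: Callable[[str], float],
              fetch_index: Callable[[str], "DFSIndex"],
              opt_degree: int) -> Tuple[List[str], "DFSIndex"]:
    # Contact a set of nodes and keep the optDegree best connections
    # (here: lowest round-trip time) as neighbors.
    neighbors = sorted(bootstrap, key=ping)[:opt_degree]
    # Get a fresh DFS Index from one of them; chunkfield exchange and
    # piece pulling then proceed as in the stabilization rules above.
    index = fetch_index(neighbors[0])
    return neighbors, index
```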
- Chunk deletion policy
1. When receiving a chunkfield for A:
if some chunk's replication count > maxChunkRep
=> launch the "repComplete A" procedure if not already done.
=> put a randomly timed "death sentence" on the chunk.
if maxStorageVolume is reached AND the global chunkfield for a chunk of A > optChunkRep
=> put a randomly timed death sentence on the chunk.
if the death sentence timer runs out and the chunk's replication count is still > maxChunkRep
=> delete the chunk
2. When completing the transfer of a chunk:
if the global chunkfield + 1 (for this chunk) > minChunkRep AND the number of stored chunks of the same file is > maxChunkCluster
=> delete the chunk
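A sketch of the randomly timed death sentence (timing bounds are illustrative; the re-check before deletion follows rule 1 above):

```python
import random
import time
from typing import Callable, Dict, Tuple

DEATH_MIN, DEATH_MAX = 30.0, 120.0  # sentence delay bounds in seconds (illustrative)

ChunkKey = Tuple[int, int]  # (fileId, chunkId)

def schedule_death(sentences: Dict[ChunkKey, float], key: ChunkKey) -> None:
    # Random timing avoids interconnected neighbors deleting the same
    # over-replicated chunk simultaneously.
    sentences.setdefault(key, time.monotonic() + random.uniform(DEATH_MIN, DEATH_MAX))

def execute_due(sentences: Dict[ChunkKey, float],
                replication: Dict[ChunkKey, int],
                max_chunk_rep: int,
                delete: Callable[[ChunkKey], None]) -> None:
    now = time.monotonic()
    for key, deadline in list(sentences.items()):
        if now >= deadline:
            # Re-check: only delete if the chunk is still over-replicated.
            if replication.get(key, 0) > max_chunk_rep:
                delete(key)
            del sentences[key]
```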
- Chunk optimal replica number
1. At regular intervals:
each node => requests the chunkfield of each index file from its neighbor nodes
2. When receiving the chunkfield for A:
if the computed global chunkfield for chunk A > optChunkRep
=> send "repComplete A, with the list of neighbors hosting the chunk" to all neighbors
3. When receiving "repComplete A, list[Neighbors]":
=> store {A, sender, list}
=> ignore chunk A in the global chunkfield
=> ignore replication requests for this chunk and answer with "repComplete A"
=> send "repComplete A, list" to all neighbors except the sender
4. When the connection with a peer from some {repComplete, A, list} is lost (keep-alive timeout):
=> send "repRestart A" to all neighbors
=> reset the repComplete state of the chunk in the chunkfield
=> delete {repComplete, A, list}
5. When receiving "repRestart A":
=> reset the repComplete state of the chunk in the chunkfield
=> delete {repComplete, A, list}
=> send "repRestart A" to all neighbors
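A sketch of the repComplete/repRestart gossip handlers (the state layout, send function, and message tuples are assumptions; A is a chunk identifier):

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Send = Callable[[str, tuple], None]  # send(neighbor_addr, message)

@dataclass
class RepCompleteState:
    # A -> {"sender": addr, "hosts": [addr, ...]}, mirroring {A, sender, list}.
    records: Dict[str, dict] = field(default_factory=dict)

def on_rep_complete(state: RepCompleteState, A: str, sender: str,
                    hosts: List[str], neighbors: List[str], send: Send) -> None:
    if A in state.records:
        return  # already known: the gossip was forwarded the first time
    state.records[A] = {"sender": sender, "hosts": hosts}
    for n in neighbors:
        if n != sender:
            send(n, ("repComplete", A, hosts))

def on_peer_lost(state: RepCompleteState, peer: str,
                 neighbors: List[str], send: Send) -> None:
    # Rule 4: a peer involved in a repComplete record timed out.
    for A, rec in list(state.records.items()):
        if peer == rec["sender"] or peer in rec["hosts"]:
            del state.records[A]
            for n in neighbors:
                send(n, ("repRestart", A))

def on_rep_restart(state: RepCompleteState, A: str,
                   neighbors: List[str], send: Send) -> None:
    # Rule 5: forwarding only when a record is actually dropped keeps
    # the flood from looping forever.
    if A in state.records:
        del state.records[A]
        for n in neighbors:
            send(n, ("repRestart", A))
```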