Hypber Cube Partitioning

Source https://docs.google.com/document/d/1Gv4Smxgh6r9GpgIl8ZaukXpcQht4i8-mxwLE_II24OI/edit?usp=sharing

 

I. Third implementation:

CubeNAssignment.java: brute implementation for n-way partitioning

Principle: find best partition as hypercubes

Idea:

Testcase:

input ([] sizes, r) --> output (int[] _rd) where sizes are the size each dimension, r is the number of reducers, _rd is the #reducers by each dimension

https://wiki.epfl.ch/bigdata2015-hypercubejoins/documents/CubeNAssignment.java

 

II. Second implementation:

 

CubeAssignmentAdapter.java: adapter interface for brute-force 2,3,4-way partitioning

https://wiki.epfl.ch/bigdata2015-hypercubejoins/documents/CubeAssignmentAdapter.java


 

III. First implementation: Cube3Assignment

 

+ Define 3 dimensions:

private int _r0, _r1, _r2, _r;

private long size0, size1, size2;

private int[][][] regionsCube;

 

+ Approximate algorithm: find best partition as cubes (similar to MatrixAssignment)

- Find prime factors of r (i.e. to find all possibles combination of r0,r1,r2 such that r0 x r1 x r2 = r)

- Consider each possible combination: compare communication cost and computation cost

Computation cost = size0 / r0 * size1 / r1 * size2 / r2;

Commutation cost = size0 / r0 + size1 / r1 + size2 / r2;

- Get the combination with minimum computation cost and minimum communication cost.

 

+ Testcases: (size0, size1, size2, r)

222  [main] INFO  ch.epfl.data.plan_runner.thetajoin.matrix_mapping.Cube3Assignment - Input: [4, 4, 4, 8]

225  [main] INFO  ch.epfl.data.plan_runner.thetajoin.matrix_mapping.Cube3Assignment - #Reducers each dimension: [2, 2, 2]

 

225  [main] INFO  ch.epfl.data.plan_runner.thetajoin.matrix_mapping.Cube3Assignment - Input: [8, 4, 10, 1000]

225  [main] INFO  ch.epfl.data.plan_runner.thetajoin.matrix_mapping.Cube3Assignment - #Reducers each dimension: [8, 4, 10]

 

Full code:

https://wiki.epfl.ch/bigdata2015-hypercubejoins/documents/Cube3Assignment.java