Hyper Cube Equal-Size Partitioning

 Apply the idea of equal-size matrix partitioning of the paper "Scalable and Adaptive Online Joins" for hypercube.
Basically, it divides the join hypercube into equal-size regions by solving the two equations:
 rd[0] * ... * rd[k-1] = r <br/>
 sizes[0] / rd[0] = sizes[1] / rd[1] = ... = sizes[k] / rd[k] <br/>

 

Input

+ int r: the number of joiners/workers/machines

+ long[] sizes: the size of k relations

 

Output:

+ int[] rd: the number of region divisions by each dimension

+ each region is assigned to 1 machine.

 

The following points have been implemented:

1) If #joiners larger than the size of join matrix itself, each cell is a partition:

r > sizes[0] * ... * sizes[k-1] then rd[i] = sizes[i]

2) Solve the two equivalent partitioning equations with rd[i] are real values:

rd[0] * ... * rd[k-1] = r

sizes[0] / rd[0] = sizes[1] / rd[1] = ... = sizes[k] / rd[k]

3) Round up the real values less than 1 (e.g. 0.78 -> 1) by reducing the other maximal values, such as no value is less than 1.

For example, [7.49, 7.49, 7.49, 7.49, 7.49, 7.49, 0.07, 0.07]

-> [1.0, 1.0, 4.21, 4.21, 7.49, 7.49, 1.0, 1.0]

4) Adjust the real values to integer values.

For example, 7.49 -> try 7 and 8

5) Find the assignment satisfying the tolerance constraint and the minimum cost constraint.

+ tolerance constraint: r/2<= rd[0] * ... * rd[k-1] <= r (at least 50% of the machines are used and rd[i] are intergers)

+ minimum cost constraint (on a single region because all regions have equal size; coined the name equal-size partitioning): 

sizes[0] / rd[0] * sizes[1] / rd[1] * ... * sizes[k-1] / rd[k-1] is minimal (computation cost = #joins in each machine)

sizes[0] / rd[0] + sizes[1] / rd[1] + ... + sizes[k-1] / rd[k-1] is minimal (communication cost = total #tuples sent from all relations to a machine)