- français
- English
Hyper Cube Equal-Size Partitioning
Input
+ int r: the number of joiners/workers/machines
+ long[] sizes: the size of k relations
Output:
+ int[] rd: the number of region divisions by each dimension
+ each region is assigned to 1 machine.
The following points have been implemented:
1) If #joiners larger than the size of join matrix itself, each cell is a partition:
r > sizes[0] * ... * sizes[k-1] then rd[i] = sizes[i]
2) Solve the two equivalent partitioning equations with rd[i] are real values:
rd[0] * ... * rd[k-1] = r
sizes[0] / rd[0] = sizes[1] / rd[1] = ... = sizes[k] / rd[k]
3) Round up the real values less than 1 (e.g. 0.78 -> 1) by reducing the other maximal values, such as no value is less than 1.
For example, [7.49, 7.49, 7.49, 7.49, 7.49, 7.49, 0.07, 0.07]
-> [1.0, 1.0, 4.21, 4.21, 7.49, 7.49, 1.0, 1.0]
4) Adjust the real values to integer values.
For example, 7.49 -> try 7 and 8
5) Find the assignment satisfying the tolerance constraint and the minimum cost constraint.
+ tolerance constraint: r/2<= rd[0] * ... * rd[k-1] <= r (at least 50% of the machines are used and rd[i] are intergers)
+ minimum cost constraint (on a single region because all regions have equal size; coined the name equal-size partitioning):
sizes[0] / rd[0] * sizes[1] / rd[1] * ... * sizes[k-1] / rd[k-1] is minimal (computation cost = #joins in each machine)
sizes[0] / rd[0] + sizes[1] / rd[1] + ... + sizes[k-1] / rd[k-1] is minimal (communication cost = total #tuples sent from all relations to a machine)