Meeting Notes 2015.03.10

https://docs.google.com/document/d/1-ZacPBFFL8UweusyPBi53KeyHFYPY5aXc3TZVPgUzL4/edit?usp=sharing

 

I. Progress this week:

1) Install, run, and test Squall in local mode

DIP_DATA_ROOT ../test/data/tpch/

DIP_SQL_ROOT ../test/squall/sql_queries/

DIP_SCHEMA_PATH ../test/squall/schemas/tpch.txt

DIP_RESULT_ROOT ../test/results/

 

2) Understand existing Squall libraries:

 

3) TPCH dataset:

 

4) Fork the Squall repo: https://github.com/khgl/squall.git

 

5) HyperCube partitioning is not a trivial task:

 

6) The StormThetaJoin class uses the MatrixAssignment interface. Supporting higher dimensions would require many changes, because we cannot keep using a simple Java array structure.

 

7) We should change our deliverables plan:

    + It is not reasonable to expect to finish both HyperCube partitioning and Squall integration in 2 weeks.

8) Tentative plan to implement brute-force partitioning for the hypercube:

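Step 1: Solve the following equations, where S_di is the number of tuples along dimension d_i (written |Ri| in the examples) and r_di is the number of partitions along that dimension:

  1. r_d1 x r_d2 x … x r_dn = r (the number of reducers)

  2. S_d1 x r_d2 x r_d3 x … x r_dn = S_d2 x r_d1 x r_d3 x … x r_dn = …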

 

Example 1 (matrix partition): R1 x R2 with a predefined number of reducers r

We have the equations:

r_1 x r_2 = r

|R1| x r_2 = |R2| x r_1

⇒ r_1^2 = r x |R1| / |R2|  

Example 2 (cube partition): R1 x R2 x R3 with a predefined number of reducers r

We have the equations:

r_1 x r_2 x r_3 = r

|R1| x r_2 x r_3 = |R2| x r_1 x r_3 = |R3| x r_1 x r_2

⇒ r_1^3 = r x |R1|^2 / (|R2| x |R3|)
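
Both examples are instances of one closed form: if every region has the same side length c along each dimension, then r_i = |Ri| / c and the product constraint gives c = (|R1| x … x |Rn| / r)^(1/n). The following is a minimal sketch of that calculation in Python (chosen for brevity, not the Squall codebase language); the function name ideal_dimension_sizes is hypothetical, and the result is generally fractional.

```python
import math

def ideal_dimension_sizes(relation_sizes, num_reducers):
    """Real-valued r_i such that prod(r_i) = r and |Ri| / r_i is equal for all i."""
    n = len(relation_sizes)
    # scale = 1 / c, where c is the common side length of every region
    scale = (num_reducers / math.prod(relation_sizes)) ** (1.0 / n)
    return [size * scale for size in relation_sizes]

# Example 1 with |R1| = 100, |R2| = 400, r = 16: r_1^2 = 16 x 100 / 400 = 4
print(ideal_dimension_sizes([100, 400], 16))
# Example 2 with |R1| = 100, |R2| = 200, |R3| = 400, r = 64: r_1^3 = 8
print(ideal_dimension_sizes([100, 200, 400], 64))
```

With these inputs the values come out at approximately (2, 8) and (2, 4, 8), matching the two derived formulas.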

 

Step 2: Verify the result against the computation cost (p_d1 x p_d2 x … x p_dn) and the communication cost (p_d1 + p_d2 + … + p_dn), where p_di is the side length along dimension d_i of the largest partition (region).
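
A small sketch of the Step 2 check, assuming the largest region has side length ceil(S_di / r_di) along each dimension (this interpretation, and the function name largest_region_costs, are assumptions for illustration):

```python
import math

def largest_region_costs(relation_sizes, dimension_sizes):
    """Return (computation cost, communication cost) of the largest region."""
    # Side length of the largest region along each dimension (assumed: ceiling).
    sides = [math.ceil(s / r) for s, r in zip(relation_sizes, dimension_sizes)]
    return math.prod(sides), sum(sides)

# 137 x 391 tuples on a 2 x 8 grid: sides are 69 and 49
print(largest_region_costs([137, 391], [2, 8]))  # (3381, 118)
```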

 

Step 3: Handle the fractional cases. The equations will generally not give integer solutions (for example, with S_d1 = 137 tuples, S_d2 = 391 tuples, …), so some extra steps are needed to round the values; the remaining cells can then be grouped into the same partition.
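
One possible brute-force way to sidestep the rounding problem is to enumerate every integer factorization of r and keep the cheapest one. The sketch below assumes the product must equal r exactly (as in equation 1) and that we minimize communication cost first and computation cost second; both the objective and the function names are assumptions, not a decided design.

```python
import math

def factorizations(r, n):
    """Yield every ordered n-tuple of positive integers whose product is r."""
    if n == 1:
        yield (r,)
        return
    for d in range(1, r + 1):
        if r % d == 0:
            for rest in factorizations(r // d, n - 1):
                yield (d,) + rest

def best_integer_partitioning(relation_sizes, num_reducers):
    """Pick the integer (r_1, ..., r_n) with the cheapest largest region."""
    def cost(dims):
        sides = [math.ceil(s / d) for s, d in zip(relation_sizes, dims)]
        # Assumed objective: communication cost, ties broken by computation cost.
        return (sum(sides), math.prod(sides))
    return min(factorizations(num_reducers, len(relation_sizes)), key=cost)

print(best_integer_partitioning([137, 391], 16))  # (2, 8)
```

The enumeration is exponential in the number of dimensions in the worst case, which is acceptable only for the small n and r expected here.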


 

II. Discussions in the meeting:

Codebase plan:

+ Multi-way version of the ThetaJoinStaticComp class: change the existing class or add a new one (e.g. MultiJoinComponent)

+ Multi-way version of the StormThetaJoin class: define a new class (e.g. StormMultiJoin)

+ Multi-way version of MatrixAssignment: change the existing interface or define a new class (e.g. HyperCubeAssignment)

+ Multi-way version of ThetaJoinStaticMapping: change the existing class or add a new one

+ Implement CustomStreamGrouping?
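
To make the multi-way assignment concrete, here is a rough sketch of how a tuple could be routed in a hypercube, assuming the usual generalization of the matrix scheme: a tuple of relation i picks a random coordinate along dimension i and is replicated to every cell sharing that coordinate. It is written in Python for brevity; flat_index and target_reducers are hypothetical names, not existing Squall or Storm APIs. The flat row-major index stands in for a nested n-dimensional array.

```python
import itertools
import random

def flat_index(coords, dims):
    """Row-major index of a hypercube cell, replacing a nested n-dimensional array."""
    index = 0
    for c, d in zip(coords, dims):
        index = index * d + c
    return index

def target_reducers(relation_index, dims, rng=random):
    """Reducers that must receive one tuple of the given relation (assumed scheme)."""
    own = rng.randrange(dims[relation_index])  # random slab along the tuple's own dimension
    ranges = [[own] if i == relation_index else range(d) for i, d in enumerate(dims)]
    return [flat_index(coords, dims) for coords in itertools.product(*ranges)]

dims = (2, 4, 8)  # 64 reducers
print(len(target_reducers(0, dims)))  # 32 = 4 x 8 replicas for a tuple of R1
```

Under this scheme any combination of one tuple per relation meets in exactly one cell, so each candidate join result is checked by exactly one reducer. A custom stream grouping would return such a list of targets per tuple.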

Understanding:

 

III. Plan for next week:

- Tentative interface and implementation for the multi-way versions of MatrixAssignment and StormThetaJoin.

- Set up a local Storm cluster to get familiar with it, in preparation for setting up the experiment environment in Microsoft Azure.