Meeting Notes 2015.03.31
Source: https://docs.google.com/document/d/1IpDkgh6jWatBSOo6Dh3cAhQmjASCQjw0-bxHgWpQ_38/edit?usp=sharing
I. Progress done this week:
1. Hypercube equal-size partitioning (inspired by the Zhang paper):
+ Details: https://wiki.epfl.ch/bigdata2015-hypercubejoins/hcp-equal-size
+ Runtime: < 5ms with 8 relations and 1000 machines
2. Squall integration:
+ Select for tuples is partially ready; only the index part is left
+ Join: waiting for predicate application
...
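For item 1 above, a minimal sketch of what a brute-force search over hypercube dimensions could look like. This is not the equal-size algorithm from the linked page and not Squall code. The function name and the cost model (each machine receives about |R_i| / d_i tuples of relation i) are assumptions made only for illustration.

```python
def brute_force_dimensions(sizes, machines):
    """Search all dimension vectors (d_1..d_k) with d_1 * ... * d_k <= machines.

    Assumed cost model: relation i is split into d_i slices and each slice is
    replicated along the other dimensions, so one machine holds roughly
    sizes[i] / d_i tuples of relation i. Returns (dims, load) with minimal load.
    """
    best_load, best_dims = float("inf"), None

    def search(i, budget, dims, load):
        nonlocal best_load, best_dims
        if i == len(sizes):
            if load < best_load:
                best_load, best_dims = load, tuple(dims)
            return
        # d * (product of the remaining dims) must stay within the budget.
        for d in range(1, budget + 1):
            search(i + 1, budget // d, dims + [d], load + sizes[i] / d)

    search(0, machines, [], 0.0)
    return best_dims, best_load


# Two relations of 100 tuples each on 4 machines.
dims, load = brute_force_dimensions([100, 100], 4)
print(dims, load)  # (2, 2) 100.0
```

The exhaustive search grows quickly with the number of relations and machines, which is why the example uses a tiny input.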
II. Discussions in the meeting:
0. Squall codebase: currently being changed (renaming, refactoring, …)
1. Hypercube partition document:
- Full names of classes
- flag for usage and parameters
- mechanism
- refer to full document, url, terminology
- what is brute-force implementation, what is equal-size implementation
- report: methods, performance (maximum 30sec)
2. Storm integration:
- aligned interface with hypercube partitioning
- depends on predicate, local index/join
- should try the naive implementation of multiway join (e.g. Nested Loop Join, because data is streamed)
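A rough sketch of the naive multiway nested-loop join on streamed tuples mentioned in the last point. The class and method names are hypothetical and are not part of the Squall or Storm API. It assumes every tuple is kept in memory and no index is used.

```python
from itertools import product


class StreamingNestedLoopJoin:
    """Naive multiway join: store every tuple, rescan the other relations on arrival."""

    def __init__(self, num_relations, predicate):
        self.stored = [[] for _ in range(num_relations)]
        self.predicate = predicate  # takes one tuple per relation, returns bool

    def on_tuple(self, rel, tup):
        """Store tup for relation rel and return the new join results it completes."""
        # Only the new tuple for `rel`; every previously stored tuple elsewhere.
        candidates = [
            [tup] if i == rel else stored for i, stored in enumerate(self.stored)
        ]
        results = [c for c in product(*candidates) if self.predicate(c)]
        self.stored[rel].append(tup)
        return results


# R(a), S(a, b), T(b) joined on R.a = S.a and S.b = T.b
join = StreamingNestedLoopJoin(
    3, lambda c: c[0][0] == c[1][0] and c[1][1] == c[2][0]
)
join.on_tuple(0, (1,))
join.on_tuple(1, (1, 2))
print(join.on_tuple(2, (2,)))  # [((1,), (1, 2), (2,))]
```

Each result is produced exactly once, when the last of its tuples arrives, at the cost of scanning all stored tuples of the other relations.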
3. Deployment of squall on Microsoft Azure:
- jar files
- beware of Squall local paths, local rules
- storm submitter
- how to use resources, multi-machines, multi-users
- check whether Azure limits each person to one account (e.g. cannot use resources from multiple machines)
4. Nested query (TPCH):
- Example:
    SELECT A FROM R
    WHERE R.A = (SELECT SUM(S.B) FROM S)
- The query executor should not care about the query optimizer, i.e. it should not try to optimize the received query
- Problem 1: semantics. When a new tuple from S arrives, SUM(S.B) is updated → the output tuples sent before must be revoked. Might also affect the index
- Problem 2: parallelization. Must communicate with the working machines (no independence, and communication cost increases). Affects partitioning results
- Possible solutions:
  - local storage and send by batch (e.g. batch size = 100 tuples)
  - communicate the PARTIAL aggregation result to machines
  - re-partition
  - query plan will specify the communication beforehand
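To make Problem 1 and the first two possible solutions in item 4 concrete, here is a single-process sketch of batching plus partial aggregation with revocation of earlier outputs, for the example query that compares R.A against SUM(S.B). All class names are hypothetical. It assumes integer values and treats the sum as undefined until the first batch arrives.

```python
from collections import Counter


class PartialSumWorker:
    """Buffers S.B values locally and ships one partial sum per full batch."""

    def __init__(self, batch_size=100):
        self.batch_size = batch_size
        self.pending_sum = 0
        self.pending_count = 0

    def on_s_tuple(self, b):
        """Return the partial sum when the batch is full, else None."""
        self.pending_sum += b
        self.pending_count += 1
        if self.pending_count < self.batch_size:
            return None
        partial, self.pending_sum, self.pending_count = self.pending_sum, 0, 0
        return partial


class NestedSumQuery:
    """Incrementally maintains the R tuples whose A equals the current SUM(S.B)."""

    def __init__(self):
        self.total = None  # no S batch received yet
        self.r_values = Counter()  # multiset of R.A values seen so far

    def on_r_tuple(self, a):
        self.r_values[a] += 1
        return [("+", a)] if a == self.total else []

    def on_partial_sum(self, partial):
        old = self.total
        self.total = partial if old is None else old + partial
        if self.total == old:
            return []
        # Revoke outputs that matched the old sum, emit those matching the new one.
        retract = [("-", old)] * self.r_values[old]
        emit = [("+", self.total)] * self.r_values[self.total]
        return retract + emit


worker, query = PartialSumWorker(batch_size=2), NestedSumQuery()
query.on_r_tuple(5)
query.on_r_tuple(8)
for b in (2, 3, 1, 2):
    partial = worker.on_s_tuple(b)
    if partial is not None:
        print(query.on_partial_sum(partial))
```

With batching, the output only reflects S tuples from batches that were already flushed, so results lag behind the stream by up to one batch per worker.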
III. Plan for next week:
0) Hypercube partitioning code: document and clean up
1) Azure: attempt a deployment
2) Local join (without index): continue the Storm integration with the naive implementation
3) Khue: SquallToaster
4) Nested query discussion & preparation for experiments