Meeting Notes 2015.04.24

Remaining requirements:

0) Writing query plans, nested part, Azure (cost, login, run scripts)

1) Experiments (e.g. 3 cases/queries per person)

2) Local join implementation

3) Integration with main branch

 

Next steps:

1) Unit tests:

+ Tuples

+ Distributed query plans

2) Local query plan (order of relations, cost optimization)

 

Local query plan - Cost optimization - Ordering heuristics:

1) Look at comparison predicate:

+ e.g. equality join output might be smaller than inequality join output.

2) Size lookup (relation size, selectivity).

+ e.g. with an equality join we can estimate the output size beforehand

3) Try different orders (randomly), choose one which is cheapest:

+ Different orders = different paths (depth-first search) of the join dependency graph (each edge indicates there is a join condition between two relations). Worst case: complete graph (n! paths).

e.g. with the chain R-S-T-V, other possible orders starting from T are T-V-S-R and T-S-R-V. T-R-S-V is invalid because there is no join condition between T and R.

+ Online setting: the current tuple belongs to T, and we know the current sizes of the other relations (tuples received so far).

+ Keep track of the average running time per order (e.g. <1ms, invoke1>). Maybe normalize by the current sizes of the relations.

e.g.

T: (T-V-S-R, 1ms), (T-S-R-V, 2ms)

S: (S-T-V-R, AVG), (S-R-T-V, AVG)
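
A minimal sketch of heuristic 3, assuming the chain R-S-T-V from the example above. It is not the project's implementation: the names JOIN_GRAPH, valid_orders and OrderStats are hypothetical. It treats an order as valid when each newly added relation has a join condition with some relation already in the order (which is why T-V-S-R is accepted and T-R-S-V is not), and keeps a plain running average of measured times per order. Normalizing by relation sizes is left out.

```python
from collections import defaultdict

# Join dependency graph for the chain R-S-T-V (assumed from the example):
# an edge means there is a join condition between the two relations.
JOIN_GRAPH = {
    "R": {"S"},
    "S": {"R", "T"},
    "T": {"S", "V"},
    "V": {"T"},
}


def valid_orders(graph, start):
    """Yield every join order that begins at `start` and in which each
    newly added relation joins with at least one relation already used."""

    def extend(order, joined):
        if len(order) == len(graph):
            yield tuple(order)
            return
        # Candidates: neighbours of the joined set that are not joined yet.
        frontier = sorted({n for r in joined for n in graph[r]} - joined)
        for nxt in frontier:
            yield from extend(order + [nxt], joined | {nxt})

    yield from extend([start], {start})


class OrderStats:
    """Running average of the measured time (in ms) per join order."""

    def __init__(self):
        self.total_ms = defaultdict(float)
        self.count = defaultdict(int)

    def record(self, order, elapsed_ms):
        self.total_ms[order] += elapsed_ms
        self.count[order] += 1

    def average(self, order):
        if self.count[order] == 0:
            return None  # never measured
        return self.total_ms[order] / self.count[order]

    def cheapest(self, orders):
        """Return an unmeasured order if any remains, else the order
        with the lowest average time so far."""
        orders = list(orders)
        untried = [o for o in orders if self.count[o] == 0]
        if untried:
            return untried[0]
        return min(orders, key=self.average)


if __name__ == "__main__":
    stats = OrderStats()
    stats.record(("T", "V", "S", "R"), 1.0)
    stats.record(("T", "S", "R", "V"), 2.0)
    for order in valid_orders(JOIN_GRAPH, "T"):
        print("-".join(order), stats.average(order))
```

Under this validity rule T-S-V-R is also a legal order for a tuple of T, in addition to the two listed in the example. Returning unmeasured orders first is one simple way to make sure every order gets timed at least once before the averages are compared.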