Meeting Notes 2015.04.21

1. SquallToast Integration with Hypercube partitioning

Debugged the TPCH3 and TPCH5 queries. Task 1 produced no output because the partitioning uses only 2 machines instead of 3. This happens because the cost criteria include both computation cost and communication cost. If only computation cost is considered, the partitioning uses 3 machines.
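A minimal sketch of how such a trade-off can arise, not Squall's actual cost model. The cost formulas, the function names, the relation sizes, and the weight `comm_weight` are all assumptions made for illustration: each relation gets its own grid dimension, computation cost is the input received by one machine, and communication cost is the total number of tuples sent.

```python
from itertools import product
from math import prod


def candidate_grids(num_relations, machines):
    """Yield every grid (one dimension per relation) that fits on the machines."""
    for dims in product(range(1, machines + 1), repeat=num_relations):
        if prod(dims) <= machines:
            yield dims


def computation_cost(sizes, dims):
    # Assumed model: input tuples received by a single machine.
    return sum(size / d for size, d in zip(sizes, dims))


def communication_cost(sizes, dims):
    # Assumed model: each tuple is replicated along all other dimensions.
    total = prod(dims)
    return sum(size * (total // d) for size, d in zip(sizes, dims))


def best_grid(sizes, machines, comm_weight):
    """Pick the grid minimizing computation + comm_weight * communication."""
    return min(
        candidate_grids(len(sizes), machines),
        key=lambda dims: computation_cost(sizes, dims)
        + comm_weight * communication_cost(sizes, dims),
    )


sizes = [100, 100]  # hypothetical relation sizes
print(best_grid(sizes, machines=3, comm_weight=0.0))   # computation only
print(best_grid(sizes, machines=3, comm_weight=0.25))  # both criteria
```

With these made-up numbers, the computation-only criterion picks a grid covering all 3 machines, while the combined criterion settles on a grid covering 2, because the extra replication outweighs the smaller per-machine load.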

2. General graph model for any combination of joins

In the last meeting we discussed how to extend the chain join model to any join model. We have implemented a general abstraction for different kinds of join models. We did not push the implementation because the intersection part is not finished yet. This week we are planning to finish both the implementation and the integration with the new Squall code base.

3. TPCH Query Analysis: 

TPCH2: 

SELECT TOP 100 S_ACCTBAL, S_NAME, N_NAME, P_PARTKEY, P_MFGR, S_ADDRESS, S_PHONE, S_COMMENT
FROM PART, SUPPLIER, PARTSUPP, NATION, REGION
WHERE P_PARTKEY = PS_PARTKEY AND S_SUPPKEY = PS_SUPPKEY AND P_SIZE = 15 AND
P_TYPE LIKE '%%BRASS' AND S_NATIONKEY = N_NATIONKEY AND N_REGIONKEY = R_REGIONKEY AND
R_NAME = 'EUROPE' AND
PS_SUPPLYCOST = (SELECT MIN(PS_SUPPLYCOST) FROM PARTSUPP, SUPPLIER, NATION, REGION
 WHERE P_PARTKEY = PS_PARTKEY AND S_SUPPKEY = PS_SUPPKEY
 AND S_NATIONKEY = N_NATIONKEY AND N_REGIONKEY = R_REGIONKEY AND R_NAME = 'EUROPE')
ORDER BY S_ACCTBAL DESC, N_NAME, S_NAME, P_PARTKEY
 
Nested query: 
SELECT MIN(PS_SUPPLYCOST) FROM PARTSUPP, SUPPLIER, NATION, REGION
 WHERE P_PARTKEY = PS_PARTKEY AND S_SUPPKEY = PS_SUPPKEY
 AND S_NATIONKEY = N_NATIONKEY AND N_REGIONKEY = R_REGIONKEY AND R_NAME = 'EUROPE'
 
Tentative query plan:
1) First evaluate the nested query using a DBToasterJoinComponent:
- When the result changes, send 2 tuples: 1 insert tuple and 1 delete tuple.
 
2) The output tuples of the component above are joined by another DBToasterJoinComponent, which must handle both insert and delete tuple events.
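The plan above can be sketched as follows. This is an illustration only, not the DBToasterJoinComponent API: `update_min` and `apply_events` are hypothetical names, and the sketch assumes the nested MIN is maintained per P_PARTKEY, that only inserts reach the nested query, and that the very first result for a key produces just an insert event.

```python
def update_min(current_min, partkey, supplycost):
    """Upstream step: maintain MIN(PS_SUPPLYCOST) per part, emit change events."""
    old = current_min.get(partkey)
    if old is not None and supplycost >= old:
        return []  # result unchanged, nothing to send
    current_min[partkey] = supplycost
    events = []
    if old is not None:
        events.append(("delete", partkey, old))  # retract the old minimum
    events.append(("insert", partkey, supplycost))  # publish the new minimum
    return events


def apply_events(state, events):
    """Downstream step: consume insert and delete tuple events."""
    for kind, partkey, cost in events:
        if kind == "insert":
            state.add((partkey, cost))
        else:
            state.discard((partkey, cost))


mins, downstream = {}, set()
for partkey, cost in [(1, 5.0), (1, 7.0), (1, 3.0), (2, 9.0)]:
    apply_events(downstream, update_min(mins, partkey, cost))
print(sorted(downstream))
```

The downstream state only ever holds the current minimum per part, which is why the second component must process the delete event before (or together with) the matching insert.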
 
4. Experiments with Azure cluster:
- Data is replicated across the machines for now (at /data/tpch)
- Nimbus: efnimbus1.cloudapp.net
- Supervisors: supervisor1.cloudapp.net
               supervisor2.cloudapp.net
- SSH: user azureuser, password ******
- Storm UI: http://efnimbus1.cloudapp.net:28080/index.html
 
5. Merged SquallToast with the new changes in master.
DBToaster now runs as a standalone binary.
 
 
Discussions:
1) Problem: Scala 2.10 vs 2.11. Solution: run DBToaster as a standalone binary (native call).
+ No need to install Scala on the cluster. The generated code still requires the Scala library --> include the Scala library directly in the Squall system,
e.g. scala.lib.jar 2.11, (dbtt.compiler, /bin/dbtoaster) -> scala code
+ Recompile problem (sbt.clean): dependencies are downloaded again
--> Plan: modify install.sh (install from repos, not copy)
2) Nested query: two cases:
Q2: Some queries need the whole relation (e.g. min) --> implement as subsequent hypercube joins
Q1: Some queries only need part of the tuples? --> only 1 hypercube join is needed. Solution: use timestamps. Problems: inefficiency and timestamp synchronization, because a timestamp must be tracked for every file being read (from the machines).
--> Plan: 
+ Identify partial queries
+ Min (S) is sometimes local?
+ Talk about right semantics (timestamp)
+ In Q2, how to evaluate resources between 2 hypercubes?
+ Correlation between query (nested pattern) and resource allocation
 
3) Random assignment function (configuration: diff funcs)
+ Problem: with 2 supervisors, rsf does not seem uniform.
+ #Workers/cores/places/threads/supervisors
+ Check if random assignment function works properly with more than 2 supervisors.
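One possible way to run the uniformity check is sketched below. It does not call Squall's assignment function: `assign_randomly` is a hypothetical stand-in based on Python's `random` module, and the task counts and seed are made up. The idea is to count tuples per task and compute a chi-square statistic against the uniform expectation.

```python
import random
from collections import Counter


def assign_randomly(num_tuples, num_tasks, seed=0):
    """Stand-in for the assignment function: pick a task uniformly per tuple."""
    rng = random.Random(seed)
    counts = Counter(rng.randrange(num_tasks) for _ in range(num_tuples))
    return [counts[task] for task in range(num_tasks)]


def chi_square(counts):
    """Chi-square statistic of the observed counts against a uniform split."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


for num_tasks in (2, 3, 4):
    counts = assign_randomly(10_000, num_tasks)
    print(num_tasks, counts, round(chi_square(counts), 2))
```

A statistic near zero indicates an even split. In practice the counts would come from the real topology (for example, per-task tuple counts), and the statistic would be compared against the chi-square critical value for `num_tasks - 1` degrees of freedom.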
 
Next steps:
+ Experiments
+ Merge with main branch
+ Analysis