Meeting Notes 2015.04.14
I. Progress this week:
1. Hypercube partitioning:
+ Integrated into internal codebase with factory design pattern
+ Provided documentation for all java files
+ Runtime experiment: partitioning takes < 1 s (for 2 to 20 relations and 500 to 2000 machines)
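As a rough illustration of the partitioning step, the sketch below picks one hypercube dimension per relation so that the product of the dimensions stays within the machine budget, and hides the concrete strategy behind a factory. The greedy heuristic and every name in it (`HypercubeAssigner`, `GreedyAssigner`, `HypercubeAssignerFactory`, the `"greedy"` key) are assumptions made for illustration; they are not taken from the internal codebase.

```java
import java.util.Arrays;

/** Hypothetical interface; the names are illustrative, not Squall's API. */
interface HypercubeAssigner {
    /** Returns one dimension size per relation; their product never exceeds machines. */
    int[] assign(long[] relationSizes, int machines);
}

/** Greedy heuristic (an assumption): keep splitting the dimension with the largest per-machine share. */
final class GreedyAssigner implements HypercubeAssigner {
    @Override
    public int[] assign(long[] relationSizes, int machines) {
        int[] dims = new int[relationSizes.length];
        Arrays.fill(dims, 1);
        long product = 1; // always equals the product of all entries in dims
        while (true) {
            int best = -1;
            double bestLoad = -1.0;
            for (int i = 0; i < dims.length; i++) {
                // Cube size if dimension i grew by one; it must stay within the budget.
                long grown = product / dims[i] * (dims[i] + 1);
                // Tuples of relation i that each machine currently receives.
                double load = (double) relationSizes[i] / dims[i];
                if (grown <= machines && load > bestLoad) {
                    best = i;
                    bestLoad = load;
                }
            }
            if (best < 0) {
                return dims; // no dimension can grow without exceeding the budget
            }
            product = product / dims[best] * (dims[best] + 1);
            dims[best]++;
        }
    }
}

/** Factory that hides the concrete assigner behind a name. */
final class HypercubeAssignerFactory {
    private HypercubeAssignerFactory() {}

    static HypercubeAssigner create(String kind) {
        if ("greedy".equals(kind)) {
            return new GreedyAssigner();
        }
        throw new IllegalArgumentException("Unknown assigner: " + kind);
    }
}
```

For example, `HypercubeAssignerFactory.create("greedy").assign(new long[] {100, 100}, 4)` yields a 2 x 2 cube. A new strategy would only need another implementation of the interface and one more branch in the factory.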
2. Windows Azure setup task:
- Manually set up a Storm cluster on Azure Linux VMs. Azure services used include: Ubuntu VMs, Virtual Network, CloudService, StorageAccount.
- Documented the step-by-step guide here: https://wiki.epfl.ch/bigdata2015-hypercubejoins/stormazureinstallation
- Azure HDInsight Storm is not used for the following reasons:
  - The cluster machines run on Windows. There appears to be a preview option with Ubuntu, but it is disabled at the moment.
  - It is expensive. Charges are based on “computing hours”, so costs accrue for as long as the cluster is up. An idle 1-node cluster cost ≈ 50 CHF over about 2 days.
  - There is no stop or suspend operation on an HDInsight cluster. The only way to stop the cluster is to delete it, which is not reasonable given how long re-provisioning takes (20 to 30 min). Also, local data cannot be persisted when “suspending” the cluster this way.
  - It is not flexible (e.g., opening ports, creating the squall tmp folder, submitting jars…).
- So we need to find out how Microsoft Azure provides an HDFS-like service.
3. SquallToast Integration with Hypercube partitioning:
- Experiments are performed with Hyracks, TPCH3 and TPCH5 queries.
- StormDBToasterJoin performs a multi-way join with a level of parallelism of 3.
- Output tuple results per task in each experiment are as follows:
a. Hyracks query:
Task 1
HOUSEHOLD = 916
BUILDING = 1237
MACHINERY = 852
AUTOMOBILE = 1016
FURNITURE = 1030
Task 2
HOUSEHOLD = 931
BUILDING = 1201
MACHINERY = 800
AUTOMOBILE = 965
FURNITURE = 1005
Task 3
HOUSEHOLD = 925
BUILDING = 1268
MACHINERY = 884
AUTOMOBILE = 998
FURNITURE = 972
b. TPCH3 queries:
Only part of the result is shown here. Task1 produces no output tuples.
Task1: (None)
Task2: ...
16322|19941220|0 = 92848.15920000001
44102|19950114|0 = 70760.14369999999
Task3: ….
44102|19950114|0 = 93895.6756
c. TPCH5 queries:
Task1: (None)
Task2:
VIETNAM = 706598.5371000001
INDONESIA = 383966.5044999999
CHINA = 439244.43220000004
JAPAN = 257033.24610000002
INDIA = 205155.0176
Task3:
VIETNAM = 294328.1628
INDONESIA = 182413.0231
CHINA = 300966.3248
JAPAN = 403617.9964
INDIA = 217719.6668
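If the per-task outputs above are disjoint partial aggregates (an assumption; these notes do not say how the final result is assembled), a last merge step would sum them per group key. A minimal sketch with hypothetical names (`PartialCountMerger`, `task`, `merge`), using the Hyracks counts listed above:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper; not part of Squall. */
final class PartialCountMerger {
    static final String[] SEGMENTS =
        {"HOUSEHOLD", "BUILDING", "MACHINERY", "AUTOMOBILE", "FURNITURE"};

    /** Builds one task's output from counts given in SEGMENTS order. */
    static Map<String, Long> task(long... counts) {
        Map<String, Long> out = new LinkedHashMap<>();
        for (int i = 0; i < SEGMENTS.length; i++) {
            out.put(SEGMENTS[i], counts[i]);
        }
        return out;
    }

    /** Sums the partial counts of all tasks per group key. */
    static Map<String, Long> merge(List<Map<String, Long>> perTask) {
        Map<String, Long> total = new LinkedHashMap<>();
        for (Map<String, Long> partial : perTask) {
            for (Map.Entry<String, Long> e : partial.entrySet()) {
                total.merge(e.getKey(), e.getValue(), Long::sum);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<Map<String, Long>> perTask = new ArrayList<>();
        perTask.add(task(916, 1237, 852, 1016, 1030)); // Task 1
        perTask.add(task(931, 1201, 800, 965, 1005));  // Task 2
        perTask.add(task(925, 1268, 884, 998, 972));   // Task 3
        System.out.println(merge(perTask));
    }
}
```

Under that assumption, the three Hyracks tasks add up to `{HOUSEHOLD=2772, BUILDING=3706, MACHINERY=2536, AUTOMOBILE=2979, FURNITURE=3007}`. The TPCH3 and TPCH5 sums could be merged the same way with `Double::sum`.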
4. HyperCube Join implementation:
- Implemented Chain Join [A - B - C - D - E]
- For chain join implementation we did not touch Index and Visitor classes
- https://github.com/khgl/squall/commit/ae7458772b34d6521afe93924d962a2308d7bc7f
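To make the chain shape concrete, here is a single-process sketch of a chain join in which each relation joins only with its neighbour. It is not the Storm implementation from the commit above and does not use the Index or Visitor classes; the tuple layout (`{leftKey, rightKey}`) and the `ChainJoin` name are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical local chain join; not the distributed Squall operator. */
final class ChainJoin {
    /**
     * Each tuple is {leftKey, rightKey}. A tuple t of relation i matches a
     * tuple u of relation i + 1 when t[1] == u[0]. Returns every full chain,
     * one tuple per relation, in relation order.
     */
    static List<List<int[]>> join(List<List<int[]>> relations) {
        List<List<int[]>> partial = new ArrayList<>();
        if (relations.isEmpty()) {
            return partial;
        }
        // Seed the partial results with the tuples of the first relation.
        for (int[] t : relations.get(0)) {
            List<int[]> row = new ArrayList<>();
            row.add(t);
            partial.add(row);
        }
        for (int r = 1; r < relations.size(); r++) {
            // Hash index on the left key of the next relation in the chain.
            Map<Integer, List<int[]>> index = new HashMap<>();
            for (int[] t : relations.get(r)) {
                index.computeIfAbsent(t[0], k -> new ArrayList<>()).add(t);
            }
            // Extend every partial chain with the matching tuples.
            List<List<int[]>> next = new ArrayList<>();
            for (List<int[]> row : partial) {
                int key = row.get(row.size() - 1)[1];
                for (int[] match : index.getOrDefault(key, new ArrayList<>())) {
                    List<int[]> extended = new ArrayList<>(row);
                    extended.add(match);
                    next.add(extended);
                }
            }
            partial = next;
        }
        return partial;
    }
}
```

Passing five relations in the order A, B, C, D, E returns only the chains that connect all five; a chain that breaks at any link is dropped at that step.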
5. Indexing
+ Decided on a pattern
+ Implementation of Create Index
II. Discussions in the meeting:
(to be continued)
III. Plan for next week:
(to be continued)