Meeting Notes 2015.03.31

Source: https://docs.google.com/document/d/1IpDkgh6jWatBSOo6Dh3cAhQmjASCQjw0-bxHgWpQ_38/edit?usp=sharing

I. Progress done this week:

1. Hypercube equal-size partitioning (inspired by the Zhang paper):

+ Details: https://wiki.epfl.ch/bigdata2015-hypercubejoins/hcp-equal-size

+ Runtime: < 5ms with 8 relations and 1000 machines
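For readers unfamiliar with hypercube partitioning, here is a minimal sketch of the general idea. It is not the equal-size algorithm from the linked page and not the code behind the runtime above. Assumptions: each relation is one dimension of the hypercube, the cost of a configuration is the per-machine input sum(|R_i| / d_i), and the names best_dimensions and cells_for_tuple are hypothetical. The search is exhaustive, so it only works for small inputs.

```python
from itertools import product
from math import prod


def best_dimensions(sizes, machines):
    """Try every dimension vector (d_1..d_n) with d_1 * ... * d_n <= machines
    and keep the one with the smallest per-machine input sum(|R_i| / d_i).

    Assumed cost model, for illustration only. The loop is exponential in the
    number of relations, so use it on small examples only.
    """
    best, best_load = None, float("inf")
    for dims in product(range(1, machines + 1), repeat=len(sizes)):
        if prod(dims) > machines:
            continue  # would need more machines than available
        load = sum(size / d for size, d in zip(sizes, dims))
        if load < best_load:
            best, best_load = dims, load
    return best, best_load


def cells_for_tuple(rel_index, coord, dims):
    """Return every hypercube cell that must receive a tuple of relation
    rel_index placed at position coord on its own dimension: the coordinate
    is fixed on that dimension and replicated along all the others."""
    ranges = [[coord] if i == rel_index else range(d) for i, d in enumerate(dims)]
    return list(product(*ranges))
```

With two relations of 100 tuples each and 4 machines, best_dimensions([100, 100], 4) picks a 2 x 2 grid, and a tuple of the first relation placed at coordinate 1 goes to cells (1, 0) and (1, 1).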

2. Squall integration:

+ Select for tuples is partially ready; only the index part is left

+ Join: waiting for predicate application

...

II. Discussions in the meeting:

0. Squall codebase: being changed (renaming, refactoring, …)

1. Hypercube partition document:

+ full names of classes
+ flags for usage and parameters
+ mechanism
+ refer to full document, URL, terminology
+ what is the brute-force implementation, what is the equal-size implementation
+ report: methods, performance (maximum 30 sec)

2. Storm integration:

+ interface aligned with hypercube partitioning
+ depends on predicate, local index/join
+ should try the naive implementation of multiway join (e.g. Nested Loop Join, because data is streamed)
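A minimal sketch of what the naive streamed multiway join could look like, assuming one in-memory store per relation on the local machine and a single predicate over one tuple per relation. The class and method names are hypothetical and are not part of the Squall or Storm APIs.

```python
from itertools import product


class StreamingNestedLoopJoin:
    """Naive local multiway join over streams (illustrative sketch).

    Every arriving tuple is combined with all tuples already stored for the
    other relations, the predicate is checked on each combination, and the
    tuple is then stored so that later arrivals can join with it.
    """

    def __init__(self, num_relations, predicate):
        self.stores = [[] for _ in range(num_relations)]
        self.predicate = predicate  # called with one tuple per relation

    def on_tuple(self, rel_index, tup):
        # Fix the new tuple on its own relation, loop over the other stores.
        candidates = [
            [tup] if i == rel_index else store
            for i, store in enumerate(self.stores)
        ]
        results = [c for c in product(*candidates) if self.predicate(*c)]
        # Store after probing, so each result is produced exactly once.
        self.stores[rel_index].append(tup)
        return results
```

For a chain join R.y = S.x and S.y = T.x, the predicate would be lambda r, s, t: r[1] == s[0] and s[1] == t[0]. No output is produced until every relation has at least one stored or arriving tuple. There is no index, so each arrival costs the product of the other stores' sizes.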

3. Deployment of Squall on Microsoft Azure:

+ jar files
+ beware of Squall local paths, local rules
+ Storm submitter
+ how to use resources: multiple machines, multiple users
+ check whether Azure limits each person to one account (e.g. cannot use resources from multiple machines)

4. Nested query (TPC-H):

Example:

SELECT R.A FROM R
WHERE R.A = (SELECT SUM(S.B) FROM S)

+ the query executor should not care about the query optimizer, i.e. it should not try to optimize the received query
+ Problem 1: semantics. When a new tuple from S arrives, SUM(S.B) is updated → output tuples that were already sent must be retracted. This might also affect the index
+ Problem 2: parallelization. Must communicate with the other working machines (no independence, and communication cost increases). This affects the partitioning results
+ Possible solutions:

  + store locally and send in batches (e.g. batch size = 100 tuples)
  + communicate the PARTIAL aggregation result to the machines
  + re-partition
  + the query plan will specify the communication beforehand
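A minimal sketch of how the first two solutions could fit together with the retraction problem, for the example query above (R.A compared against SUM(S.B)). Assumptions: workers batch S tuples locally and ship one partial sum per batch, a single node holds the running total, outputs are tagged "+" (emit) or "-" (retract), and the sum over an empty S is treated as 0 rather than SQL NULL. All class and method names are hypothetical.

```python
class BatchingWorker:
    """Accumulates S.B values locally and releases one partial sum per batch."""

    def __init__(self, batch_size=100):
        self.batch_size = batch_size
        self.pending = 0
        self.count = 0

    def on_s(self, b):
        self.pending += b
        self.count += 1
        if self.count < self.batch_size:
            return None  # nothing to send yet
        partial, self.pending, self.count = self.pending, 0, 0
        return partial  # one message instead of batch_size messages


class NestedSumQuery:
    """Maintains the R tuples whose A equals the running SUM(S.B)."""

    def __init__(self):
        self.total = 0  # simplification: empty S counts as sum 0
        self.r_values = []

    def on_r(self, a):
        self.r_values.append(a)
        return [("+", a)] if a == self.total else []

    def on_partial_sum(self, partial):
        if partial == 0:
            return []  # total unchanged, nothing to retract or emit
        old = [a for a in self.r_values if a == self.total]
        self.total += partial
        new = [a for a in self.r_values if a == self.total]
        # Retract results sent under the old total, then emit the new ones.
        return [("-", a) for a in old] + [("+", a) for a in new]
```

Larger batches mean fewer messages and fewer retractions, but the output lags further behind the true sum. Re-partitioning and plan-specified communication are not covered by this sketch.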

III. Plan for next week:

0) Hypercube partitioning code: documentation and cleanup

1) Azure: try a deployment

2) Local join (without index): continue Storm integration with the naive implementation

3) Khue: SquallToaster

4) Nested query discussion & preparation for experiments
