Deliverables

1) Implement the hypercube partitioning given the number of reducers (1 week)

Evaluate the tradeoff between communication and computation: by trying different numbers of reducers (kr) for a hypercube join, validate that there is an optimal kr.
How does it depend on the local join implementation?
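
A minimal sketch of the idea for a two-way join R x S, where the hypercube degenerates to a rows x cols grid of reducers. It is not Squall code: the function names, the cost weights (net_cost, cpu_cost) and the two local-join cost formulas are assumptions made for illustration only.

```python
import random


def choose_grid(size_r, size_s, kr):
    """Pick (rows, cols) with rows * cols <= kr that minimizes the
    per-reducer input size_r / rows + size_s / cols."""
    best = None
    for rows in range(1, kr + 1):
        cols = kr // rows
        load = size_r / rows + size_s / cols
        if best is None or load < best[0]:
            best = (load, rows, cols)
    return best[1], best[2]


def route_r(rows, cols, rng):
    """An R tuple picks a random row and is replicated to every column."""
    row = rng.randrange(rows)
    return [row * cols + col for col in range(cols)]


def route_s(rows, cols, rng):
    """An S tuple picks a random column and is replicated to every row."""
    col = rng.randrange(cols)
    return [row * cols + col for row in range(rows)]


def estimated_cost(size_r, size_s, kr, local_join="nested_loop",
                   net_cost=1.0, cpu_cost=1.0):
    """Toy cost model (an assumption, not a measured result):
    tuples sent over the network plus the work of one reducer."""
    rows, cols = choose_grid(size_r, size_s, kr)
    tuples_sent = size_r * cols + size_s * rows
    r_part, s_part = size_r / rows, size_s / cols
    if local_join == "nested_loop":
        local_work = r_part * s_part   # compare every local pair
    else:
        local_work = r_part + s_part   # e.g. hash join, output size ignored
    return net_cost * tuples_sent + cpu_cost * local_work
```

Each R tuple and each S tuple meet at exactly one reducer (the cell where the chosen row and column cross), so every pair is compared once. Under this toy model with |R| = |S| = 1000 and a nested-loop local join, the estimated cost first falls and then rises again as kr grows, whereas with the linear local-join formula the replication cost dominates from the start. The real optimum has to come from the experiments.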

2) Integrate it with Squall (1 week)

3) Local indexes for more efficient execution of equi-joins, band-joins and inequality joins in Squall (1 week)
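
As a rough illustration of what such local indexes could look like, the sketch below uses a hash index for equi-joins and a sorted index for band-joins and inequality joins. The class and method names are hypothetical and do not correspond to Squall's actual classes. A production version would likely use a balanced tree instead of a sorted Python list, since list insertion is linear.

```python
import bisect
from collections import defaultdict


class HashIndex:
    """Equi-join index: all stored tuples with exactly the probed key."""

    def __init__(self):
        self._buckets = defaultdict(list)

    def insert(self, key, tup):
        self._buckets[key].append(tup)

    def equal(self, key):
        return list(self._buckets.get(key, []))


class SortedIndex:
    """Index kept sorted by key, for band and inequality conditions."""

    def __init__(self):
        self._keys = []
        self._tuples = []

    def insert(self, key, tup):
        pos = bisect.bisect_right(self._keys, key)
        self._keys.insert(pos, key)
        self._tuples.insert(pos, tup)

    def band(self, key, width):
        """Tuples whose key k satisfies |k - key| <= width."""
        lo = bisect.bisect_left(self._keys, key - width)
        hi = bisect.bisect_right(self._keys, key + width)
        return self._tuples[lo:hi]

    def less_than(self, key):
        """Tuples whose key k satisfies k < key."""
        return self._tuples[:bisect.bisect_left(self._keys, key)]
```

With either index, a newly arrived tuple from one relation probes the index built on the other relation and touches only the matching range, instead of scanning all stored tuples.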

4) Experiments & Identify TPC-H [6] queries which could be executed without any communication among the machines of the same operator. Some of the nested queries certainly do not qualify. (3 weeks)

Set up Microsoft Azure images, with a tutorial for us on how to do it (we might change the version of Storm or Squall). For each TPC-H query that requires communication among the machines of the same operator, precisely explain what needs to be communicated and between which machines.
What is the effect of the total number of tuples sent over the network (including the intermediate stages) on the total execution time?
How does it depend on the local join implementation?

Writeup (1 week)