Deliverables
1) Implement the hypercube partitioning given the number of reducers (1 week)
- Evaluate the tradeoff between communication and computation: by trying different numbers of reducers (kr) for a hypercube join, validate that there is an optimal kr.
- How does the optimal kr depend on the local join implementation?
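The partitioning in deliverable 1 can be sketched as follows. This is a minimal illustration in Python, not Squall's API: the function names (`choose_dimensions`, `reducers_for`, `tuples_sent`) are hypothetical, the dimension search is a brute force that is only practical for small kr, and the scheme shown is the random (content-insensitive) variant, in which each tuple picks a random coordinate along its own relation's dimension and is replicated along all other dimensions.

```python
import itertools
import math
import random


def choose_dimensions(sizes, k):
    """Brute-force search for per-relation dimension sizes d_i with
    prod(d_i) <= k that minimise the per-reducer input sum(|R_i| / d_i).
    Hypothetical helper; exponential in the number of relations."""
    best, best_load = None, float("inf")

    def search(i, dims, product):
        nonlocal best, best_load
        if i == len(sizes):
            load = sum(s / d for s, d in zip(sizes, dims))
            if load < best_load:
                best, best_load = tuple(dims), load
            return
        d = 1
        while product * d <= k:
            search(i + 1, dims + [d], product * d)
            d += 1

    search(0, [], 1)
    return best


def reducers_for(rel_index, dims, rng=random):
    """Reducer coordinates that receive one tuple of relation rel_index:
    a random slice along its own dimension, every slice along the others."""
    own = rng.randrange(dims[rel_index])
    ranges = [[own] if i == rel_index else range(d) for i, d in enumerate(dims)]
    return list(itertools.product(*ranges))


def tuples_sent(sizes, dims):
    """Total tuples shipped: each tuple of R_i is replicated prod(d_j, j != i) times."""
    total = math.prod(dims)
    return sum(s * (total // d) for s, d in zip(sizes, dims))
```

Under this sketch, two relations of 100 tuples each give dimensions (1, 1), (2, 2) and (4, 4) for kr = 1, 4 and 16: the per-reducer input drops from 200 to 100 to 50 tuples while the total number of tuples sent grows from 200 to 400 to 800. This is the communication/computation tradeoff that the experiment is meant to quantify.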
2) Integrate it with Squall (1 week)
3) Local indexes for more efficient execution of equi-joins, band-joins and inequality joins in Squall (1 week)
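As an illustration of what a local index for deliverable 3 might look like, the sketch below keeps join keys sorted so that equality, band and inequality probes all become binary searches. The class `SortedIndex` and its methods are hypothetical and not part of Squall; a Python list is used for brevity, so insertion is linear in the index size, and a balanced tree (or a hash index when only equi-joins are needed) would be the more likely choice for a streaming engine.

```python
import bisect


class SortedIndex:
    """Hypothetical local join index: keys kept in sorted order,
    with rows stored in a parallel list."""

    def __init__(self):
        self._keys = []
        self._rows = []

    def insert(self, key, row):
        # bisect_right keeps rows with equal keys in arrival order.
        pos = bisect.bisect_right(self._keys, key)
        self._keys.insert(pos, key)
        self._rows.insert(pos, row)

    def range(self, low, high):
        """Rows whose key satisfies low <= key <= high."""
        lo = bisect.bisect_left(self._keys, low)
        hi = bisect.bisect_right(self._keys, high)
        return self._rows[lo:hi]

    def equal(self, key):
        """Equi-join probe."""
        return self.range(key, key)

    def band(self, key, width):
        """Band-join probe: |stored key - key| <= width."""
        return self.range(key - width, key + width)

    def less_than(self, key):
        """Inequality-join probe: stored key < key."""
        return self._rows[:bisect.bisect_left(self._keys, key)]
```

Each incoming tuple would probe the index built on the opposite relation and then be inserted into the index for its own relation, so that later tuples from the other side can find it.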
4) Experiments & identify TPC-H [6] queries which could be executed without any communication among the machines of the same operator. Some of the nested queries certainly do not qualify. (3 weeks)
- Set up Microsoft Azure images, with a tutorial for us on how to do it (we might change the version of Storm or Squall).
- For each TPC-H query that requires communication among the machines of the same operator, precisely explain what needs to be communicated and between which machines.
- What is the effect of the total number of tuples sent over the network (including the intermediate stages) on the total execution time?
- How does this effect depend on the local join implementation?
Writeup (1 week)