Deliverables

1) Implement the hypercube partitioning given the number of reducers (1 week)

 

  • Evaluate the tradeoff between communication and computation: by trying different numbers of reducers (kr) for a hypercube join, validate that there is an optimal kr.
  • How does the optimal kr depend on the local join implementation?
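The tradeoff in deliverable 1 can be illustrated with a minimal sketch. It assumes the random hypercube scheme with one dimension per relation: each tuple gets a coordinate in its own relation's dimension and is replicated along all other dimensions. The function names (`choose_shares`, `route`, `cell_id`) and the brute-force search are hypothetical and are not taken from Squall.

```python
import itertools
import math
import random


def per_reducer_load(sizes, shares):
    """Estimated number of input tuples each reducer receives."""
    return sum(size / share for size, share in zip(sizes, shares))


def total_communication(sizes, shares):
    """Total tuples sent: relation i is replicated prod(shares)/shares[i] times."""
    cells = math.prod(shares)
    return sum(size * cells // share for size, share in zip(sizes, shares))


def choose_shares(sizes, kr):
    """Brute-force the dimension sizes (product <= kr) that minimise
    the per-reducer load. Only practical for small kr."""
    best, best_load = None, math.inf
    for shares in itertools.product(range(1, kr + 1), repeat=len(sizes)):
        if math.prod(shares) > kr:
            continue
        load = per_reducer_load(sizes, shares)
        if load < best_load:
            best, best_load = shares, load
    return best


def cell_id(cell, shares):
    """Row-major linearisation of a hypercube cell into a reducer id."""
    rid = 0
    for coord, share in zip(cell, shares):
        rid = rid * share + coord
    return rid


def route(rel_index, shares, coord=None, rng=random):
    """Reducer ids that must receive one tuple of relation rel_index.
    The tuple's own coordinate is random unless given explicitly."""
    if coord is None:
        coord = rng.randrange(shares[rel_index])
    ranges = [range(s) for s in shares]
    ranges[rel_index] = [coord]  # fixed in own dimension, replicated elsewhere
    return [cell_id(cell, shares) for cell in itertools.product(*ranges)]


if __name__ == "__main__":
    sizes = [1000, 1000]  # assumed relation cardinalities
    for kr in (1, 4, 16):
        shares = choose_shares(sizes, kr)
        print(kr, shares, per_reducer_load(sizes, shares),
              total_communication(sizes, shares))
```

In this model the per-reducer load (computation) shrinks as kr grows, while the total number of tuples sent (communication) grows. That is the tension the experiments are meant to measure. The sketch predicts no actual running times.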

 

2) Integrate the hypercube partitioning with Squall (1 week)

 

3) Local indexes for more efficient execution of equi-joins, band-joins and inequality joins in Squall (1 week)
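As a rough illustration of deliverable 3, the sketch below shows one common choice of local index for each join type: a hash index for equi-joins and a sorted index for band-joins and inequality joins. The class and method names are hypothetical and do not reflect Squall's actual index classes. The sorted index uses a plain Python list for brevity, so an insert costs O(n). A balanced tree would be the more realistic choice.

```python
import bisect
from collections import defaultdict


class HashIndex:
    """Equi-join index: exact-match lookup on the join key."""

    def __init__(self):
        self._buckets = defaultdict(list)

    def insert(self, key, tup):
        self._buckets[key].append(tup)

    def equal(self, key):
        # Use .get so a miss does not create an empty bucket.
        return list(self._buckets.get(key, []))


class SortedIndex:
    """Band/inequality index: keys are kept sorted, lookups use binary search."""

    def __init__(self):
        self._keys = []
        self._tuples = []

    def insert(self, key, tup):
        pos = bisect.bisect_right(self._keys, key)
        self._keys.insert(pos, key)
        self._tuples.insert(pos, tup)

    def band(self, key, width):
        """Tuples whose key k satisfies |k - key| <= width."""
        lo = bisect.bisect_left(self._keys, key - width)
        hi = bisect.bisect_right(self._keys, key + width)
        return self._tuples[lo:hi]

    def less_than(self, key):
        """Tuples whose key is strictly smaller than key."""
        return self._tuples[:bisect.bisect_left(self._keys, key)]

    def greater_than(self, key):
        """Tuples whose key is strictly greater than key."""
        return self._tuples[bisect.bisect_right(self._keys, key):]


if __name__ == "__main__":
    idx = SortedIndex()
    for k in (5, 1, 9, 3, 7):
        idx.insert(k, ("row", k))
    print(idx.band(4, 1))       # keys within [3, 5]
    print(idx.less_than(3))     # keys < 3
    print(idx.greater_than(7))  # keys > 7
```

For each incoming tuple, a local join would probe the index built on the opposite relation and then insert the tuple into its own relation's index, so the choice of index directly affects the computation cost per reducer.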

 

4) Run experiments and identify TPC-H [6] queries which could be executed without any communication among the machines of the same operator. Some of the nested queries certainly do not qualify. (3 weeks)

  • Set up Microsoft Azure images, with a tutorial for us on how to do it (we might change the version of Storm or Squall).
  • For each TPC-H query that requires communication among the machines of the same operator, precisely explain what needs to be communicated and between which machines.
  • What is the effect of the total number of tuples sent over the network (including the intermediate stages) on the total execution time?
  • How does this effect depend on the local join implementation?

 

Writeup (1 week)