Spark

Spark Libs:

MLlib (machine learning):
https://spark.incubator.apache.org/mllib/

Shark (distributed SQL query engine):

http://shark.cs.berkeley.edu/

Spark Streaming (includes support for the Twitter API):

https://spark.apache.org/docs/0.9.0/streaming-programming-guide.html

BlinkDB (Queries with Bounded Errors and Bounded Response Times on Very Large Data):

http://blinkdb.org

Apache Mesos (a cluster manager that can run Spark):

https://mesos.apache.org/

ADATAO (Visual, Real-Time, Predictive Analytics for Big Data on One Unified Platform):

Not open source, but it has an open API and the company offers help.

http://adatao.com/

GraphX (graph processing library):

https://spark.apache.org/docs/0.9.0/graphx-programming-guide.html

Testing:

http://www.scalatest.org/

Spark User Guide:

Spark Style Guide:

https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide

Presentation about Spark:

http://laser.inf.ethz.ch/2013/material/joseph/LASER-Joseph-6.pdf

Processing 100 GB of Logs using Spark:

http://www.auriq.com/analyzing-500m-log-records-using-spark-one-developers-experience/

A presentation on how clustering works in Spark:

http://ampcamp.berkeley.edu/wp-content/uploads/2013/02/Machine-Learning-on-Spark-Shivaram-Venkataraman-Strata-2013.pptx

Profiling Spark Applications:

https://cwiki.apache.org/confluence/display/SPARK/Profiling+Spark+Applications+Using+YourKit

Spark Development:

Build tool: SBT or Maven
IDE: IntelliJ IDEA with the Scala plugin

To generate the IDEA project files, run: sbt/sbt update gen-idea

Then import the folder into IDEA.

Possible error: when you build the project, you might get a warning about "test and compile output paths" being the same for the "root-build" project. To fix it, open File -> Project Structure and change the output path of the root-build module to <spark-home>/project/target/idea-test-classes instead of idea-classes.
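
The project-generation step above can be wrapped in a small shell function. This is only a sketch: it assumes a Spark source checkout that ships the sbt/sbt launcher script, and the function name gen_idea_project and its optional path argument are hypothetical, not part of Spark.

```shell
# Hypothetical helper: generate IntelliJ IDEA project files for a Spark checkout.
# Assumption: the checkout contains an executable sbt/sbt launcher script.
gen_idea_project() {
    spark_home="${1:-.}"    # path to the Spark checkout, defaults to the current directory
    if [ ! -x "$spark_home/sbt/sbt" ]; then
        echo "sbt/sbt not found in $spark_home" >&2
        return 1
    fi
    # Run from the checkout root so the relative sbt/sbt path resolves.
    (cd "$spark_home" && sbt/sbt update gen-idea)
}
```

Usage: gen_idea_project /path/to/spark, then import that folder into IDEA. The output-path warning described above still has to be fixed in the IDE.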
