Spark

Spark Libs:

MLlib (machine learning):
https://spark.incubator.apache.org/mllib/

Shark (distributed SQL query engine):
http://shark.cs.berkeley.edu/
 
Spark Streaming (includes support for the Twitter API):
https://spark.apache.org/docs/0.9.0/streaming-programming-guide.html
 
BlinkDB (Queries with Bounded Errors and Bounded Response Times on Very Large Data):
http://blinkdb.org
 
Apache Mesos (a cluster manager that can run Spark):
https://mesos.apache.org/
 
ADATAO (Visual, Real-Time, Predictive Analytics for Big Data on One Unified Platform):
Not open source, but it has an open API and the team offers help.
http://adatao.com/
 
GraphX (graph-parallel computation API, not a graph database):
https://spark.apache.org/docs/0.9.0/graphx-programming-guide.html
 
Testing:
http://www.scalatest.org/
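
A minimal sketch of what a ScalaTest suite for Spark code might look like, assuming a ScalaTest release that still provides org.scalatest.FunSuite and a Spark release with the SparkContext(master, appName) constructor. WordCount and WordCountSuite are hypothetical names made up for this illustration; they are not part of Spark or ScalaTest.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._ // pair-RDD implicits (needed on older Spark releases)
import org.apache.spark.rdd.RDD
import org.scalatest.{BeforeAndAfterAll, FunSuite}

// Hypothetical helper under test; not part of Spark.
object WordCount {
  // Splits each line on single spaces and counts occurrences of each word.
  def count(lines: RDD[String]): Map[String, Int] =
    lines.flatMap(_.split(" "))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
      .collect()
      .toMap
}

class WordCountSuite extends FunSuite with BeforeAndAfterAll {
  private var sc: SparkContext = _

  // Start one in-process ("local") Spark context for the whole suite.
  override def beforeAll(): Unit = {
    sc = new SparkContext("local", "WordCountSuite")
  }

  // Stop the context so later suites can create their own.
  override def afterAll(): Unit = {
    if (sc != null) sc.stop()
  }

  test("counts each word across all lines") {
    val lines = sc.parallelize(Seq("a b", "a"))
    assert(WordCount.count(lines) === Map("a" -> 2, "b" -> 1))
  }
}
```

Running against the "local" master keeps the test in a single JVM, so no cluster is needed.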
 
 

Spark User Guide:

Spark Style Guide:
https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide
 
Presentation about Spark:
http://laser.inf.ethz.ch/2013/material/joseph/LASER-Joseph-6.pdf

Processing 100 GB of Logs using Spark:

http://www.auriq.com/analyzing-500m-log-records-using-spark-one-developers-experience/

A presentation on how clustering works in Spark:

http://ampcamp.berkeley.edu/wp-content/uploads/2013/02/Machine-Learning-on-Spark-Shivaram-Venkataraman-Strata-2013.pptx

Profiling Spark Applications:
https://cwiki.apache.org/confluence/display/SPARK/Profiling+Spark+Applications+Using+YourKit
 
Spark Development:
  • SBT or Maven
     
  • IntelliJ with Plugin
To generate the IDEA project files, run: sbt/sbt update gen-idea
Then import the folder into IDEA.
Possible error: when you build the project, you might get a warning about "test and compile output paths" being the same for the "root-build" project. To fix it, open File -> Project Structure and change the output path of the root-build module to <spark-home>/project/target/idea-test-classes instead of idea-classes.