Technical details

The core program will be written in Scala, using the Spark API for most of the processing. We will not restrict ourselves to Scala, however: the first task (parsing and WordCount) will probably run faster on Hadoop, so we will write that part in Java.
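
To make the WordCount task concrete, here is a minimal sketch of the counting logic in plain Java. It is not a Hadoop job: the class name WordCountSketch, the helper countWords, the sample sentence, and the tokenization rule (lowercase, split on non-letter characters) are all assumptions chosen for illustration. An actual Hadoop version would spread the same tokenize-then-sum steps across a mapper and a reducer.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

public class WordCountSketch {

    // Hypothetical helper: lowercases the text, splits it on runs of
    // non-letter characters, and counts how often each token occurs.
    // A TreeMap keeps the words in alphabetical order.
    static Map<String, Long> countWords(String text) {
        return Arrays.stream(text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+"))
                .filter(token -> !token.isEmpty()) // drop the empty token a leading separator can produce
                .collect(Collectors.groupingBy(
                        Function.identity(), TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        // Made-up sample sentence, not taken from the archives.
        String sample = "Le temps passe, le temps reste.";
        System.out.println(countWords(sample)); // prints {le=2, passe=1, reste=1, temps=2}
    }
}
```

Splitting on non-letter characters is only one possible choice. Archive text with hyphenated words, apostrophes, or OCR noise would probably need a more careful tokenizer.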

Resources

  1. Archives of Le Temps
  2. A cluster to process the data