Tasks
1. First milestone
The tasks we want to accomplish for the first milestone are:
1. Parse the XML files and extract the words to obtain raw text
2. Do a word count and store the output as CSV (format: word, #occ, year)
3. Compute the word temporal profiles (first as a list, then as a graph)
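The three steps above can be sketched on a single machine as follows. This is only an illustration, not the project's MapReduce job: the XML layout (`<article>`, `<text>`), the tokenization rule, and all function names are assumptions made for the example.

```python
import csv
import io
import re
import xml.etree.ElementTree as ET
from collections import Counter


def extract_text(xml_string):
    """Step 1: return all text content of an XML document as one string."""
    root = ET.fromstring(xml_string)
    return " ".join(root.itertext())


def count_words(text):
    """Step 2: lowercase the text and count word occurrences."""
    return Counter(re.findall(r"\w+", text.lower()))


def counts_to_csv(counts_by_year):
    """Step 2: serialize {year: Counter} as CSV rows (word, #occ, year)."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for year in sorted(counts_by_year):
        for word, occ in sorted(counts_by_year[year].items()):
            writer.writerow([word, occ, year])
    return out.getvalue()


def temporal_profiles(counts_by_year):
    """Step 3: build {word: [occurrences per year]} aligned on sorted years."""
    years = sorted(counts_by_year)
    words = set().union(*counts_by_year.values())
    # A Counter returns 0 for a missing word, so absent years become 0.
    return years, {w: [counts_by_year[y][w] for y in years] for w in words}


# Hypothetical input: one small XML document per year.
docs = {
    1900: "<article><title>War</title><text>the war and the peace</text></article>",
    1901: "<article><text>peace, peace</text></article>",
}
counts = {year: count_words(extract_text(xml)) for year, xml in docs.items()}
years, profiles = temporal_profiles(counts)
print(counts_to_csv(counts))
print(years, profiles["peace"])
```

Keeping the profile as a list aligned on the sorted years makes it straightforward to plot later as a graph.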
Division of the work:
- Parsing Data (Zhivka, Florian) [parsing the data, then continue with research for the 2nd part]
- MapReduce: build the 1-grams and store them as CSV: word, #occ, year (Fabien, John)
- Begin research for the 2nd part (Mathieu [Spark testing, code style], Ana [search for machine learning techniques we could use], Valentin [look at user interface implementation possibilities])
- Naive solutions (Sidney [compare means], Joanna [look for words with exactly the same temporal profiles])
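The two naive solutions mentioned above could look roughly like this. It is a minimal sketch, assuming each profile is a list of per-year counts of equal length; the function names and sample data are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def group_identical(profiles):
    """Group words whose temporal profiles are exactly the same."""
    groups = defaultdict(list)
    for word, profile in profiles.items():
        groups[tuple(profile)].append(word)
    # Keep only groups that contain at least two words.
    return [sorted(words) for words in groups.values() if len(words) > 1]


def closest_by_mean(profiles, word):
    """Return the other word whose mean count is closest to that of `word`."""
    target = mean(profiles[word])
    others = (w for w in profiles if w != word)
    return min(others, key=lambda w: abs(mean(profiles[w]) - target))


# Hypothetical profiles over three years.
sample = {"a": [1, 2, 3], "b": [1, 2, 3], "c": [0, 0, 9], "d": [2, 2, 2.5]}
print(group_identical(sample))
print(closest_by_mean(sample, "c"))
```

Comparing means ignores the shape of the profile entirely, which is why it only serves as a baseline.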
2. Second milestone
To cluster the word temporal profiles, we will look into the following:
1. Fourier transform
2. Machine Learning/Artificial Intelligence
3. Time series
[To complete]
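As one possible starting point for the Fourier transform option, a profile can be turned into the magnitudes of its discrete Fourier transform, and two profiles compared by the distance between those magnitudes. This is a sketch under assumptions, not a decided approach: it uses a direct O(n²) DFT from the standard library, and the function names are made up for the example.

```python
import cmath
import math


def dft_magnitudes(profile):
    """Return the magnitude of each discrete Fourier coefficient."""
    n = len(profile)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(profile)))
        for k in range(n)
    ]


def spectral_distance(p, q):
    """Euclidean distance between the magnitude spectra of two profiles."""
    return math.dist(dft_magnitudes(p), dft_magnitudes(q))


# Two hypothetical profiles with the same shape, shifted in time.
early = [0, 1, 4, 1, 0, 0]
late = [0, 0, 0, 1, 4, 1]
flat = [1, 1, 1, 1, 1, 1]
print(spectral_distance(early, late))
print(spectral_distance(early, flat))
```

The magnitudes are unchanged by a circular shift of the profile, so this distance groups words with the same shape even if they peak in different years. Whether that is desirable depends on what the clusters are meant to capture.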
3. Third milestone [To complete]