Risks

The risk of failure for this project is fairly limited. All of the required data can be collected through GitHub's API, from the GitHub Archive project, or both.

GitHub limits authenticated API requests to 5,000 per hour. Requests can also be made without authentication, but they are then limited to 60 per hour.
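To stay within these limits, a collector can read the rate-limit headers that GitHub returns with each API response and pause once the quota is used up. The following is a minimal sketch, not our final implementation: the helper name `seconds_until_reset` is hypothetical, and it assumes the response headers are available as a plain mapping containing `X-RateLimit-Remaining` and `X-RateLimit-Reset` (a Unix timestamp).

```python
import time


def seconds_until_reset(headers, now=None):
    """Return how many seconds to wait before sending the next request.

    `headers` is a mapping of response headers from a GitHub API call
    (hypothetical helper; header names are matched case-insensitively).
    Returns 0.0 while requests remain in the current rate-limit window.
    """
    now = time.time() if now is None else now
    # Normalize header names, since HTTP headers are case-insensitive.
    lowered = {key.lower(): value for key, value in headers.items()}
    # If the header is missing, assume at least one request is still allowed.
    remaining = int(lowered.get("x-ratelimit-remaining", 1))
    if remaining > 0:
        return 0.0
    # The reset header holds the Unix time at which the quota is restored.
    reset_at = int(lowered.get("x-ratelimit-reset", int(now)))
    return max(0.0, reset_at - now)
```

A collection loop could call `time.sleep(seconds_until_reset(response_headers))` after each request, so that it only blocks once the hourly quota has actually been exhausted.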

Data may also be gathered from the GitHub Archive project, either as a "snapshot of the day" or, more flexibly, by querying the data that the project exposes through Google BigQuery.
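As a rough illustration of the snapshot route, the sketch below builds the download URLs for one day and decodes a downloaded archive. It rests on assumptions that should be checked against the GitHub Archive documentation: that snapshots are published as one gzip-compressed file per hour at `https://data.gharchive.org/`, named by date and non-zero-padded hour, and that each file contains one JSON event per line. The function names are hypothetical.

```python
import gzip
import json
from datetime import date


def hourly_archive_urls(day):
    """Return the 24 hourly archive URLs covering `day`.

    Assumed URL pattern: <base>/YYYY-MM-DD-H.json.gz, hour not zero-padded.
    """
    base = "https://data.gharchive.org"
    return [f"{base}/{day.isoformat()}-{hour}.json.gz" for hour in range(24)]


def parse_events(gz_bytes):
    """Decode one archive: gzip-compressed text, one JSON event per line."""
    text = gzip.decompress(gz_bytes).decode("utf-8")
    # Skip blank lines so a trailing newline does not break parsing.
    return [json.loads(line) for line in text.splitlines() if line.strip()]
```

Downloading is left out on purpose: any HTTP client can fetch the bytes for each URL and hand them to `parse_events`.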

Once the data problem has been dealt with, the remaining work is to create meaningful features to add to our feature vector. We are confident that the team can handle this, since its members have diverse skills (machine learning, natural language processing, etc.).

Creating the web front-end should not pose much risk. Two of the team members are already very familiar with web development, and the front-end will be fairly simple.