Resources
Required resources
Infrastructural resources
- Server to host the web application (frontend).
- Server to host the database (backend).
- Computational resources for data analysis
  - The analysis will run offline.
  - One idea is to compute a PageRank-style score for every user in our dataset, which contains at most seven million users (all GitHub accounts).
- Storage resources
  - The GitHub archive contains 100 GB of data, so ideally we would have 150 GB of total storage available.
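To give a feel for the offline "pagerank" computation mentioned above, here is a minimal sketch of PageRank by power iteration. It assumes the user graph fits in a dictionary mapping each user to the users they link to; what counts as a link (for example, a "follows" relation) is not decided on this page, so the edge semantics, the function name `pagerank`, and the parameter values below are illustrative assumptions rather than the project's actual design.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Return a PageRank score per user.

    graph: dict mapping each user to a list of users they link to.
    The meaning of a link is an assumption made for this sketch.
    """
    # Collect every user, including those that only appear as targets.
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    n = len(nodes)

    # Start from a uniform distribution.
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Rank held by users with no outgoing links is spread evenly.
        dangling = sum(rank[u] for u in nodes if not graph.get(u))
        new_rank = {
            u: (1.0 - damping) / n + damping * dangling / n for u in nodes
        }
        # Every other user splits its rank equally among its targets.
        for source, targets in graph.items():
            if targets:
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy graph: alice and carol both link to bob.
scores = pagerank({"alice": ["bob"], "carol": ["bob"], "bob": []})
print(max(scores, key=scores.get))  # prints "bob"
```

For seven million users a plain dictionary version like this would probably be too slow and memory hungry; a sparse-matrix or on-disk implementation would be the more realistic choice, but the iteration itself stays the same.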
Data resources
GitHub provides an extensive API for developers whose applications use data about the projects hosted on GitHub. This API is rate limited to 5000 requests per hour for authenticated accounts. It is hard to say at this point whether that will be enough. We expect it to be; if it is not, we will have to find a way to make the best possible use of the requests we have.
For the initial data gathering we can use each of our own GitHub accounts, which gives us seven times the number of available API requests per hour.
Access to data resources is described in more detail on the risks page.
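The rate-limit figures above can be turned into a rough time estimate. The sketch below assumes, purely for illustration, that gathering the data costs one API request per user account; the real number of requests per user is not known at this point, and the helper name `hours_to_fetch` is hypothetical.

```python
def hours_to_fetch(total_requests, accounts, limit_per_hour=5000):
    """Hours needed to issue total_requests, rounded up to a whole hour."""
    per_hour = accounts * limit_per_hour
    # Ceiling division using integers only.
    return -(-total_requests // per_hour)


# Assumption: one request per user, seven million users at most.
print(hours_to_fetch(7_000_000, accounts=7))  # 200 hours, about 8 days
print(hours_to_fetch(7_000_000, accounts=1))  # 1400 hours, about 58 days
```

Under that assumption, pooling seven accounts brings the initial crawl from roughly two months down to a little over a week.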