Time-aware Foursquare Venues Recommender

PROJECT: Time-aware Foursquare Venues Recommendation System

 

Motivation

Predict and suggest the trends of places in New York City according to the time and user’s availability. Filtering, aggregating and deeper analysis of data are activities that are planned to be conducted on the retrieved dataset from Foursquare API. This analysis is suitable to be done for a larger city like New York. Of course, the same process could be repeated for any other city in which the number of venues generated by the users is large enough.

 

Goals

1. Analyse the obtained data for the 4SQ venues over time.

2. Determine key available properties that can be used to provide both time and location-based recommendations.

3. Implement recommender that utilizes the aforementioned properties, and provides users with suggestions. This aspect, together with the goal from previous bullet, is the central point of the system.

 

Methods

Data would be obtained in the following way:

We would use Spark as a data processing tool.

 

There are a couple of aspects of recommendation system:

Important part of the project is the data analysis, and finding the right parameters that will deliver the best results. Aforementioned aspects of the recommendation system need to be thoroughly thought. Also, exploring the different types of the recommendation algorithms, and finding the one that is most suitable (and feasible) is crucial.

 

Finally, we would conduct the testing of the results provided by the recommendation system. This would be done by cross-matching the prediction with the already available data.

 

Milestone and organization

First, we will do the task distribution and the team setup. We will decide on which tools to use for every of these tasks.

Then, each team will stick to its task. However, we will also do some general meetings in order to see how the work is doing, if some teams need help, if everybody is aiming to the same direction. As soon as a team has some good results, they can start helping the others.

 

The tasks will have different schedules:

- One team will focus on acquiring of the data, so build a crawler for online streaming and setup the infrastructure. This has to be done quite early as it will be required by the other teams to test their algorithms (and their efficiency) on real datasets.

- One other team will focus on the recommender: they will analyse what already exists, improve it, and develop an efficient tool to recommend venues to the users’ criteria.

- If time allows, one or two persons will focus on implementing the visualization tools for the statistics of the venues and analysis of their trends.

 

                     

 

By dividing the workload in this manner, everyone will be familiar with all parts of the system, and if needed, they can provide their expertise. Also, we will save some time at the end for merging and testing purpose.

 

More detailed timeline is (more detailed task specification is available at the github repo):

 

Needed resources

- GitHub.

- Spark cluster.

- For storage, around ~20GB of space for New York City (that counts about half a million of venues), however we can reduce that number.

- Crawled data specifically for the city of New York., which is big enough to provides meaningful results.

- For visualization, we can use the tool mentioned by Mohammed El Seidy. Bar charts like this or that are interesting for trends.

 

Risk to the success

- Being able to get enough data about the relationship between users and venues is crucial, in order to make insightful statistics and predictions.

- We do not have experience in making efficient and meaningful recommendation system.

 

Team strength

- Experience in web development

- Scala, Java, Python

- Infrastructure experience

- Ready to learn Spark

- For the prediction: three students took a machine learning course.

- Large team comprehend different technologies and frameworks from a technical point of view, and different mindset can be helpful when trying to achieve some goal (the team includes different backgrounds/styles/nationalities and thus makes it easier to develop an app that suits many types of users).

 

Team members