Recommender System

We want to build a content-based recommender system for TV Shows, using their subtitles as the source of content.

1. Content-based recommendation

Content-based recommendation works by extracting features from documents (here, TV Shows) and building a user profile from those same features. Once we have both, we can match a user to new TV Shows using a similarity measure such as cosine similarity.

2. Feature vectors for TV Shows

In our case, the features are the topics, and the feature vector of each TV Show is exactly the output of our processing step, so no additional work is needed here.

3. Feature vector for users

Here we need input from the user: a list of TV Shows they like. From this list and the feature vectors of those TV Shows, we can create a composite user profile.
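The text does not say how the composite profile is built. One simple option, shown here only as a sketch, is to average the topic vectors of the liked shows. The function name `build_user_profile`, the dictionary `show_topics`, and the show titles and numbers are all hypothetical and chosen for illustration.

```python
from typing import Dict, List


def build_user_profile(
    liked_shows: List[str], show_topics: Dict[str, List[float]]
) -> List[float]:
    """Average the topic vectors of the shows the user likes (an assumed choice)."""
    # Ignore titles for which we have no feature vector.
    vectors = [show_topics[title] for title in liked_shows if title in show_topics]
    if not vectors:
        raise ValueError("none of the liked shows has a feature vector")
    k = len(vectors[0])  # K, the number of topics
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(k)]


# Hypothetical topic weights over K = 3 topics.
show_topics = {
    "Show A": [0.8, 0.1, 0.1],
    "Show B": [0.6, 0.3, 0.1],
    "Show C": [0.1, 0.1, 0.8],
}
profile = build_user_profile(["Show A", "Show B"], show_topics)
print(profile)
```

Averaging keeps the profile in the same K-dimensional topic space as the shows, so it can be compared to any show vector directly. Other weightings (for example, giving more weight to favourite shows) would work the same way.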

4. Recommendation

Once we have this information, we have to find shows that are close to the user's profile in terms of topics.

If U is the user profile vector (of dimension K, the number of topics) and Q is the feature vector of some TV Show, we can compute the similarity of these vectors using the cosine similarity:

cos(U,Q) = <U,Q> / (||U|| ||Q||)

We can then search our dataset of TV Shows for the best matches for the user U based on this simple metric.
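As a minimal sketch of this search, the code below computes the cosine similarity between a profile and every show vector and returns the highest-scoring shows. The names `cosine_similarity` and `recommend`, the `exclude` and `top_n` parameters, and the sample data are assumptions made for illustration. Returning 0 for a zero vector is also an assumed convention, since the cosine is undefined in that case.

```python
import math
from typing import Dict, Iterable, List, Tuple


def cosine_similarity(u: List[float], q: List[float]) -> float:
    """cos(U, Q) = <U, Q> / (||U|| ||Q||)."""
    dot = sum(a * b for a, b in zip(u, q))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_q = math.sqrt(sum(b * b for b in q))
    if norm_u == 0 or norm_q == 0:
        return 0.0  # assumed convention for zero vectors
    return dot / (norm_u * norm_q)


def recommend(
    user_profile: List[float],
    show_topics: Dict[str, List[float]],
    exclude: Iterable[str] = (),
    top_n: int = 5,
) -> List[Tuple[str, float]]:
    """Rank shows by similarity to the profile, skipping titles in `exclude`."""
    skipped = set(exclude)
    scored = [
        (title, cosine_similarity(user_profile, vector))
        for title, vector in show_topics.items()
        if title not in skipped
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Hypothetical data: a user profile and candidate shows over K = 3 topics.
user_profile = [0.7, 0.2, 0.1]
candidates = {
    "Show A": [0.8, 0.1, 0.1],
    "Show C": [0.1, 0.1, 0.8],
    "Show D": [0.6, 0.3, 0.1],
}
ranking = recommend(user_profile, candidates, exclude=["Show A"])
print(ranking)
```

Passing the shows the user already listed through `exclude` avoids recommending them back. With a large dataset this exhaustive scan could be replaced by an approximate nearest-neighbour search, but the scoring would stay the same.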

5. Collaborative recommendations [Optional]

If we can, we would like to improve our recommender system by making collaborative recommendations. The goal here would be to make those recommendations more accurate. 

To achieve that, we could ask the user to rate the relevance of the results they get from the system and take these ratings into account in later recommendations.
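How the ratings would be taken into account is left open. One possible approach, sketched here purely as an assumption, is to nudge the user's profile toward shows rated above a neutral score and away from shows rated below it. The function `update_profile`, the 1 to 5 rating scale, the `neutral` value, and the `learning_rate` are all hypothetical.

```python
from typing import List


def update_profile(
    profile: List[float],
    show_vector: List[float],
    rating: float,
    neutral: float = 3.0,
    learning_rate: float = 0.1,
) -> List[float]:
    """Shift the profile toward well-rated shows and away from poorly rated ones."""
    # Positive for ratings above neutral, negative for ratings below it.
    weight = learning_rate * (rating - neutral)
    # Clamp at zero, assuming topic weights are non-negative.
    return [max(0.0, p + weight * q) for p, q in zip(profile, show_vector)]


# Hypothetical profile and recommended show over K = 3 topics.
profile = [0.7, 0.2, 0.1]
recommended_show = [0.1, 0.1, 0.8]

liked = update_profile(profile, recommended_show, rating=5)
disliked = update_profile(profile, recommended_show, rating=1)
print(liked, disliked)
```

Because cosine similarity ignores vector length, the updated profile does not need to be renormalised before it is used for the next search.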