06.2016 - Switch to PostgreSQL

June 2016 update: Optimizing the lookup phase

Switching from MongoDB to PostgreSQL

After a complete performance review of the lookup phase, we noted several major drawbacks of using MongoDB for Devsearch:

Hence we have now switched over to PostgreSQL, which gives finer control over database tuning, makes query plans easier to understand, lets us normalize the data in the DB, and yields better results without resorting to approximations.

For a complete description of the new system, please refer to the paper that explains the migration process. The first part of that document also explains in depth the inner workings of Devsearch's online part.

Experimentation with a custom-made C++ lookup DB

Along with the migration to PostgreSQL, we implemented a first draft of a custom-made C++ lookup database. It lets us manually manage and tune memory allocation and take advantage of memory locality during queries.

The implementation uses string dictionaries to reduce the amount of data read during lookup. Data is then stored and read sequentially, using a streamlined scoring aggregation phase that minimizes memory allocation. The draft is available on GitHub and is at the proof-of-concept stage.
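To make the idea concrete, here is a minimal sketch of how a string dictionary and a sequential scoring pass can fit together. It is not the code of the actual draft: every name in it (StringDictionary, Posting, LookupIndex, the feature and file strings, the per-posting weight) is a hypothetical choice made for illustration, and the scoring rule (a plain sum of weights per file) is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical sketch, not the Devsearch implementation.

// Maps each distinct string to a small integer id, so the lookup data
// stores 4-byte ids instead of repeating the strings themselves.
class StringDictionary {
public:
    // Returns the id of `s`, assigning the next free id on first sight.
    std::uint32_t intern(const std::string& s) {
        auto it = ids_.find(s);
        if (it != ids_.end()) return it->second;
        auto id = static_cast<std::uint32_t>(ids_.size());
        ids_.emplace(s, id);
        return id;
    }

    // Sets `id` and returns true if `s` is known; never inserts.
    bool find(const std::string& s, std::uint32_t& id) const {
        auto it = ids_.find(s);
        if (it == ids_.end()) return false;
        id = it->second;
        return true;
    }

    std::size_t size() const { return ids_.size(); }

private:
    std::unordered_map<std::string, std::uint32_t> ids_;
};

// One fixed-size record; records sit contiguously in a vector so a
// query scans memory front to back.
struct Posting {
    std::uint32_t feature_id;
    std::uint32_t file_id;
    float weight;  // assumed per-occurrence score contribution
};

class LookupIndex {
public:
    void add(const std::string& feature, const std::string& file, float weight) {
        postings_.push_back({features_.intern(feature), files_.intern(file), weight});
    }

    // Sums the weights of all postings whose feature appears in `query`,
    // one total per file id. `scores` is owned by the caller and can be
    // reused across queries, so its buffer is not reallocated each time.
    void score(const std::vector<std::string>& query, std::vector<float>& scores) const {
        // One small per-query allocation; it could be reused as well.
        std::vector<char> wanted(features_.size(), 0);
        for (const auto& f : query) {
            std::uint32_t id;
            if (features_.find(f, id)) wanted[id] = 1;  // unknown features are ignored
        }
        scores.assign(files_.size(), 0.0f);
        for (const Posting& p : postings_) {  // single sequential pass
            assert(p.file_id < scores.size());
            if (wanted[p.feature_id]) scores[p.file_id] += p.weight;
        }
    }

    // Translates a file name back to its dictionary id.
    bool file_id(const std::string& file, std::uint32_t& id) const {
        return files_.find(file, id);
    }

private:
    StringDictionary features_, files_;
    std::vector<Posting> postings_;
};
```

A caller would fill the index with add(), keep one scores vector alive, and pass it to score() for each query. The sketch scans every posting on each query, which keeps the access pattern sequential. A real implementation would likely group postings by feature so that only the relevant ranges are read.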

Advantages:

Disadvantages:

TODO list: