06.2016 - Switch to PostgreSQL

June 2016 update: Optimizing the lookup phase

Switching from MongoDB to PostgreSQL

After a complete performance review of the lookup phase, we noted several major drawbacks of using MongoDB for Devsearch:

Slow grouping of large numbers of rows.
Grouping does not leverage available indexes.
No way to implement a custom scoring function inside the DB (except through map-reduce, which currently performs poorly).
100 MB limit for grouping operations.
Lack of thorough profiling tools for queries.
Queries remain slow even though they already rely on rarity-based approximations to speed up processing.

We have therefore switched to PostgreSQL, which gives finer control over database tuning and better insight into query plans, lets us normalize the data inside the database, and yields better results without resorting to approximations.

For a complete description of the new system, please refer to the paper describing this migration. Its first part also explains in detail the inner workings of the online part of Devsearch.

Experimentation with a custom-made C++ lookup DB

Alongside the migration to PostgreSQL, we implemented a first draft of a custom-made C++ lookup database. It lets us manage and tune memory allocation manually and take advantage of memory locality during queries.
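To make the point about manual memory management concrete, here is a minimal sketch of one common way to keep allocation out of the query path. It assumes documents are identified by dense integer IDs and uses a score buffer that is allocated once and reused across queries. The class and method names (`ScoreAccumulator`, `add`, `top_k`, `reset`) are hypothetical and are not taken from the Devsearch draft.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical sketch: a per-query score buffer that is allocated once and
// reused. Scores sit in one contiguous array indexed by a dense document ID,
// so adding a hit is a single array update with no allocation.
class ScoreAccumulator {
public:
    using Hit = std::pair<std::uint32_t, std::uint32_t>;  // (document ID, score)

    explicit ScoreAccumulator(std::size_t num_docs) : scores_(num_docs, 0) {
        touched_.reserve(num_docs);  // worst case: every document matches
    }

    // Add a positive weight to `doc`, recording the document on its first hit.
    void add(std::uint32_t doc, std::uint32_t weight) {
        assert(doc < scores_.size() && weight > 0);
        if (scores_[doc] == 0) touched_.push_back(doc);
        scores_[doc] += weight;
    }

    // Return up to k hits, highest score first; ties go to the lower doc ID.
    std::vector<Hit> top_k(std::size_t k) const {
        std::vector<Hit> hits;
        hits.reserve(touched_.size());
        for (std::uint32_t doc : touched_) hits.emplace_back(doc, scores_[doc]);
        k = std::min(k, hits.size());
        auto better = [](const Hit& a, const Hit& b) {
            return a.second != b.second ? a.second > b.second : a.first < b.first;
        };
        std::partial_sort(hits.begin(), hits.begin() + static_cast<std::ptrdiff_t>(k),
                          hits.end(), better);
        hits.resize(k);
        return hits;
    }

    // Zero only the entries this query touched; both vectors keep their
    // capacity, so the next query starts without reallocating.
    void reset() {
        for (std::uint32_t doc : touched_) scores_[doc] = 0;
        touched_.clear();
    }

private:
    std::vector<std::uint32_t> scores_;   // one slot per document
    std::vector<std::uint32_t> touched_;  // documents with a nonzero score
};
```

Because `reset` only clears the entries that were actually hit, the cost of preparing for the next query is proportional to the number of matches rather than to the size of the collection.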

The implementation uses string dictionaries to reduce the amount of data read during lookup. Data is stored and read sequentially, and scores are aggregated in a streamlined phase that minimizes memory allocation. The draft is available on GitHub and is at the proof-of-concept stage.
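The dictionary and sequential-storage ideas can be sketched as follows. This is an illustration under stated assumptions, not the draft's actual code: each string (for instance a code feature) is interned once into a dense 32-bit ID, and all postings are packed into one array grouped by feature ID (a CSR-style layout), so reading the postings of one feature is a single sequential scan. `StringDictionary` and `PostingStore` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical sketch: strings are interned once into dense 32-bit IDs, so the
// lookup phase reads and compares integers instead of strings.
class StringDictionary {
public:
    // Return the ID of `s`, assigning the next free ID if it is new.
    std::uint32_t intern(const std::string& s) {
        auto it = ids_.find(s);
        if (it != ids_.end()) return it->second;
        auto id = static_cast<std::uint32_t>(names_.size());
        ids_.emplace(s, id);
        names_.push_back(s);
        return id;
    }

    // Set `id` and return true if `s` is known; return false otherwise.
    bool find(const std::string& s, std::uint32_t& id) const {
        auto it = ids_.find(s);
        if (it == ids_.end()) return false;
        id = it->second;
        return true;
    }

    std::size_t size() const { return names_.size(); }

private:
    std::unordered_map<std::string, std::uint32_t> ids_;
    std::vector<std::string> names_;  // ID -> original string
};

// Hypothetical sketch: all postings stored back to back, grouped by feature.
// The documents of feature f are docs_[offsets_[f] .. offsets_[f + 1]).
class PostingStore {
public:
    // `pairs` holds (feature ID, document ID); IDs must be < num_features.
    PostingStore(const std::vector<std::pair<std::uint32_t, std::uint32_t>>& pairs,
                 std::size_t num_features)
        : offsets_(num_features + 1, 0) {
        for (const auto& p : pairs) {  // count postings per feature
            assert(p.first < num_features);
            ++offsets_[p.first + 1];
        }
        for (std::size_t f = 0; f < num_features; ++f)  // counts -> start offsets
            offsets_[f + 1] += offsets_[f];
        docs_.resize(pairs.size());
        std::vector<std::size_t> cursor(offsets_.begin(), offsets_.end() - 1);
        for (const auto& p : pairs) docs_[cursor[p.first]++] = p.second;  // scatter
    }

    // Contiguous, read-only range [first, second) of one feature's documents.
    std::pair<const std::uint32_t*, const std::uint32_t*> postings(std::uint32_t feature) const {
        assert(feature + 1 < offsets_.size());
        return {docs_.data() + offsets_[feature], docs_.data() + offsets_[feature + 1]};
    }

private:
    std::vector<std::size_t> offsets_;
    std::vector<std::uint32_t> docs_;
};
```

In this sketch, a query would translate its strings to IDs with `find`, skip unknown ones, and feed each contiguous posting range into the score aggregation step.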

Advantages:

Faster
Easier and more flexible implementation of scoring

Disadvantages:

Prone to bugs
No support for SQL and no runtime optimization
Requires implementing a server to accept queries

TODO list:

Implement a server to accept queries
Further optimize memory allocation on aggregation
Implement some form of persistence (on disk)
Implement more scoring features (most notably: clustering)
