Neo4j

Neo4j Backend for AiiDA

Introduction

Neo4j is an open-source graph database, implemented in Java. Neo4j was first released in 2010 and its adoption has grown ever since.

What we implemented

First, we wrote a Dockerfile that anyone can use to automatically build a Neo4j instance on their own machine. This was the easiest way to make sure that our experiments and benchmarks could be reproduced. The Dockerfile does not cover the configuration or the installation of the Gremlin plugin, which still requires manual intervention because the plugin is at an early stage. The git repository contains the example configuration file that was used for the benchmarking. Note that this configuration file has most security features disabled and should not be used as is.

To produce our benchmarking results, we wrote scripts that export data from AiiDA in CSV format. We then re-imported the data into Neo4j using Gremlin.
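The export scripts themselves are not reproduced on this page. As a rough illustration only, the Python sketch below writes a small provenance graph to two CSV strings, one for vertices and one for edges, which is a layout a Gremlin or Cypher import can read. The Node and Link classes and their columns (id, uuid, type; input_id, output_id, label) are hypothetical and are not the actual AiiDA export scripts.

```python
import csv
import io
from dataclasses import dataclass


# Hypothetical in-memory records standing in for what the export reads from AiiDA.
@dataclass
class Node:
    id: int
    uuid: str
    type: str


@dataclass
class Link:
    input_id: int   # id of the source node
    output_id: int  # id of the target node
    label: str


def to_csv(header, rows):
    """Serialize a header row plus data rows to a CSV string."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


def export_graph(nodes, links):
    """Return (nodes_csv, links_csv): one CSV for vertices, one for edges."""
    nodes_csv = to_csv(["id", "uuid", "type"],
                       ([n.id, n.uuid, n.type] for n in nodes))
    links_csv = to_csv(["input_id", "output_id", "label"],
                       ([l.input_id, l.output_id, l.label] for l in links))
    return nodes_csv, links_csv


if __name__ == "__main__":
    nodes = [Node(1, "a1", "data.structure"), Node(2, "b2", "calculation.job")]
    links = [Link(1, 2, "input_structure")]
    nodes_csv, links_csv = export_graph(nodes, links)
    print(nodes_csv)
    print(links_csv)
```

Keeping vertices and edges in separate files means all vertices can be created before any edge refers to them. When writing to real files instead of strings, open them with newline="", as the csv module documentation recommends.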

For Neo4j to support Gremlin, we had to install a plugin (https://github.com/neo4j-contrib/gremlin-plugin), since Neo4j's native query language is Cypher. Bear in mind that the plugin is at a very early stage and does not work out of the box: some manual adjustments are needed to make it work.

Further work that can be done

It is clear that this backend needs a lot of optimization, and some effort should be invested in rethinking the data representation. At the moment the data schema is designed for relational database systems (such as MySQL) and is not suited at all to graph database systems.
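To make the representation issue more concrete, the sketch below assumes that links are stored as rows of a relational link table (input node id, output node id, label) and rebuilds them as per-node adjacency lists, which is closer to how a graph database stores relationships. The row layout and function names are hypothetical. The point is only that a multi-hop query such as "all descendants of a node" becomes a simple traversal rather than a chain of joins on the link table.

```python
from collections import defaultdict, deque


def build_adjacency(link_rows):
    """Turn relational link rows (input_id, output_id, label) into an adjacency map.

    In a relational schema each hop of a traversal is another join on the link
    table; in a graph-oriented layout each node directly holds its outgoing edges.
    """
    adjacency = defaultdict(list)
    for input_id, output_id, label in link_rows:
        adjacency[input_id].append((output_id, label))
    return adjacency


def descendants(adjacency, start):
    """Breadth-first search returning every node reachable from start."""
    seen = set()
    queue = deque([start])
    while queue:
        current = queue.popleft()
        # .get avoids inserting empty entries into the defaultdict
        for child, _label in adjacency.get(current, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


if __name__ == "__main__":
    rows = [(1, 2, "input"), (2, 3, "output"), (3, 4, "input"), (5, 6, "input")]
    print(sorted(descendants(build_adjacency(rows), 1)))
```

Running the example prints [2, 3, 4]: node 1 reaches 2, 3 and 4, while the separate edge from 5 to 6 is never visited.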

Another point is that we ran Gremlin queries through a plugin, whereas Neo4j's native language is Cypher. By removing the plugin and rewriting the queries in Cypher, one could therefore expect an increase in performance. We originally kept the Gremlin scripts to be certain that Titan and Neo4j were benchmarked with the same data and the same queries.

Last but not least, we did not use indexing at all, which prevents Neo4j from showing its true potential. We deliberately benchmarked the worst-case scenario, which is why we did not use it. However, much of Neo4j's performance depends on indexing, so by restructuring the data into a proper graph-oriented schema and indexing it carefully, one should be able to obtain much better results than what we achieved.
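The following toy Python model is only meant to illustrate why indexing matters for lookups by property; it is not how Neo4j implements its indexes. Without an index, finding a node by a property value means examining every node; with an index, the lookup goes straight to the matching entries. The NodeStore class and the uuid property are hypothetical.

```python
from collections import defaultdict


class NodeStore:
    """Toy node store: a list of property dicts plus optional per-property indexes."""

    def __init__(self):
        self.nodes = []
        self.indexes = {}  # property name -> {value: [node, ...]}

    def add(self, props):
        self.nodes.append(props)
        # Keep existing indexes up to date as new nodes arrive.
        for key, idx in self.indexes.items():
            if key in props:
                idx[props[key]].append(props)

    def create_index(self, key):
        """Build an index over all existing nodes that have this property."""
        idx = defaultdict(list)
        for props in self.nodes:
            if key in props:
                idx[props[key]].append(props)
        self.indexes[key] = idx

    def find(self, key, value):
        """Return (matches, number_of_nodes_examined)."""
        if key in self.indexes:
            matches = self.indexes[key].get(value, [])
            return list(matches), len(matches)
        # No index: full scan over every node.
        matches = [p for p in self.nodes if p.get(key) == value]
        return matches, len(self.nodes)


if __name__ == "__main__":
    store = NodeStore()
    for i in range(1000):
        store.add({"uuid": f"uuid-{i}"})
    print(store.find("uuid", "uuid-500")[1])  # full scan
    store.create_index("uuid")
    print(store.find("uuid", "uuid-500")[1])  # indexed lookup
```

With 1000 nodes the scan examines all 1000 of them, whereas the indexed lookup touches only the single match. In Neo4j itself the corresponding step would be a schema index declared in Cypher, for example CREATE INDEX ON :Node(uuid) in the 2.x/3.x syntax, or CREATE INDEX FOR (n:Node) ON (n.uuid) in newer releases, where the Node label and uuid property are again only assumptions.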
