Andreas Kollegger is a leading speaker and writer on graph databases and Neo4j and the bridge between community and developer efforts. He works actively in the community, speaking around the world and promoting the larger Neo4j ecosystem of projects. Author of Fair Trade Software, and the lead for Neo4j in the cloud, Andreas plays a valuable role for progressive happenings within Neo4j. Andreas is a DZone MVB and is not an employee of DZone and has posted 64 posts at DZone.

Graph Databases and Software Metrics & Analysis

05.27.2013

This article is by Michael Hunger.

This is the first in a series of blog posts that discuss the use of a graph database like Neo4j to store, compute and visualize a variety of software metrics and other types of software analytics (method call hierarchies, transitive closure, critical path analysis, volatility & code quality). Follow-up posts by different contributors will be linked from this one.

Everyone who works in software development comes across software metrics at some point. 
The reasons vary, from simple curiosity about the quality or complexity of the code we've written to a real interest in improving quality and reducing technical debt. 
There are many ways of approaching this topic, from just gathering statistics and rendering them in diagrams to visualizing the structure of programs and systems.

There are a number of commercial and free tools available that compute software metrics and help expose the current trend in your project's development. 
Software metrics can cover different areas. Computing cyclomatic complexity or analysing dependencies and call traces is probably easy; using static analysis to find smaller or larger issues is more involved; and detecting code smells can be an interesting challenge in AST parsing.

Interestingly, many visualizations in and around software development are graph visualizations, from class and other (UML) diagrams via dependency tracing between and within projects to architectural analysis. One reason for this might be that source code in general can easily be represented as graphs. On the one hand we have trees, especially (abstract) syntax or parse trees (per file, class or structural element); on the other hand, the actual dependencies from the project and package level down to the class and method level form a huge directed (cyclic) graph. Related topics like application wiring (DI), system orchestration, or hardware and software networks are effectively graph structures as well.

So, with a graph database like Neo4j at hand, what would be more obvious than parsing software systems and projects at a certain level and importing the information into the graph database? The graph structure that accommodates this information well is a direct representation of the concepts in the software projects: nodes for projects, packages, classes, interfaces, types, methods and fields, connected by relationships like dependencies, usage, creation, containment, calls, coverage, etc.
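
As a rough illustration of such a model, the following Python sketch keeps nodes and typed relationships in plain in-memory collections. The `CodeGraph` class, the node kinds, the relationship types (`CONTAINS`, `EXTENDS`) and the `shop` sample project are all assumptions made up for this example; they are not a schema prescribed by Neo4j or by any of the tools mentioned in this article.

```python
from collections import defaultdict


class CodeGraph:
    """Tiny in-memory stand-in for a graph database (illustration only)."""

    def __init__(self):
        self.nodes = {}                      # node name -> kind ("project", "class", ...)
        self.out_edges = defaultdict(list)   # (source, rel_type) -> list of target names

    def add_node(self, name, kind):
        self.nodes[name] = kind

    def relate(self, source, rel_type, target):
        # Both ends must exist, as they would in a real graph store.
        for name in (source, target):
            if name not in self.nodes:
                raise KeyError(f"unknown node: {name}")
        self.out_edges[(source, rel_type)].append(target)

    def targets(self, source, rel_type):
        """Names of all nodes reachable from source via one rel_type edge."""
        return list(self.out_edges.get((source, rel_type), []))


# Hypothetical sample data: one project, one package, one class, one method.
graph = CodeGraph()
graph.add_node("shop", "project")
graph.add_node("shop.billing", "package")
graph.add_node("shop.billing.Invoice", "class")
graph.add_node("shop.billing.Invoice.total", "method")
graph.add_node("java.lang.Object", "class")

graph.relate("shop", "CONTAINS", "shop.billing")
graph.relate("shop.billing", "CONTAINS", "shop.billing.Invoice")
graph.relate("shop.billing.Invoice", "CONTAINS", "shop.billing.Invoice.total")
graph.relate("shop.billing.Invoice", "EXTENDS", "java.lang.Object")
```

In a real setup the nodes and relationships would be written to the database by an importer instead of being held in dictionaries, but the shape of the data stays the same.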

Simple Graph Model for Dependency Analysis

Once the data is in the graph, what you do with it is entirely up to your interests and needs. There are many possibilities: computing metrics, visualizing and tracing dependencies, finding violations of architectural rules, finding co-usage of classes, or detecting interesting patterns and code smells.
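
To make one of these uses concrete: an architectural rule such as "the UI layer must not depend directly on the persistence layer" comes down to looking for forbidden dependency edges. Below is a minimal Python sketch of that idea. The package names, the sample dependencies and the helper `find_violations` are hypothetical and only serve as illustration.

```python
def package_of(class_name):
    """Return the package part of a fully qualified class name."""
    return class_name.rsplit(".", 1)[0]


def find_violations(dependencies, forbidden):
    """Return all (from_class, to_class) edges that cross a forbidden package pair.

    dependencies: iterable of (from_class, to_class) tuples
    forbidden:    set of (from_package, to_package) tuples
    """
    return sorted(
        (src, dst)
        for src, dst in dependencies
        if (package_of(src), package_of(dst)) in forbidden
    )


# Hypothetical sample data.
DEPENDENCIES = [
    ("shop.ui.CartView", "shop.service.CartService"),
    ("shop.service.CartService", "shop.persistence.CartDao"),
    ("shop.ui.AdminView", "shop.persistence.UserDao"),
]
FORBIDDEN = {("shop.ui", "shop.persistence")}

violations = find_violations(DEPENDENCIES, FORBIDDEN)
for src, dst in violations:
    print(f"{src} must not depend on {dst}")
```

With the data in a graph database, the same check becomes a query that matches dependency relationships between the two packages.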

Just to give one example, here is a Cypher query that finds the 10 longest inheritance chains below java.lang.Object and returns the class names along each chain:


START root=node:types(class="java.lang.Object")
MATCH chain = (root)<-[:EXTENDS*]-(leaf)
RETURN extract(class IN nodes(chain) : class.name) AS classes,
       length(chain) AS depth
ORDER BY depth DESC
LIMIT 10
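
For readers who are not familiar with Cypher, roughly the same computation can be sketched in plain Python over a child-to-parent mapping. This is only an approximation of what the query does, under the assumption of single inheritance. The `EXTENDS` dictionary is a small hand-written sample, and `chain_to_root` and `longest_inheritance_chains` are hypothetical helpers, not part of any Neo4j API.

```python
def chain_to_root(cls, extends, root):
    """Return the path [root, ..., cls], or None if cls does not descend from root."""
    path = [cls]
    seen = {cls}
    while path[-1] != root:
        parent = extends.get(path[-1])
        if parent is None or parent in seen:   # dead end or cycle in the data
            return None
        path.append(parent)
        seen.add(parent)
    path.reverse()
    return path


def longest_inheritance_chains(extends, root="java.lang.Object", limit=10):
    """Return up to `limit` (class_names, depth) pairs, deepest chains first."""
    chains = []
    for cls in extends:
        path = chain_to_root(cls, extends, root)
        # Like [:EXTENDS*], require at least one hop away from the root.
        if path is not None and len(path) > 1:
            chains.append((path, len(path) - 1))
    chains.sort(key=lambda item: item[1], reverse=True)
    return chains[:limit]


# Sample data: child class -> superclass.
EXTENDS = {
    "java.util.AbstractCollection": "java.lang.Object",
    "java.util.AbstractList": "java.util.AbstractCollection",
    "java.util.ArrayList": "java.util.AbstractList",
    "java.lang.Number": "java.lang.Object",
    "java.lang.Integer": "java.lang.Number",
}

top = longest_inheritance_chains(EXTENDS, limit=2)
for classes, depth in top:
    print(depth, " <- ".join(classes))
```

The graph database does the path expansion for you; the loop in `chain_to_root` is what the variable-length `[:EXTENDS*]` pattern replaces.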


Other tools besides Cypher to help you with this endeavour are:

  • ASM, ANTLR or similar parsers for parsing bytecode or source code
  • Neo4j-Shell for exploration
  • Visualisation with GraphViz, D3, VivaGraphJS, Linkurious or others

Another option is to take the time dimension into account to see how structure, elements and relationships change over time.

So it is not surprising that quite a number of people found this topic interesting enough to invest time and energy in creating intriguing and insightful examples of using graph databases in this field. We asked all the participants listed below to write a blog post detailing their idea and to make their code and approach accessible. For now we link to existing resources, and we will update the links as soon as the blog posts are online.


  • Raoul-Gabriel Urma: Expressive and Scalable Source Code Queries with Graph Databases (Paper)
  • Rickard Öberg: NeoMVN traces Maven dependencies (GitHub)
  • Pavlo Baron: Graphlr, an ANTLR storage in Neo4j (GitHub)
  • Dirk Mahler: Enforcing architectural constraints as part of your build process
  • Michael Hunger: Class-Graph leverages Cypher to collect structural insights about your Java projects (GitHub), (Slideshare), (class-graph.zip)

Published at DZone with permission of Andreas Kollegger, author and DZone MVB. (source)