Getting the Most Out of Your Graph Data With Gremlin

Get an in-depth introduction to graph analysis and modeling by looking at the data as a network of vertices and edges.

Written by Kristine Marhilevica
Updated over a week ago

One of enterprise architecture’s many responsibilities is to aid in the decision-making process. When important decisions are based on the answers provided by your architecture, it is essential that the data in your architecture is trusted to be valid and up to date.

For data to be trustworthy, it should reflect what we are modeling as closely as possible. The closer your architecture is to reality, the easier it is to maintain, the less opinionated it becomes, and the more it can be trusted.

Modeling Data as a Graph

To meet these demands, Ardoq models enterprise architecture as a directed property graph: a network of vertices and edges. Vertices represent “things”, such as departments, applications, or your business’s capabilities, while edges represent relationships between vertices. Every edge has a direction and goes from one vertex to another.

Additional data can be attached to vertices and edges in the form of properties. In Ardoq, vertices are called components, edges are called references, and properties are called fields.
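To make the terminology concrete, here is a minimal sketch of a directed property graph written in Gremlin. It assumes a local Gremlin Console with a throwaway in-memory TinkerGraph rather than an Ardoq instance, and the department, application, and cost value are made up for illustration.

```groovy
// Assumption: run in the Gremlin Console against an in-memory TinkerGraph,
// not against Ardoq. All names and values below are illustrative.
graph = TinkerGraph.open()
g = graph.traversal()

// Create two vertices (components) with properties (fields), then one
// directed edge (reference) from the department to the application.
g.addV('Department').property('name', 'Marketing').as('dept').
  addV('Application').property('name', 'CRM').property('annual_cost', 12000).as('app').
  addE('Uses').from('dept').to('app').
  iterate()

// Follow the edge in its direction: which components does Marketing use?
g.V().has('Department', 'name', 'Marketing').out('Uses').values('name')
// ==>CRM
```

The vertex labels play the role of component types, the edge label plays the role of a reference type, and `name` and `annual_cost` are fields.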

Ardoq relationships

In Ardoq, relationships between components are stored in references, and can in many cases be read as sentences, as shown here in Ardoq’s Block Diagram view.


This graph style of modeling may feel familiar, as it closely mirrors the way we think about the world. This is reflected in our data-driven visualizations, which provide an easy way to get an overview of how your data is connected.

Ardoq components and references

Components and references in Ardoq, as visualized in the Relationships view. More specifically, the visualization shows which applications are used by a specific department, and how those applications realize both business capabilities and technical capabilities.


What makes graph systems unique is that their underlying implementation is designed to process connected data efficiently. A graph system can handle a model flexible enough to conform to your business, rather than the other way around. In practice, this means we can model the architecture as closely as possible to the business, which is important for trusting the validity of the data.

The single most important step you can take to make the most of your data is to get a good understanding of what your graph looks like. You will use that understanding to formulate and ask questions of the data in your graph. Gaining that understanding can be difficult as the data grows more complex and as more people collaborate on the same documentation.

To really understand how your graph is structured, it is often useful to look at a metamodel of the graph. The metamodel is a high-level representation of how the data in the graph is structured, and it lets you see which component types and reference types are interconnected. Ardoq helps you get a fuller understanding of your graph data by creating a visual representation of its current metamodel. This is useful for aligning stakeholders and creating a shared understanding of the data and its purpose. More importantly, it is a crucial aid in formulating the right questions, so that you use your graph to its full potential and find the right answers in your data.

Ardoq metamodel

To understand how the documentation is connected, Ardoq continuously creates an up-to-date Metamodel based on a subset or a complete set of all the components and references in your Ardoq instance.
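The same idea can be approximated with a query. The sketch below lists every distinct combination of source component type, reference type, and target component type. It assumes, as the examples in this article do, that component and reference types are exposed as vertex and edge labels; the column names are arbitrary.

```groovy
// Summarize the graph as distinct (source type, reference type, target type) rows.
g.E().
  project('Source type', 'Reference type', 'Target type').
    by(outV().label()).   // type of the component the reference starts from
    by(T.label).          // type of the reference itself
    by(inV().label()).    // type of the component the reference points to
  dedup()
```

Each row is one "arrow" in the metamodel, which makes this a quick way to check what a traversal such as `out('Realizes')` can possibly reach.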

The Gremlin Query Language

The graph gives us the possibility to easily retrieve data through references. In technical terms, this means that we don’t have to store data where we think we might want to look for it. Traditionally, you would need to structure your data so that it conformed to the limitations of the given technology.

Instead, by leveraging graph technology, we can store the data where it truly belongs, knowing that any derived values can be extracted through the use of graph queries.

This is where Gremlin comes in. Gremlin is the graph traversal language that Ardoq’s TinkerPop-enabled Enterprise Intelligence graph system is built around. Gremlin is a functional, data-flow language that enables users to express complex graph traversals with relatively little code. Let’s look at a few simple examples.

Get All Applications

g.V().hasLabel('Application')

Get All Components Which Are Realized by an Application

g.V().
  hasLabel('Application').
  out('Realizes')

Get All Technical Capabilities Which Are Realized by an Application

g.V().
  hasLabel('Application').
  out('Realizes').
  hasLabel('Technical Capability')

Get All Applications That Realize a Technical Capability

g.V().
  hasLabel('Application').
  filter(
    out('Realizes').
    hasLabel('Technical Capability'))

For users who mainly deal with graphical interfaces, getting started with writing graph queries can seem a bit intimidating. The language might take some time to get used to, but it is relatively easy to learn the basics. If your data is modeled well, you may find that Gremlin lets you extract insights that your architecture was never designed to provide.

Using Gremlin in Ardoq

To use Gremlin in Ardoq, your plan needs to include the Analytics module, which gives you access to an in-app Gremlin editor.

Ardoq’s Analytics and Reporting module does, among other things, give you access to an in-app query editor where you can run Gremlin queries.


In many cases, you can get pretty far with Ardoq’s graphical advanced search builder. However, Gremlin brings a couple of powerful, additional search capabilities to the table.

First, it lets you traverse the graph to access and retrieve values from directly or indirectly connected components and references.

Second, it lets you filter not only on properties but also on the results of traversals. It is also very flexible with regard to how results are formatted in columns, making it easy to produce reports that can be exported to an Excel sheet.

Get a table with one column showing all departments, and another showing the total annual cost of all applications used by each department

g.V().
  hasLabel('Department').
  project('Department name', 'Annual cost of applications').
    by('name').
    by(
      out('Uses').
      hasLabel('Application').
      values('annual_cost').
      sum())
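Reaching indirectly connected components works the same way: keep chaining steps. As a sketch, the following variation lists, for each department, the technical capabilities realized by the applications it uses. It reuses the labels from the earlier examples and assumes every technical capability has a `name` field; the column names are arbitrary.

```groovy
g.V().
  hasLabel('Department').
  project('Department name', 'Technical capabilities').
    by('name').
    by(
      out('Uses').                       // department -> applications
      hasLabel('Application').
      out('Realizes').                   // applications -> what they realize
      hasLabel('Technical Capability').
      dedup().                           // several applications may share a capability
      values('name').
      fold())                            // collect into one list per department
```

The closing `fold()` means a department with no matching capabilities still gets a row, with an empty list in the second column.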

Ardoq’s graph servers support a Groovy implementation of Gremlin, which means that queries can include Groovy code. Groovy is a programming language that closely resembles Java, but with a more forgiving syntax. This can come in handy when a query is difficult to write in idiomatic Gremlin.

Show all components which have not been updated within the last year

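// Groovy: date string for 365 days ago ('yyyy' is the calendar year; 'YYYY' would be the week-based year)
oneYearAgo = (new Date() - 365).format('yyyy-MM-dd')
g.V().
  filter(has('last-updated', lt(oneYearAgo))).
  order().by('last-updated')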

Graph Queries Remove the Need for Redundant Data

As you become familiar with extracting data from the graph, you build a better understanding of how different ways of modeling data affect your ability to extract insights from it. In particular, you will gain a better sense of which data should be modeled as properties on a component, and which data should be retrieved through the properties of interconnected components.

If, for instance, we want to model the application landscape for a number of departments, it might be tempting to add the total annual cost of a department’s applications to each department component. However, the total cost of applications is not a property that directly describes a department. It is a value derived from the annual cost of the applications the department uses. The annual cost should therefore be stored as a property on each application component.

Get the total annual cost of all applications used by the marketing department

g.V().
  has('Department', 'name', 'Marketing').
  out('Uses').
  hasLabel('Application').
  values('annual_cost').
  sum()


Since derived values can be extracted through graph queries with relative ease, we can base our analytics on a single source of truth and avoid storing redundant values. By doing so, we maintain a clearer separation between the modeling process and the analytics process.

We acknowledge that in many cases, it can be convenient to store duplicate data on components that do not “own” the data. It can make it easier to get an overview of critical values for a specific component or make them more accessible when using a filter-based search.

To accommodate this in a way that does not degrade the quality of your architecture, Ardoq supports Gremlin-based calculated fields. This means that your components and references can have properties derived from their relationships with other components and references, while still avoiding redundant stored values, which can become inconsistent if they are not maintained properly.

How Graph Queries Can Be Used to Improve Your Enterprise Architecture Workflow

Once you are up and running with graph queries, you will find that Ardoq lets you use them for far more than extracting values.

Generating Excel Reports

In the reporting module, the results of a graph query are always presented as a table and can easily be exported to an Excel sheet. The Gremlin query language is very flexible with regard to how results are presented, making it easy to define columns and even specify an individual graph query for each column.

Ardoq generate Excel reports
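As a sketch of a report with one query per column, the following lists each application with its annual cost and the number of departments that use it. The labels and the `annual_cost` field come from the earlier examples; the column names and the `'n/a'` placeholder are arbitrary choices.

```groovy
g.V().
  hasLabel('Application').
  order().by('name').
  project('Application', 'Annual cost', 'Departments using it').
    by('name').
    // Fall back to a placeholder when an application has no annual_cost field.
    by(coalesce(values('annual_cost'), constant('n/a'))).
    // Follow 'Uses' references backwards, from application to department.
    // 'in' is a reserved word in Groovy, hence the explicit __ prefix.
    by(__.in('Uses').hasLabel('Department').count())
```

Each key passed to `project()` becomes a column header, and each `by()` supplies the value for the column in the same position.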

Building Dashboard Widgets

Ardoq dashboards enable you to create widgets based on the results from graph queries. Once we have a graph query, we can easily add it to a dashboard, directly from the Gremlin query editor. A dashboard of widgets based on graph queries can be a great way to give an overview of your important values. Graph queries used in dashboard widgets are snapshotted every day, making it possible to visualize trends in your critical values. Dashboards can be added to presentations, which can be shared externally, and remain up to date as your data evolves.

Governance

Data governance is concerned with the quality of the process of adding and maintaining data. This entails controlling that new data is added correctly, or that old data is still correct. To aid in your governance process, graph queries can be used to give an overview of all data which has been added within the last week, or which has not been updated within the last year, just to name a few examples. This can also be done with our graphical advanced search feature.

Graph queries can, however, take your governance process to the next level. When verifying whether old data is still correct, it is important to know who to ask to verify it. In the case of application components, for instance, the governance query can be written to not only return application components that have not been updated within the last year, but also the emails of all user components which are modeled with a reference to the application components.
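A governance query along these lines might look like the sketch below. The `last-updated` field is taken from the earlier example; the `User` component type and its `email` field are hypothetical names, so substitute whatever your own model uses for people and contact details.

```groovy
// Groovy: date string for 365 days ago, compared against the stored last-updated value
oneYearAgo = (new Date() - 365).format('yyyy-MM-dd')

g.V().
  hasLabel('Application').
  has('last-updated', lt(oneYearAgo)).
  order().by('last-updated').
  project('Application', 'Last updated', 'Contacts').
    by('name').
    by('last-updated').
    // Hypothetical: 'User' components that reference the application, with an 'email' field.
    // 'in' is a reserved word in Groovy, hence the explicit __ prefix.
    by(__.in().hasLabel('User').values('email').dedup().fold())
```

An application with no connected users still appears in the result, with an empty contact list, which is itself a useful governance signal.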

Monitoring Data Quality

One of Ardoq’s great advantages is flexibility. We acknowledge that requirements can change over time and that it is not always possible to know how we want to model an architecture one, or maybe five years from now. When starting with Ardoq, enterprises are not forced to decide upon a data model once and for all. Instead, Ardoq offers the ability to have flexible data models, making it easy to expand the architecture as it evolves.

When extracting value from the data in Ardoq, it is a great advantage if the data is stored in a consistent way. In other words, similar data should be modeled similarly. It can be a good idea to agree upon a data model within your enterprise. A data model simply describes how each component type should connect to other component types.

Given the set of all legal combinations of source component type, reference type, and target component type in a data model, we can write graph queries that identify and report all components and references that do not adhere to it. This in turn makes it possible to monitor data quality continuously and track how it evolves over time.
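As a sketch, suppose the agreed data model allows only two patterns: a Department uses an Application, and an Application realizes a Technical Capability or a Business Capability. These rules are illustrative, and `Business Capability` is assumed to be a component type by analogy with `Technical Capability`. The query reports every reference that matches neither pattern.

```groovy
g.E().
  not(
    __.or(
      // Legal: Department -Uses-> Application
      __.and(hasLabel('Uses'),
             outV().hasLabel('Department'),
             inV().hasLabel('Application')),
      // Legal: Application -Realizes-> Technical Capability or Business Capability
      __.and(hasLabel('Realizes'),
             outV().hasLabel('Application'),
             inV().hasLabel('Technical Capability', 'Business Capability')))).
  project('Source', 'Source type', 'Reference type', 'Target', 'Target type').
    by(outV().values('name')).
    by(outV().label()).
    by(T.label).
    by(inV().values('name')).
    by(inV().label())
```

Adding a new legal combination means adding one more `__.and(...)` clause, so the query can grow alongside the data model. The sketch assumes every component has a `name` field.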

Calculated Fields

Ardoq supports Gremlin-based calculated fields. This means that your components and references can have properties derived from their relationships with other components and references, while still avoiding redundant stored values, which can become inconsistent if they are not maintained properly.

Learn more about calculated fields and how to leverage them to visualize more complex properties of your data.

Graph Filters

Graph filters make it possible to filter away all the components and references in a visualization that are not part of a given Gremlin query path. This can come in handy when you want to visualize the results from a Gremlin query, when dealing with huge data sets, or when you have very specific filtering requirements. Your visualizations can in turn be added to presentations, which will continue to be updated along with your data.
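To illustrate what a query path is, the sketch below walks from the marketing department to the technical capabilities its applications realize and returns the full path of each walk. It uses `outE()` and `inV()` instead of `out()` so that the references appear in the path alongside the components. How Ardoq expects a graph filter query to be written is not covered here, so treat the closing `path()` step as an assumption rather than a requirement.

```groovy
g.V().
  has('Department', 'name', 'Marketing').
  outE('Uses').inV().                  // keep the 'Uses' reference in the path
  hasLabel('Application').
  outE('Realizes').inV().              // keep the 'Realizes' reference in the path
  hasLabel('Technical Capability').
  path()
```

Every component and reference that appears in at least one returned path is part of the query path; everything else is what a filter of this kind would hide.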

You can use graph filters for better analysis, visualizations, and communication.

Next Steps

Storing your data as a graph makes it possible to model your architecture in a way that closely resembles the reality it represents. We strive to maintain a single source of truth and to find derived values through the powerful Gremlin query language, which does not force you to make compromises when modeling your data. Embracing graph queries will unlock a lot of possibilities in Ardoq.

Familiarize yourself with your data and how it is modeled. Use the metamodel feature to understand how your data is interconnected and use that understanding to start experimenting with Gremlin graph queries.

Our dedicated support team is eager to help you with modeling best practices, writing queries, and troubleshooting. Reach out by starting a chat in the navigation menu on the left-hand side once you are in the app.
