Gremlin Training 17: Presenting data in tables - project()

When learning about the select()-step, we saw that the number of rows in the table did not necessarily correspond to the number of Person components: there was one row for each combination of Person, Application and Business Capability, and no row at all for Person components that had no Applications or no Business Capabilities. The project()-step works quite differently and avoids these pitfalls.
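To make the difference in row counts concrete, here is a minimal sketch in plain Python rather than Gremlin. It models the graph as a list of edges. The names (Alice, Bob, Carol, CRM, ERP and so on) are made up, the helpers out() and in_() are hypothetical stand-ins for the Gremlin steps, and the hasLabel() filters are left out because every vertex in the sample already has the right type.

```python
# Hypothetical sample data with made-up names. Edges are (source, label, target),
# so out() follows an edge forwards and in_() follows it backwards.
EDGES = [
    ("Alice", "Is Expert In", "CRM"),
    ("Alice", "Is Expert In", "ERP"),
    ("Bob", "Is Expert In", "CRM"),
    ("Sales", "Is Realized By", "CRM"),
    ("Finance", "Is Realized By", "ERP"),
    ("Procurement", "Is Realized By", "ERP"),
]
PEOPLE = ["Alice", "Bob", "Carol"]  # Carol is not an expert in any application


def out(vertex, label):
    """Targets of outgoing edges with the given label."""
    return [dst for src, lbl, dst in EDGES if src == vertex and lbl == label]


def in_(vertex, label):
    """Sources of incoming edges with the given label."""
    return [src for src, lbl, dst in EDGES if dst == vertex and lbl == label]


# select()-style: one row per Person -> Application -> Business Capability path.
select_rows = [
    (person, app, cap)
    for person in PEOPLE
    for app in out(person, "Is Expert In")
    for cap in in_(app, "Is Realized By")
]

# project()-style: exactly one row per Person; multi-valued cells are lists.
project_rows = [
    (
        person,
        out(person, "Is Expert In"),
        [cap for app in out(person, "Is Expert In") for cap in in_(app, "Is Realized By")],
    )
    for person in PEOPLE
]

print(len(select_rows), "select()-style rows")    # Alice appears 3 times, Carol never
print(len(project_rows), "project()-style rows")  # one row per person
```

With this sample, the path-based table has four rows (three of them for Alice and none for Carol), while the projected table has exactly three rows, one per person.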

Let's again consider the use case where we want to create a table that shows which applications people are experts in, and which business capabilities are realized by those applications.

The project()-step takes the elements in the current selection and uses anonymous traversals to specify how each of those elements should map to the columns. As a result, the number of rows equals the number of elements in the selection just before the project()-step. It is our responsibility to provide one traversal per column, and each of these traversals should map each element in the selection to exactly one value. It is still perfectly possible to show multiple values, or no value, in a single cell by presenting them as a single string or as a list of strings, for instance. This is why the queries below end their column traversals with the fold()-step: it always produces exactly one list, which is simply empty when the traversal finds nothing.
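The following plain-Python sketch illustrates the "exactly one value per cell" rule and why folding satisfies it. The data and the helpers fold() and exactly_one() are hypothetical: exactly_one() simply enforces the rule stated above and is not a description of what a particular Gremlin implementation does when the rule is broken.

```python
# Hypothetical sample data: applications each person is an expert in.
EXPERT_IN = {"Alice": ["CRM", "ERP"], "Bob": ["CRM"], "Carol": []}


def fold(results):
    """Model of fold(): zero or more results become exactly one result, a list."""
    return [list(results)]


def exactly_one(results):
    """Enforce the rule from the text: one value per element and column."""
    values = list(results)
    if len(values) != 1:
        raise ValueError(f"expected exactly one value, got {len(values)}")
    return values[0]


def cell_without_fold(person):
    # Like a by() traversal ending in values('name') without fold().
    return exactly_one(EXPERT_IN[person])


def cell_with_fold(person):
    # Like a by() traversal ending in values('name').fold().
    return exactly_one(fold(EXPERT_IN[person]))


for person in EXPERT_IN:
    print(person, cell_with_fold(person))
```

Without folding, only Bob (exactly one application) produces a valid cell; Alice (two) and Carol (none) break the rule. With folding, every person gets exactly one list, and Carol's is empty.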

Unlike the select()-step, where the columns of the result table took their names from the labels assigned with the as()-step, the project()-step takes the column names as arguments. With the select()-step, we used one by()-step per column to specify which property to display in that column. With the project()-step, we also use one by()-step per column, in the same order as the column names, but rather than only providing property names, we can provide traversals that compute the values we need.
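As a rough model of how column names and by()-steps are paired, the sketch below implements a hypothetical project() function in plain Python: the i-th by() fills the i-th column, a string stands for a property name as in by('name'), and a function stands for a traversal. Requiring exactly one by() per column is a simplification made by this model.

```python
# Hypothetical sample data with made-up names.
EXPERT_IN = {"Alice": ["CRM", "ERP"], "Bob": ["CRM"]}
PEOPLE = [{"name": "Alice"}, {"name": "Bob"}]  # vertices modelled as property dicts


def project(elements, keys, *bys):
    """Model of project(*keys).by(...)...: the i-th by() fills the i-th column."""
    if len(keys) != len(bys):
        raise ValueError("this model expects exactly one by() per column")
    rows = []
    for element in elements:
        row = {}
        for key, by in zip(keys, bys):
            if isinstance(by, str):
                row[key] = element[by]  # like by('name'): read a property
            else:
                row[key] = by(element)  # like by(traversal): compute a value
        rows.append(row)
    return rows


rows = project(
    PEOPLE,
    ["Person", "Applications"],
    "name",
    lambda v: list(EXPERT_IN[v["name"]]),
)
print(rows)
```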

The following query finds all components and keeps only those of type "Person". The project()-step then defines three columns: "Person", "Applications" and "Business Capabilities". The first by()-step fills the "Person" column with the "name" property of each Person component. The second by()-step fills the "Applications" column: it follows outgoing references of type "Is Expert In", keeps only components of type "Application", reads their "name" property and uses the fold()-step to collect the results into a single list. The third by()-step fills the "Business Capabilities" column: it follows the same outgoing "Is Expert In" references to Applications, then follows incoming references of type "Is Realized By", keeps only components of type "Business Capability", reads their "name" property and again uses the fold()-step to collect the results into a single list.

g.V().
  hasLabel('Person').
  project('Person', 'Applications', 'Business Capabilities').
    by('name').
    by(
      out('Is Expert In').
      hasLabel('Application').
      values('name').
      fold()).
    by(
      out('Is Expert In').
      hasLabel('Application').
      in('Is Realized By').
      hasLabel('Business Capability').
      values('name').
      fold())
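To check the reasoning behind this query without a graph database, here is a plain-Python model that mirrors it step by step on a small hypothetical graph. The vertex names, the extra "Document" component and the helper functions are assumptions made for the illustration; only the labels "Person", "Application", "Business Capability", "Is Expert In" and "Is Realized By" come from the query.

```python
# Hypothetical sample graph with made-up names: vertex id -> (label, name).
VERTICES = {
    1: ("Person", "Alice"),
    2: ("Person", "Carol"),
    3: ("Application", "CRM"),
    4: ("Application", "ERP"),
    5: ("Business Capability", "Sales"),
    6: ("Business Capability", "Finance"),
    7: ("Document", "CRM Handbook"),
}
# Directed edges: (source id, edge label, target id).
EDGES = [
    (1, "Is Expert In", 3),
    (1, "Is Expert In", 4),
    (1, "Is Expert In", 7),  # not an Application, so hasLabel('Application') drops it
    (5, "Is Realized By", 3),
    (6, "Is Realized By", 4),
]


def out(ids, label):
    return [dst for i in ids for src, lbl, dst in EDGES if src == i and lbl == label]


def in_(ids, label):
    return [src for i in ids for src, lbl, dst in EDGES if dst == i and lbl == label]


def has_label(ids, label):
    return [i for i in ids if VERTICES[i][0] == label]


def names(ids):
    return [VERTICES[i][1] for i in ids]


def person_table():
    rows = []
    for person in has_label(VERTICES, "Person"):  # g.V().hasLabel('Person')
        apps = has_label(out([person], "Is Expert In"), "Application")
        caps = has_label(in_(apps, "Is Realized By"), "Business Capability")
        rows.append({
            "Person": VERTICES[person][1],         # by('name')
            "Applications": names(apps),           # ...values('name').fold()
            "Business Capabilities": names(caps),  # ...values('name').fold()
        })
    return rows


for row in person_table():
    print(row)
```

Carol still gets a row, with two empty lists, and the Document component never reaches the Applications column.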

In the resulting table, there is exactly one row per Person component. Some of the Application and Business Capability cells are empty, and some contain multiple values. You may notice that in the cells that contain multiple values, the values are displayed very closely together. This is how list values are presented. To present multiple values as a string with each value nicely spaced apart, we can replace

.fold()

with

.fold().map{ it.get().join(', ') }

The resulting query then looks like this:

g.V().
  hasLabel('Person').
  project('Person', 'Applications', 'Business Capabilities').
    by('name').
    by(
      out('Is Expert In').
      hasLabel('Application').
      values('name').
      fold().map{ it.get().join(', ') }).
    by(
      out('Is Expert In').
      hasLabel('Application').
      in('Is Realized By').
      hasLabel('Business Capability').
      values('name').
      fold().map{ it.get().join(', ') })
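In the query, it.get() returns the list produced by fold(), and join(', ') turns that list into a single string. The same formatting can be sketched in plain Python as below; the rows are hypothetical, and as_cell() is an illustrative helper, not part of Gremlin.

```python
def as_cell(values):
    """Model of fold().map{ it.get().join(', ') }: one comma-separated string."""
    return ", ".join(values)


# Hypothetical rows as they would look after a plain fold(): list-valued cells.
rows = [
    {"Person": "Alice", "Applications": ["CRM", "ERP"]},
    {"Person": "Carol", "Applications": []},
]

display = [
    {key: as_cell(value) if isinstance(value, list) else value for key, value in row.items()}
    for row in rows
]
print(display)
```

An empty list becomes an empty string, so people without applications still get a row with an empty cell.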

Judging from the screenshot above, this query might seem to return different results, but that is only because the results are sorted by the values in the Applications column, which are now text strings instead of lists of text strings. The first row is not the same as in the previous example; it just happens to have a similar name.

Note that when we learnt about the select()-step, we performed all traversal steps before the select()-step, whereas with the project()-step we performed them inside the by()-steps. Nevertheless, it is perfectly possible to provide anonymous traversals in the by()-steps of a select()-step as well. In fact, the query we just looked at can also be written with the select()-step, like this:

g.V().
  hasLabel('Person').
    as('Person').
    as('Applications').
    as('Business Capabilities').
  select('Person', 'Applications', 'Business Capabilities').
    by('name').
    by(
      out('Is Expert In').
      hasLabel('Application').
      values('name').fold().map{ it.get().join(', ') }).
    by(
      out('Is Expert In').
      hasLabel('Application').
      in('Is Realized By').
      hasLabel('Business Capability').
      values('name').
      fold().map{ it.get().join(', ') })
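This works because all three labels are attached to the same Person vertex, so select() hands each by()-step the same element that project() would. The plain-Python sketch below illustrates that argument with hypothetical data and only two columns for brevity; project_rows() and select_rows() are illustrative models, not Gremlin APIs.

```python
# Hypothetical sample data with made-up names.
EXPERT_IN = {"Alice": ["CRM", "ERP"], "Carol": []}
PEOPLE = [{"name": "Alice"}, {"name": "Carol"}]

KEYS = ["Person", "Applications"]
BYS = [
    lambda v: v["name"],                   # by('name')
    lambda v: list(EXPERT_IN[v["name"]]),  # by(out('Is Expert In')...fold())
]


def project_rows(elements):
    # project(*KEYS): each by() is applied directly to the current element.
    return [{key: by(e) for key, by in zip(KEYS, BYS)} for e in elements]


def select_rows(elements):
    rows = []
    for e in elements:
        # as('Person').as('Applications'): every label points at the same vertex.
        labelled = {label: e for label in KEYS}
        # select(*KEYS): each by() is applied to the object stored under its label.
        rows.append({key: by(labelled[key]) for key, by in zip(KEYS, BYS)})
    return rows


print(select_rows(PEOPLE) == project_rows(PEOPLE))  # True
```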
