Follow our tips and tricks for how to group results in Gremlin, for both basic and advanced grouping.

Written by Kristine Marhilevica

# "Grouping" With the Project-step

The most common way of "grouping" results is the project-step, which is often used in calculated fields. For example, to list each application together with all the business capabilities it realizes, we can use `project` like this:

`g.V().hasLabel('Application').project('Application', 'Business Capabilities').by('name').by(both('Is Realized By').values('name').fold())`

To see the difference, try the same query without `project`: an application now shows up on multiple rows instead of just once, because each row represents one `application->business capability` relationship:

`g.V().hasLabel('Application').as('Application').both('Is Realized By').as('Business Capability').select('Application', 'Business Capability').by('name')`
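The difference between the two result shapes can be modeled in plain Python. This is only a sketch of what the queries return, not Gremlin code; the application and capability names are made up for illustration.

```python
# Hypothetical sample data: applications and their business capability relationships.
applications = ["CRM", "Billing", "Intranet"]
relationships = [
    ("CRM", "Customer Management"),
    ("CRM", "Sales"),
    ("Billing", "Invoicing"),
]

# Without project: one row per relationship, so "CRM" appears twice and
# "Intranet" (no relationships) does not appear at all.
flat_rows = [
    {"Application": app, "Business Capability": bc} for app, bc in relationships
]

# With project(...).by('name').by(... .fold()): one row per application,
# with its business capabilities folded into a single (possibly empty) list.
projected_rows = [
    {
        "Application": app,
        "Business Capabilities": [bc for a, bc in relationships if a == app],
    }
    for app in applications
]

for row in projected_rows:
    print(row)
```

Note that the folded version keeps applications without any business capability, showing them with an empty list.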

# Simple Grouping Using the Group-Step

There are more advanced ways of grouping that can be realized using the group-step! Let's start with a simple example, where we group applications by their criticality:

`g.V().hasLabel('Application').has('criticality').group().by('criticality').by('name').unfold().project('Field value', 'Applications').by(select(keys)).by(select(values))`

The pattern seen in the example above will repeat itself: grouping creates an "object" consisting of keys and values, which we usually want to "unfold" into separate rows, showing the key and the value in separate columns:

`...unfold().project('Grouping', 'Values').by(select(keys)).by(select(values))`
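As a rough mental model, the group-then-unfold pattern behaves like building a dictionary and then turning each entry into a row. The plain-Python sketch below assumes a few made-up applications; it mirrors the shape of the result, not the Gremlin API.

```python
# Hypothetical sample data; "Wiki" has no criticality set.
applications = [
    {"name": "CRM", "criticality": "High"},
    {"name": "Billing", "criticality": "High"},
    {"name": "Intranet", "criticality": "Low"},
    {"name": "Wiki"},
]

# group().by('criticality').by('name'): a single map of key -> list of names.
# The membership check plays the role of has('criticality').
grouped = {}
for app in applications:
    if "criticality" in app:
        grouped.setdefault(app["criticality"], []).append(app["name"])

# unfold() yields one entry per key; project() puts key and value in named columns.
rows = [
    {"Field value": key, "Applications": values} for key, values in grouped.items()
]

for row in rows:
    print(row)
```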

For example, what if we wanted to see an application count per level of criticality?

`g.V().hasLabel('Application').has('criticality').groupCount().by('criticality').unfold().project('Criticality', 'Number of applications').by(select(keys)).by(select(values))`
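Counting per key is the same idea with a number instead of a list as the value. A minimal plain-Python sketch, assuming a made-up list of criticality values (one per application):

```python
from collections import Counter

# Hypothetical criticality values, one per application.
criticalities = ["High", "High", "Low", "Medium", "High"]

# groupCount().by('criticality'): a map of key -> number of occurrences.
counts = Counter(criticalities)

# unfold() + project(): one row per key, with the count in its own column.
rows = [
    {"Criticality": key, "Number of applications": count}
    for key, count in counts.items()
]

for row in rows:
    print(row)
```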

# Grouping Using Combined Keys

## Grouping transactions between bank accounts

More advanced grouping can be done by creating "combined keys". In this example, we've modeled user accounts and transactions between them. Notice that Odd has two transactions to Ada.

If we write a simple query to list the transactions, we will see both transactions listed as expected:

`g.V().hasLabel('Account').as('From').outE().as('Amount').otherV().as('To').select('From', 'To', 'Amount').by('name').by('name').by('amount')`

If we want to group the transactions, we need to create a grouping that corresponds to unique from/to-relationships. This query creates a grouping key consisting of the sender and recipient:

`g.V().hasLabel('Account').as('a1').outE().as('tx').otherV().as('a2').group().by(select('a1', 'a2').by('name')).by(select('tx').values('amount').sum()).unfold().project('From', 'To', 'Amount').by(select(keys).select('a1')).by(select(keys).select('a2')).by(select(values))`

Notice that a combined key is itself a small map of its parts (here `a1` and `a2`), so to print the account names we need to select each part individually from the key (`select(keys).select('KEY')`).
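In plain Python, a combined key corresponds to using a tuple as the dictionary key. The sketch below is an illustration only: the amounts and the account "Bob" are made up, and only the fact that Odd has two transactions to Ada comes from the example above.

```python
# Hypothetical transactions: (sender, recipient, amount).
transactions = [
    ("Odd", "Ada", 100),
    ("Odd", "Ada", 50),
    ("Ada", "Bob", 20),
]

# group().by(select('a1', 'a2').by('name')).by(... .sum()):
# the key combines sender and recipient, the value is the summed amount.
totals = {}
for sender, recipient, amount in transactions:
    key = (sender, recipient)
    totals[key] = totals.get(key, 0) + amount

# unfold() + project(): pick each part of the combined key into its own column.
rows = [
    {"From": key[0], "To": key[1], "Amount": total} for key, total in totals.items()
]

for row in rows:
    print(row)
```

The two transactions from Odd to Ada collapse into one row with the combined amount, while other sender/recipient pairs keep their own rows.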

## Grouping applications by top-level business capability and lifecycle phase

The following query finds all business capabilities and their connected applications, and then applies a grouping on the application's lifecycle phase and the "top-level business capability" that the application realizes.

`g.V().hasLabel('Business Capability').as('bc').both().hasLabel('Application').has('lifecycle_phase').as('lp').group().by(select('bc', 'lp').by(until(__.not(out('ardoq_parent'))).repeat(out('ardoq_parent')).values('name')).by('lifecycle_phase')).by('name').unfold().project('Business Capability', 'Lifecycle Phase', 'Applications').by(select(keys).select('bc')).by(select(keys).select('lp')).by(select(values).unfold().dedup().fold())`
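The query combines three ideas: walking up the parent hierarchy until there is no further parent, grouping on a combined key, and removing duplicate application names from each group. The plain-Python sketch below models these steps under stated assumptions: all names are hypothetical, and the `parent` mapping stands in for `ardoq_parent` edges pointing from child to parent, as the `repeat(out('ardoq_parent'))` traversal suggests.

```python
# Hypothetical hierarchy: child capability -> parent capability.
# A capability without an entry is treated as top-level.
parent = {"Sales": "Customer", "Marketing": "Customer"}

def top_level(capability):
    """Follow parent links until none is left, like until(not(out(...))).repeat(out(...))."""
    while capability in parent:
        capability = parent[capability]
    return capability

# Hypothetical (business capability, application, lifecycle phase) connections.
connections = [
    ("Sales", "CRM", "Active"),
    ("Marketing", "CRM", "Active"),
    ("Marketing", "Mailer", "Phase out"),
    ("Customer", "Portal", "Active"),
]

# Group application names on the combined key (top-level capability, lifecycle phase).
grouped = {}
for capability, app, phase in connections:
    grouped.setdefault((top_level(capability), phase), []).append(app)

# unfold() + project(); dict.fromkeys removes duplicates while keeping order,
# mirroring select(values).unfold().dedup().fold().
rows = [
    {
        "Business Capability": key[0],
        "Lifecycle Phase": key[1],
        "Applications": list(dict.fromkeys(apps)),
    }
    for key, apps in grouped.items()
]

for row in rows:
    print(row)
```

The deduplication matters because the same application can reach the same top-level capability through several child capabilities, as "CRM" does here.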

Still have questions? Feel free to reach out to us. We're happy to help!