When writing Gremlin queries, you may find yourself in a situation where you need to interpret a non-numeric value as a number. In general, if you want to perform numeric calculations based on a field's value, we recommend storing the field as a number field. However, there may be cases where storing such fields as text can be convenient:
You have stored numeric values in a list field (list field values in Ardoq are stored as text strings), so as to only make it possible to choose among a predefined set of values.
The values are not inherently numeric, but it may make sense to assign numeric values to them during analysis. Consider a field called Hosting Type that may have the values "On Premise", "Cloud", "SaaS" or "3rd. Party". In this case it makes sense to store the values as text, but if you were to calculate an application's rating based on its hosting type it is useful to be able to assign a number to each value.
Example: Parse integers stored as text
Assume that we have a field called Technical Integrity, whose values are integers that, for whatever reason, are stored as text. We want to find the average technical integrity for all of our applications, but we get an error if we run the following:
g.V(). hasLabel('Application'). values('technical_integrity'). mean()
We get this error because the Technical Integrity values are not integers, but text representations of integers, so it is not possible to calculate their average.
To make this work, we can create a function that converts strings to integers, and then use this to map our text values to numeric values:
def toInteger = { it.get().toInteger() }g.V(). hasLabel('Application'). values('technical_integrity'). map(toInteger). mean()
Example: Parse non-integer numbers stored as text
Our approach of converting string representations of numbers to integers only works if the numbers are indeed integers, e.g. "3"
. If the number is not an integer, e.g. "3.14"
, we will get an error:
To avoid this, we can write a more robust function that converts string representations of numbers to floating-point numbers instead of integers. This way, we will be able to handle both integers and non-integers:
def toFloat = { it.get().toFloat() }g.V(). values('technical_integrity'). map(toFloat). mean()
Example: Handle cases where not all the values can be interpreted as numbers
If we try to interpret a text value that is not a number, e.g. "High"
as a floating-point number we will get an error:
There are two ways to handle this:
We can provide a fallback value for cases where the value cannot be converted to numbers.
We can filter away those values that cannot be converted to numbers before converting them to numbers.
Example: Providing a fallback value for values that cannot be converted to numbers
In the below example we have modified the toFloat
closure to convert the text value to a number if possible, and return 0
otherwise.
def toFloat = { try { it.get().toFloat() } catch (NumberFormatException e) { return 0 }}g.V(). values('technical_integrity'). map(toFloat). mean()
Example: Filtering away values that cannot be converted to numbers
In the below example we have kept our toFloat
function as it was, but introduced a separate function canBeConvertedToFloat
which returns true
if the value can be converted to a number and false
otherwise. We provide this function to a filter()
-step before the map()
-step where we convert the values to numbers.
def canBeConvertedToFloat = { try { it.get().toFloat() true } catch (NumberFormatException e) { false }}def toFloat = { it.get().toFloat() }g.V(). values('technical_integrity'). filter(canBeConvertedToFloat). map(toFloat). mean()
Example: Assigning values to non-numerical fields
Assume that applications have a field called Hosting Type, which may have the values "Saas"
, "3rd. Party"
, "Cloud"
or "On Premise"
. Assume that we eventually want to only use SaaS applications and want to write a query that gives our application portfolio a score from 1 to 5 based on the hosting type of the applications, that can be tracked over time. We want to achieve this by assigning the value 5
to "SaaS"
, 4
to "3rd. Party"
, 3
to "Cloud"
, 2
to "On Premise"
and 1
to any other value, and then take the average of this number for all of our applications.
If Hosting Type was already a number field we could calculate the average application score with the following query:
β
g.V(). hasLabel('Application'). values('hosting_type'). mean()
In our case Hosting Type is not a number field, so running the above query would give us an error:
The following example shows how we can map each possible value of Hosting Type to a number:
g.V(). hasLabel('Application'). has('hosting_type'). choose(values('hosting_type')). option('SaaS', constant(5)). option('3rd. Party', constant(4)). option('Cloud', constant(3)). option('On Premise', constant(2)). option(none, constant(1)). mean()
Notice how we use has('hosting_type')
to check that the Hosting Type field exists on all the components that are passed to the choose()
-step which maps the Hosting Type values to numbers. The choose()
-step requires that the number of values returned from the choose()
-step is equal to the number of values we have before the choose()
-step. The values()
-step is only able to extract the Hosting Type value from those components that have it, so if some of our applications do not have a Hosting Type field the choose()
-step would not be able to map all of the values it is passed to another value.
π¬ We're always happy to help. Feel free to reach out to us via the in-app chat if you have any follow-up questions about this article.