Getting started with Data Lineage

As compliance and governance directives become stricter, knowing how data is used within the business is essential.

Written by Jim Down
Updated this week

Introduction

In today’s world, we are surrounded by data, from energy consumption and personnel records to budgets and invoices. Data is everywhere.

Understanding data, categorizing it, using it, analyzing it, tracking it, and deriving conclusions from it are therefore critical for the success of any organization. Using data lineage, you can make the logical and physical location of your business data transparent, enabling you to make effective business decisions based on trustworthy data.

Using the preconfigured bundle and the necessary prerequisites, we will show you how to get started with data lineage in Ardoq.

What is Data Lineage

Data lineage is a map of the journey data takes through your company. Typically, it covers the data’s origin, how it is processed and transformed within the business (including how and why the data has moved over time), and its delivery and storage at an endpoint or with an end user.

Typically, data lineage can be split into two distinct types:

  1. Solution / Data Warehouse data lineage, comprising:

    1. specific data origin

    2. transformational steps/processing

    3. the logic that decides data transformation steps

  2. Enterprise data lineage, comprising:

    1. where data resides

    2. which apps read or write data

    3. where in the world the data is physically located

    4. how the information is used within the business

Solution data lineage is therefore more transactional and process-related, whereas Enterprise data lineage focuses on the bigger picture. This Use Case Guide focuses on Enterprise data lineage.

Why is Data Lineage Important for Your Organization

Knowing where your critical business data is processed and located is paramount in two main areas:

  1. Compliance management

    • Ensuring compliance with existing and future regulatory and legal directives

    • Enabling improved risk and compliance management

    • Providing the required data storage and processing

    • Showing the physical storage location of data, worldwide

  2. Risk management

    • Ensuring that critical data comes from a reliable source

Data lineage aids both short and long-term management, showing businesses where data is logically and physically processed, where the business data is used, and who is responsible for stewarding the data entity.

How to Get Started With a Data Lineage Initiative

1. Defining Your Problem and Success Criteria

Start your data lineage initiative by defining the business problems that you want to solve.

Common problems include:

  • Which applications write information to the entity, and which read and use the information?

  • Who is responsible for the data entity?

  • What is the confidentiality of the data entity?

  • Which infrastructure hosts the data entity?

  • What is the location where your data entity is physically stored?
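The first of these questions can be sketched as a simple lookup over reference records. This is a minimal Python illustration, not Ardoq's own query mechanism; the `Access` record, its `mode` field, and the sample application and entity names are all hypothetical.

```python
from dataclasses import dataclass


# Hypothetical record: one application accessing one data entity.
# The "mode" field ("read" or "write") is an assumption for illustration.
@dataclass(frozen=True)
class Access:
    application: str
    data_entity: str
    mode: str


def applications_by_mode(accesses, data_entity):
    """Group the applications touching a data entity into writers and readers."""
    result = {"write": set(), "read": set()}
    for access in accesses:
        if access.data_entity == data_entity and access.mode in result:
            result[access.mode].add(access.application)
    return result


# Made-up sample data.
ACCESSES = [
    Access("CRM", "Customer", "write"),
    Access("Billing", "Customer", "read"),
    Access("Data Warehouse", "Customer", "read"),
    Access("Billing", "Invoice", "write"),
]

customer_usage = applications_by_mode(ACCESSES, "Customer")
```

Here `customer_usage["write"]` holds the applications that write to the Customer entity and `customer_usage["read"]` those that read it. The other questions (owner, confidentiality, hosting, location) could be answered the same way once the corresponding fields are recorded.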

Ardoq comes with a built-in data lineage metamodel based on short time-to-value results from various customers and industries. However, this metamodel can be adapted to your organization’s unique goals and objectives, using Ardoq’s metamodel as a template. Simply modify it to fit your company, adding any other specific problems you’re going to solve and removing those that don’t fit your industry or situation.

Once you know what your problems are, you need to determine how and when you can solve them. We suggest breaking your data landscape into pieces and tackling the most important parts first rather than trying to solve all problems at once.

Defining your success criteria for these problems is just as important as defining the problems themselves.

In Ardoq we consider three key areas when identifying success metrics:

  • Data Quality & Completeness - have more accurate and complete data which is necessary for business decision making

  • Business Value - measure realization of the market and operational performance objectives

  • Productivity - see improvements from the Enterprise Architecture operating more efficiently

Initially, metrics do not have to be comprehensive, or exact, but they must be monitored and refined over time.

2. Data Governance

As outlined above, one of the key issues addressed by data lineage is support for corporate and legal compliance.

To support compliance requirements, it is essential that data is always up to date. Ardoq can automate the enforcement of many governance rules, for example:

  • All Data Entity associations from applications should be updated annually to ensure correct application risk assessment

  • Metadata on all data entities should be kept up to date to minimize the risk of documenting faulty confidentiality
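To make the annual-update rule concrete, the check behind it can be sketched in a few lines of Python. This only illustrates the logic and is not how Ardoq implements governance; the `last_reviewed` field, the 365-day window, and the sample rows are assumptions.

```python
from datetime import date, timedelta

# Assumed review window matching the "updated annually" rule above.
REVIEW_WINDOW = timedelta(days=365)


def stale_associations(associations, today):
    """Return associations whose last review is older than the review window."""
    return [a for a in associations if today - a["last_reviewed"] > REVIEW_WINDOW]


# Made-up sample data; field names are hypothetical.
ASSOCIATIONS = [
    {"application": "CRM", "data_entity": "Customer", "last_reviewed": date(2023, 1, 10)},
    {"application": "Billing", "data_entity": "Invoice", "last_reviewed": date(2024, 3, 1)},
]

overdue = stale_associations(ASSOCIATIONS, today=date(2024, 6, 1))
```

Passing `today` explicitly rather than calling `date.today()` inside the function keeps the check reproducible and easy to test.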

3. Get Your Data into Ardoq

Having defined your problems, success criteria, and metrics, you can begin entering data into Ardoq.

With our predefined metamodel, the data needed for data lineage includes component types (Data Entity, Confidentiality, Application, Business Capability, and People) and reference types (Accesses and Owns).
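Before importing, it can help to check that your source rows only use these type names. The following Python sketch shows one way to do that; the row layout (`name`, `type`, `source`, `target` keys) is a hypothetical format chosen for illustration, not the structure of any Ardoq importer.

```python
# Type names taken from the predefined metamodel described above.
COMPONENT_TYPES = {"Data Entity", "Confidentiality", "Application",
                   "Business Capability", "People"}
REFERENCE_TYPES = {"Accesses", "Owns"}


def validate_rows(components, references):
    """Return a list of human-readable problems found in the input rows."""
    problems = []
    names = set()
    for comp in components:
        if comp["type"] not in COMPONENT_TYPES:
            problems.append(f"unknown component type: {comp['type']}")
        names.add(comp["name"])
    for ref in references:
        if ref["type"] not in REFERENCE_TYPES:
            problems.append(f"unknown reference type: {ref['type']}")
        # Both ends of a reference must point at a known component.
        for end in ("source", "target"):
            if ref[end] not in names:
                problems.append(f"reference {end} not found: {ref[end]}")
    return problems


# Made-up sample rows containing two deliberate mistakes.
components = [
    {"name": "CRM", "type": "Application"},
    {"name": "Customer", "type": "Data Entity"},
    {"name": "Mainframe", "type": "Server"},
]
references = [
    {"source": "CRM", "target": "Customer", "type": "Accesses"},
    {"source": "Jane Doe", "target": "Customer", "type": "Owns"},
]

problems = validate_rows(components, references)
```

If you have adapted the metamodel, extend the two sets of type names to match.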

This data can be imported in several ways including:

  1. Excel importer - use to import/update large data sets easily

  2. XML - data entities and models are usually documented with modeling tools. These tools typically export unstructured data, but the XML importer helps with the import.

  3. Azure AD - use Ardoq’s Azure AD integration to sync Active Directory with Ardoq

  4. Custom Integrations - the REST API, and wrappers for selected programming languages, can be used to automate documentation or to create custom tools that use data from Ardoq.

Take advantage of our Excel import templates to streamline data entry and speed up the process.

4. Visualizing Data

Having input your data, you need to determine the best way to visualize the data lineage information. Several different formats are available, so use those which are most appropriate for your business. The formats include:

Pages View

See all Component or Reference information as text, as you would in document editing software or a wiki. The view also shows the fields and values you’ve defined for those components or references.

Table View

The data catalog is presented in a tabular format that includes both metadata and references.

Block Diagram View

The block view format allows you to quickly see and understand the adjacent and extended context for any component.

Dependency Map View

This view shows relationships or hierarchy in a compact, nested layout. It is ideal for showing references to data entities in a structured and clear way.

5. Assessing Data Completeness

Once you have collected and visualized your data, it is essential to determine how complete the data set is, as high data quality is critical for delivering insights.

Visualizing your data can highlight missing, inconsistent, or incorrect information. However, a more robust way to check for data completeness is to create queries that aggregate and summarize the quality of data sets. The results can be plugged into your own dashboard, or you can use the prebuilt Ardoq Data Lineage Data Quality Dashboard.
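The kind of aggregation such a query performs can be sketched as follows. This is plain Python rather than an Ardoq query, and the required field names and sample entities are hypothetical; substitute the fields your metamodel actually defines.

```python
# Hypothetical required fields; adapt to the fields your metamodel defines.
REQUIRED_FIELDS = ("owner", "confidentiality", "hosting_location")


def completeness_report(entities):
    """Return the share of entities with a non-empty value for each required field."""
    total = len(entities)
    report = {}
    for field in REQUIRED_FIELDS:
        # Missing keys, None, and empty strings all count as "not filled".
        filled = sum(1 for e in entities if e.get(field))
        report[field] = filled / total if total else 0.0
    return report


# Made-up sample data with some gaps.
ENTITIES = [
    {"name": "Customer", "owner": "Jane Doe", "confidentiality": "Restricted", "hosting_location": "EU"},
    {"name": "Invoice", "owner": "", "confidentiality": "Internal", "hosting_location": None},
    {"name": "Employee", "owner": "John Roe", "confidentiality": "", "hosting_location": "EU"},
    {"name": "Product", "owner": "Jane Doe", "confidentiality": "Public"},
]

report = completeness_report(ENTITIES)
```

Each value in `report` is a fraction between 0 and 1, which makes it easy to track as a success metric over time.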

To improve the quality of the collected data you should collaborate with colleagues.

Collaboration can be undertaken manually, in an ad-hoc manner, or through the use of Surveys to collect missing information from data architects, data entity owners, or application owners. Once identified, updates and changes should be made to the data either directly or via the Excel Importer.

6. Analyzing Results

Once you have data of sufficient quality, you can analyze it using custom or pre-built views and presentations. These will surface insights that can be shared with your stakeholders.

For data lineage, use the preconfigured Discover viewpoints for stakeholders to visualize a data entity, the applications using the data, on which infrastructure it exists, and its physical storage location. Additionally, use the preconfigured presentation to gain a complete overview of where business data is used.

7. Continuous Update

Maintaining the status and quality of data fields is essential for legal and corporate compliance, as well as for minimizing business risk during decision-making. Stakeholders receive continued value when high data quality is maintained.

To ensure data integrity, broadcasts can be used to trigger workflows that automatically send out alerts and surveys. This helps establish confidence in the architecture team and in the insights it delivers.

Ardoq includes pre-configured broadcasts that are set to collect this data at a 12-month interval. However, some organizations are fast-paced and have a lot of change. If that’s the case, you might want to set the interval to 6 months. Ultimately, you will need to experiment with the frequency of alerts and surveys to find the right rhythm for your organization.
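To compare what a 12-month and a 6-month interval would mean for your calendar, the date arithmetic can be sketched in Python. This is only a planning aid and does not configure or call Ardoq broadcasts; the function names are hypothetical.

```python
import calendar
from datetime import date


def add_months(start, months):
    """Return the date `months` later, clamping the day to the end of the target month."""
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def next_survey_due(last_sent, interval_months=12):
    """Return when the next survey is due; 12 months mirrors the default interval above."""
    return add_months(last_sent, interval_months)


annual = next_survey_due(date(2024, 1, 15))
half_yearly = next_survey_due(date(2024, 8, 31), interval_months=6)
```

Clamping the day avoids invalid dates when the target month is shorter, for example when a survey sent on 31 August falls due in February.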

Summary

Organizations today are faced with a wealth of data, which they must be able to understand, categorize, utilize, analyze, track, and draw critical conclusions from.

Regulatory, legal, and corporate requirements mean that all organizations must comply with procedures and processes for the control, management, and validation of data.

Data lineage and associated techniques are used to track the data journey through your organization, from its origin and entry into the business, how and where it is processed, transformed, and moved over time, through to its delivery and storage at an endpoint or end-user.

This Getting Started guide provides an overview of the steps required to support this process: defining the company’s problems, success criteria, and appropriate metrics; collecting and entering data; visualizing it; assessing its completeness; improving its quality; and analyzing the results and the insights they generate.

Finally, we emphasized the importance of keeping the data current and of high quality through continuous updating and the use of broadcasts.

We hope that this guide has given an insight into data lineage, its importance to the organization for compliance and risk management, and how to go about starting your data lineage journey.

For a more detailed understanding of data lineage within the Ardoq metamodel see our Data Lineage Metamodel guide.
