This is part two of my article on ‘Managing the evolving landscape of data products.’ In this part, we examine the challenges of versioning data products, identify inflection points, and offer insights to help you improve data product (DP) design and management for a robust and scalable Data Mesh.
Adeptly managing data product versioning, efficient cataloguing, and strategic sunsetting are crucial for data-driven innovation.
Data product versioning presents several challenges that organisations must address to manage their data mesh architecture effectively.
As requirements change, it is essential to move a data product from one state to the next, allowing it to adapt and meet evolving needs within the data mesh ecosystem.
Types of DP Versioning:
For Snowflake tables, versioning builds on Snowflake’s native tools: Time Travel and zero-copy Cloning.
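As a minimal sketch of how this can look in practice, the snippet below queries a table’s historical state with Time Travel and then freezes that state as a named version with a zero-copy clone; the connection parameters and table names are illustrative assumptions.

```python
# A minimal sketch: pin a table version with Snowflake Time Travel and
# zero-copy cloning. Connection parameters and names are illustrative.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account", user="my_user", password="...", database="ANALYTICS"
)
cur = conn.cursor()

# Query the table as it looked one hour ago via Time Travel.
cur.execute("SELECT COUNT(*) FROM customer_orders AT (OFFSET => -3600)")
print(cur.fetchone())

# Freeze that historical state as a named version with a zero-copy clone.
cur.execute(
    "CREATE TABLE customer_orders_v2 CLONE customer_orders AT (OFFSET => -3600)"
)
```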
In Kafka streams, DP output ports are versioned through topic versions and evolving message schemas. Each version of a product’s output stream corresponds to a dedicated Kafka topic, differentiated and managed by a version number or label.
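For instance, a version-per-topic scheme can be administered with the confluent-kafka Python client; the topic names and configuration below are illustrative assumptions, not a prescribed convention.

```python
# Create one topic per output-port version; names/configs are illustrative.
from confluent_kafka.admin import AdminClient, NewTopic

admin = AdminClient({"bootstrap.servers": "localhost:9092"})

versioned_topics = [
    NewTopic("customer-behaviour.v1", num_partitions=3, replication_factor=1),
    NewTopic("customer-behaviour.v2", num_partitions=3, replication_factor=1),
]

for topic, future in admin.create_topics(versioned_topics).items():
    try:
        future.result()  # block until the broker confirms creation
        print(f"Created {topic}")
    except Exception as exc:
        print(f"Failed to create {topic}: {exc}")
```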
Message schema evolution is crucial in versioning Kafka streams. As the data product evolves and its output schema changes, it is vital to ensure backward compatibility to avoid breaking downstream consumers. Serialisation formats such as Avro or Protobuf, which support schema evolution and compatibility rules, can help.
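The pair of Avro schemas below illustrates a backward-compatible evolution: v2 adds a field with a default, so records written with v1 can still be read. The record and field names are illustrative.

```python
# Illustrative Avro schemas: v2 adds an optional field with a default,
# keeping the change backward compatible for existing consumers.
schema_v1 = {
    "type": "record",
    "name": "PageView",
    "fields": [
        {"name": "user_id", "type": "string"},
        {"name": "page", "type": "string"},
    ],
}

schema_v2 = {
    "type": "record",
    "name": "PageView",
    "fields": [
        {"name": "user_id", "type": "string"},
        {"name": "page", "type": "string"},
        # The new field carries a default so v1 records remain readable.
        {"name": "device", "type": ["null", "string"], "default": None},
    ],
}
```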
The Kafka topic and message schema can be updated when a new DP version is released. Consumers can then choose the version that matches their compatibility needs and requirements.
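In practice, a consumer simply subscribes to the topic carrying the version it supports; the topic and group names below are illustrative assumptions.

```python
# A consumer pins itself to the topic version it is compatible with.
from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "marketing-dashboard",
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["customer-behaviour.v1"])  # stay on v1 until ready to migrate

msg = consumer.poll(timeout=1.0)
if msg is not None and msg.error() is None:
    print(msg.value())
consumer.close()
```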
Organisations should catalogue each version of a data product separately. Each catalogued version should clearly state its datasets, key attributes (such as consolidation levels and business keys), the definition of the DP, and the elements and components of each dataset, which can go as deep as the version of individual columns. The idea is to clearly state the structure and objective of each version. Each version should also define its data lineage. Furthermore, the changes between data product versions should be highlighted so that consumers can make an informed decision.
Leverage cataloguing to make data products easy to discover across the data mesh. This helps users identify different data products and understand the differences between versions. The catalogue should include information about the data product’s lifecycle, including version details, changes, support duration, compatibility, and more. Include details about how to access the data and information about the datasets, and provide updates on Service Level Objectives (SLOs) and Service Level Indicators (SLIs) for each version. This approach makes it easier for users to find and use data products effectively in a self-service way.
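Pulling these points together, a catalogue entry for a single version might look like the sketch below; the field names and values are illustrative assumptions, not a standard catalogue schema.

```python
# Illustrative catalogue entry for one data product version; field names
# and values are assumptions, not a standard catalogue schema.
catalog_entry = {
    "data_product": "Customer Behaviour Analytics",
    "version": "2.1.0",
    "description": "Web behaviour and preference metrics per customer",
    "datasets": ["page_views", "session_summary"],
    "business_keys": ["customer_id"],
    "consolidation_level": "daily",
    "lineage": ["raw.web_events", "staging.sessions"],
    "changes_since_previous": "Added device-type dimension to page_views",
    "slo": {"freshness_hours": 24, "availability_pct": 99.5},
    "sli": {"observed_freshness_hours": 6.2, "observed_availability_pct": 99.8},
    "support_until": "2025-12-31",
}
```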
By cataloguing datasets as distinct DP versions in one community, organisations can highlight SLI/SLO metrics per dataset. Version-level access grants offer fine-grained control over data access and permissions.
This enables organisations to define and enforce version-level data quality standards, data lineage, and other relevant information for consistency and accuracy.
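As an illustration of version-level access grants, the sketch below issues Snowflake-style SQL grants per version; the roles, schema, and table names are assumptions.

```python
# Version-level access grants, sketched as Snowflake-style SQL.
# Roles, schema, and table names are illustrative assumptions.
import snowflake.connector

conn = snowflake.connector.connect(account="my_account", user="my_user", password="...")
cur = conn.cursor()

grants = [
    # Marketing keeps read access to the stable v1 output...
    "GRANT SELECT ON TABLE analytics.customer_behaviour_v1 TO ROLE marketing_reader",
    # ...while only early adopters can query the still-evolving v2.
    "GRANT SELECT ON TABLE analytics.customer_behaviour_v2 TO ROLE early_adopters",
]
for stmt in grants:
    cur.execute(stmt)
```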
Word of caution: data product cataloguing is still an evolving space, with limited out-of-the-box industry solutions in the context of data mesh. Organisations might need to design custom capabilities to optimise data cataloguing across versions for easy discovery and unambiguity.
Cataloguing tools like Collibra help manage, update, and govern data products and their versions, track their lifecycle, set deadlines for outdated or retired versions, and ensure smooth, transparent migration.
📝 Note from Editor: Learn more about catalogs and their roles in data product space here: How to Build Data Products? Deploy: Part 3/4
1. Inflection point: the boundary at which a change to a DP is no longer a new version but a new data product altogether.
2. How to effectively manage the sunsetting process and minimise disruption to downstream consumers.
Distinguishing between a DP version change and the emergence of a new product is crucial to Data Mesh management. The key signal is that the description of the data product needs significant changes to align with the dataset and use case it supports. This indicates the birth of a new data product, while the previous one can be archived or retired.
By effectively recognising these inflection points, organisations can ensure proper governance and evolution of their Data Mesh.
With vigilant monitoring, organisations can seize opportunities for new DP creation. This ensures relevance and consumer alignment and propels data-driven objectives.
Consider a company whose domain team launched an internal data product called “Customer Behaviour Analytics”, which analyses customer behaviour and preferences on the website. This DP caters to the marketing and product development teams.
As the company grows and launches a mobile app, marketing seeks focused mobile behaviour insights, while product development demands finer-grained data on specific app features.
The DP “Customer Behaviour Analytics” would need significant changes to meet the new requirements. Hence, a new data product, “Mobile App Customer Insights”, dedicated to scrutinising app interactions and addressing these specific requirements, should be created.
By recognising this inflection point, the team can split the original data product into two data products tailored for unique needs.
Data products are the bedrock of a Data Mesh, functioning as cohesive units. The steps and considerations for their versioning, cataloguing, and decommissioning may vary depending on the organisation and the nature of the data product being retired.