The go-to word search for the modern data ecosystem. Yes, you will find help with terms at the intersection of AI & ML and data too!
Data usage analytics tracks how data is accessed, used, and shared across your product, providing visibility into what’s valuable, what’s ignored, and how data drives outcomes. By surfacing patterns like most-used datasets, common queries, or drop-offs in data workflows, it helps teams prioritise improvements, refine data experiences, and make smarter roadmap decisions.
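As a minimal sketch of this idea, the snippet below counts dataset accesses from a hypothetical in-memory access log (the event shape and dataset names are illustrative; in a real product these events would come from query instrumentation or a warehouse audit log):

```python
from collections import Counter

# Hypothetical access-log events emitted by product instrumentation.
access_log = [
    {"user": "ana", "dataset": "orders"},
    {"user": "ben", "dataset": "orders"},
    {"user": "ana", "dataset": "customers"},
    {"user": "cara", "dataset": "orders"},
]

# Surface the most-used datasets so teams can prioritise improvements.
usage = Counter(event["dataset"] for event in access_log)
print(usage.most_common(2))  # → [('orders', 3), ('customers', 1)]
```

The same counting approach extends naturally to queries per user or per feature, which is where drop-off patterns start to show.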
Data Vault is a data modeling technique built for flexibility, scalability, and historical tracking in large, evolving data environments. It organises data into Hubs (core business entities), Links (relationships), and Satellites (context and history), making it easier to adapt to change without breaking downstream systems. This structure supports reliable analytics, easier auditing, and long-term maintainability, especially in enterprise-scale data platforms.
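The Hub/Link/Satellite split can be sketched as a tiny schema. This uses SQLite purely for illustration; the table and column names (hash keys, business keys, load dates) follow common Data Vault conventions but are not a production standard:

```python
import sqlite3

# A minimal Data Vault sketch; names are illustrative, not prescriptive.
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hub: one row per core business entity (here, a customer)
CREATE TABLE hub_customer (
    customer_hk TEXT PRIMARY KEY,   -- hash key
    customer_id TEXT,               -- business key
    load_date   TEXT
);
-- Satellite: descriptive context and its change history
CREATE TABLE sat_customer_details (
    customer_hk TEXT REFERENCES hub_customer(customer_hk),
    name        TEXT,
    city        TEXT,
    load_date   TEXT
);
-- Link: relationship between the customer and order hubs
CREATE TABLE link_customer_order (
    link_hk     TEXT PRIMARY KEY,
    customer_hk TEXT,
    order_hk    TEXT,
    load_date   TEXT
);
""")
```

Because new attributes land in new Satellite rows (or new Satellites), the Hubs and Links stay stable as sources evolve, which is what keeps downstream systems from breaking.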
Data Virtualisation is a technology that enables users to access, query, and use data across multiple systems without physically moving or duplicating it. A data virtualisation layer presents a unified virtual view over disparate data sources, enabling faster data delivery, simplified integration, and minimal duplication. This allows teams to work with up-to-date data in real time without waiting for pipelines or managing complex ETL processes.
Data as a Product treats data as a valuable, customer-focused offering: designed, developed, and maintained to meet specific user needs. This approach ensures that data is reliable, discoverable, and easy to use, with clear ownership, continuous improvements, and a focus on delivering measurable value to the business or end-users.
Decentralised Data Governance shifts control and accountability from a central team to the domain teams closest to the data: the producers and consumers who best understand the day-to-day governance needs of their domain. Instead of one-size-fits-all rules, each team defines and enforces policies that work for their context, while still aligning with shared standards. It scales governance without becoming a bottleneck.
Decision intelligence refers to how a product helps users make smarter, faster choices by combining data, AI, and business context into clear, guided recommendations. This involves building decision flows, what-if scenarios, and intuitive visualisations that turn complex data into confident, low-friction actions for the end user.
A demand forecasting model is a feature or engine in your product that predicts future customer or resource needs, so users can plan smarter and act early. It focuses on surfacing predictions in the right place, at the right moment, with the right level of explainability, so users can trust and act on the insight without digging into the math.
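A toy version of the prediction step might look like the following moving-average sketch. Real demand models would account for seasonality, covariates, and trained ML models; the data here is invented, but the "predict the next period from recent history" shape is the same:

```python
# A naive moving-average forecaster; illustrative only.
def forecast_next(history, window=3):
    """Predict the next period as the mean of the last `window` periods."""
    recent = history[-window:]
    return sum(recent) / len(recent)

weekly_units_sold = [120, 135, 128, 140, 150, 145]  # made-up history
prediction = forecast_next(weekly_units_sold)
print(prediction)  # mean of [140, 150, 145] → 145.0
```

The product work described above is everything around this function: when to show `prediction`, and how to explain why it is 145 and not 200.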
Distributed Data Processing involves spreading data tasks across multiple systems to handle large volumes efficiently. By breaking down processing into smaller, parallel tasks, it ensures faster, more scalable data workflows. For data product teams, this means delivering data insights faster without being bottlenecked by single system limitations.
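The split/process/combine pattern can be sketched on a single machine with a thread pool; real distributed engines (Spark, Flink, and the like) apply the same idea across a cluster. The chunks and the word-count task here are illustrative:

```python
from concurrent.futures import ThreadPoolExecutor

def word_count(chunk):
    """One small, independent task: count words in a chunk of text."""
    return len(chunk.split())

# Data broken into chunks that can be processed in parallel.
chunks = ["raw log line one", "another line of data", "final chunk here"]

with ThreadPoolExecutor(max_workers=3) as pool:
    partials = list(pool.map(word_count, chunks))  # parallel map step

total = sum(partials)  # combine (reduce) the partial results
print(partials, total)  # [4, 4, 3] 11
```

The key property is that no single worker ever needs the whole dataset, which is what removes the single-system bottleneck.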
Domain-Oriented Data Ownership gives responsibility for data to the teams who create and use it, empowering them to govern, maintain, and improve the quality of their own data. Such distributed ownership ensures data is handled by teams with the most context (closest to data usage), leading to more accurate, reliable, and user-centric insights, while maintaining alignment with broader organisational standards.
ELT (Extract, Load, Transform) is the process that pulls in raw data from multiple sources, loads it into storage first, and then applies the necessary transformations to make it usable for downstream analysis. ELT in the data product realm (unlike traditional ETL) focuses on efficiency by transforming data only when needed, after it's loaded into the system, reducing unnecessary data movement. ELT ensures that data remains clean, accurate, and ready for use in real time.
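The load-first-then-transform ordering can be sketched as follows, with SQLite standing in for the storage layer. The raw rows, table names, and the trim-and-cast transformation are all invented for illustration:

```python
import sqlite3

# ELT sketch: raw rows land in storage as-is, then SQL transforms them.
raw_orders = [("A-1", " 19.99 "), ("A-2", "5.00"), ("A-3", " 42.50")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_orders (order_id TEXT, amount TEXT)")
conn.executemany("INSERT INTO raw_orders VALUES (?, ?)", raw_orders)  # Load

# Transform only when needed, after the data is already in the system.
conn.execute("""
    CREATE TABLE orders AS
    SELECT order_id, CAST(TRIM(amount) AS REAL) AS amount
    FROM raw_orders
""")
total = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]
print(round(total, 2))  # 19.99 + 5.00 + 42.50 = 67.49
```

Keeping the untouched `raw_orders` table around is the point: new transformations can be defined later without re-extracting anything.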
ETL (Extract, Transform, Load) is the process your data product employs to pull in only the raw data demanded by downstream use cases, clean and reformat it, and load it into storage so it is ready for use. ETL within the bounds of data products is more efficient because it migrates only the source data that directly maps to the end use case, not all of it. ETL with data products is configured to run reliably and at scale in the background, so users always work with clean, trustworthy data.
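In contrast with ELT, the transformation here happens in flight, before anything lands in storage. A minimal sketch with invented source rows and names (filtering to active users, normalising emails):

```python
import sqlite3

# ETL sketch: extract only what the use case needs, clean it, then load.
source_rows = [
    {"id": "u1", "email": " Ana@Example.com ", "active": True},
    {"id": "u2", "email": "BEN@example.com",  "active": False},
]

# Extract: only the rows the downstream use case demands (active users).
extracted = [r for r in source_rows if r["active"]]

# Transform: normalise emails before anything reaches storage.
transformed = [(r["id"], r["email"].strip().lower()) for r in extracted]

# Load: only clean, finished rows land in the target store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id TEXT, email TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", transformed)
print(conn.execute("SELECT email FROM users").fetchall())  # [('ana@example.com',)]
```

Note how the inactive row never moves at all, which is the "only the source data that maps to the end use case" property described above.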
Edge analytics is when your data product analyses data close to where it is created, like on a device or sensor, so users get faster, real-time insights without relying on the cloud. This powers responsive features with low latency, even in limited-connectivity environments.
Embedded analytics is when insights, dashboards, or visualisations are built right into the data product experience, not tacked on as a separate tool. It helps users understand what’s happening as they work, without switching context, so data feels like a natural, seamless part of their workflow.
End-to-end encryption ensures data is protected at every step, from the moment it enters your product to when it's stored or shared. This encryption method assures users that their data is secure by default, rather than something they have to configure or worry about.
Event streaming is how your data product continuously processes real-time data as it flows in, so insights, alerts, or actions happen instantly. This empowers use cases like live dashboards, fraud detection, or logistics tracking, keeping experiences up-to-the-moment.
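A generator-based sketch shows the essential difference from batch processing: each event is handled the moment it arrives. The payment events, threshold, and fraud rule below are all invented for illustration; production systems would consume from a broker like Kafka:

```python
# Stream sketch: events are processed one at a time, as they flow in.
def payment_events():
    yield {"user": "ana", "amount": 40}
    yield {"user": "ben", "amount": 950}
    yield {"user": "ana", "amount": 12}

def detect_fraud(stream, threshold=500):
    for event in stream:          # continuous consumption, not a batch job
        if event["amount"] > threshold:
            yield event           # alert emitted instantly, not end-of-day

alerts = list(detect_fraud(payment_events()))
print(alerts)  # [{'user': 'ben', 'amount': 950}]
```

The same pipeline shape powers live dashboards and logistics tracking: swap the fraud rule for an aggregation or a location update.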
Event-driven architecture allows your data product to react in real time to events, like a new user sign-up or a device alert, by triggering automatic responses. Such architectures enable fast, scalable experiences that feel alive, responsive, and efficient without polling or delays.
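The subscribe/publish shape at the heart of this can be sketched with a minimal in-process event bus; real systems put a broker between publisher and subscriber, but the contract is the same. The event name and handler below are made up:

```python
# A toy event bus: handlers are invoked when events arrive, no polling.
handlers = {}

def subscribe(event_type, handler):
    handlers.setdefault(event_type, []).append(handler)

def publish(event_type, payload):
    for handler in handlers.get(event_type, []):  # push, not pull
        handler(payload)

welcomed = []
subscribe("user.signed_up", lambda p: welcomed.append(p["email"]))

# A sign-up event triggers the automatic response immediately.
publish("user.signed_up", {"email": "ana@example.com"})
print(welcomed)  # ['ana@example.com']
```

Because producers only emit events and never call consumers directly, new reactions can be added by subscribing, without touching the sign-up code.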
Feature engineering is the process of shaping raw data into meaningful inputs that your data product can leverage to power predictions, decisions, or personalisations. This involves designing features that make machine learning models more accurate and explainable, so product outcomes align closely with user needs.
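As a small illustration, the sketch below derives three common feature types (a count, an average, and a recency) from raw transaction records. The records, feature names, and reference date are all invented:

```python
from datetime import date

# Raw events in, model-ready features out; choices here are illustrative.
transactions = [
    {"user": "ana", "amount": 30.0, "day": date(2024, 5, 1)},
    {"user": "ana", "amount": 70.0, "day": date(2024, 5, 20)},
]

def engineer_features(user, txns, today=date(2024, 6, 1)):
    user_txns = [t for t in txns if t["user"] == user]
    return {
        "txn_count": len(user_txns),
        "avg_amount": sum(t["amount"] for t in user_txns) / len(user_txns),
        "days_since_last": (today - max(t["day"] for t in user_txns)).days,
    }

print(engineer_features("ana", transactions))
# {'txn_count': 2, 'avg_amount': 50.0, 'days_since_last': 12}
```

Each derived value encodes domain knowledge a raw row does not, which is what makes the downstream model both more accurate and easier to explain.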
A feature store refers to a central hub in your data product where engineered features are stored, versioned, and reused across ML workflows. It streamlines model development by making high-quality, production-ready features accessible so teams don’t waste time reinventing or re-validating data pipelines.
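A toy in-memory version captures the store/version/reuse contract; production feature stores (Feast, for example) add online/offline serving, TTLs, and lineage on top. The class and entity names here are made up:

```python
# A minimal feature store sketch: values are versioned, never overwritten.
class FeatureStore:
    def __init__(self):
        self._store = {}  # (entity_id, feature) -> list of versioned values

    def put(self, entity_id, feature, value):
        self._store.setdefault((entity_id, feature), []).append(value)

    def get(self, entity_id, feature, version=-1):
        # Latest version by default; older versions stay retrievable.
        return self._store[(entity_id, feature)][version]

store = FeatureStore()
store.put("user_42", "avg_order_value", 48.0)
store.put("user_42", "avg_order_value", 52.5)   # new version, old one kept
print(store.get("user_42", "avg_order_value"))     # 52.5 (latest)
print(store.get("user_42", "avg_order_value", 0))  # 48.0 (first version)
```

Because training and serving both read through `get`, teams reuse the same validated feature instead of re-deriving it in every pipeline.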
Federated data access refers to how your data product can connect to data across multiple sources without having to move or duplicate it. Instead of relying on centralised storage, it queries the data where it lives, giving faster, more stable access with improved security and minimal movement, and eliminating the complexities of integration-heavy models.
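The query-where-it-lives idea can be sketched with SQLite's `ATTACH`, which lets one query span separate databases without copying rows between them; a federated engine like Trino does this across genuinely different systems. The CRM/billing split and all names are illustrative:

```python
import sqlite3

# Two independent "systems", modelled as separate shared in-memory DBs.
crm = sqlite3.connect("file:crm?mode=memory&cache=shared", uri=True)
crm.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
crm.execute("INSERT INTO customers VALUES (1, 'Ana')")
crm.commit()

billing = sqlite3.connect("file:billing?mode=memory&cache=shared", uri=True)
billing.execute("CREATE TABLE invoices (customer_id INTEGER, total REAL)")
billing.execute("INSERT INTO invoices VALUES (1, 99.0)")
billing.commit()

# One query spanning both sources: no data movement, no duplication.
fed = sqlite3.connect("file:crm?mode=memory&cache=shared", uri=True)
fed.execute("ATTACH 'file:billing?mode=memory&cache=shared' AS billing")
row = fed.execute("""
    SELECT c.name, b.total
    FROM customers c JOIN billing.invoices b ON b.customer_id = c.id
""").fetchone()
print(row)  # ('Ana', 99.0)
```

Each source keeps its own storage and security boundary; only the federated query layer knows about both.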
FinOps refers to the practice of tracking, managing, and optimising the cost of data infrastructure, so your data product runs efficiently without blowing the budget. These processes enable usage-based insights and accountability, helping teams balance performance with cost and make informed decisions about data architecture.
First-party data is the information your data product collects directly from users through interactions, behaviours, or transactions. It’s considered the most reliable and privacy-compliant data, powering features like personalisation and customer analytics, without the dependency on third-party sources.
A Fully Managed Data Service handles the infrastructure, scaling, maintenance, and reliability of a data capability so teams focus on using the data rather than operating it. It abstracts away setup, updates, monitoring, and support, offering a ready-to-use service with guaranteed performance, security, and availability built in. The goal is to reduce operational overhead while accelerating time to value.