The go-to word search for the modern data ecosystem. Yes, you will also find help with terms at the intersection of AI & ML and data!
A Snowflake Schema is a way of structuring data in a relational database where dimension tables are normalised into multiple related tables. It reduces data redundancy and improves consistency, and is especially useful in large, complex analytical systems with shared dimensions and hierarchies.
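A minimal sketch of this normalisation, using an in-memory SQLite database (all table and column names here are illustrative): the product dimension is split into related product, category, and department tables, and queries walk that chain of joins.

```python
import sqlite3

# Snowflake schema sketch: dimension tables normalised into a chain
# (product -> category -> department) referenced by a fact table.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE dim_department (dept_id INTEGER PRIMARY KEY, dept_name TEXT);
CREATE TABLE dim_category   (cat_id INTEGER PRIMARY KEY, cat_name TEXT,
                             dept_id INTEGER REFERENCES dim_department(dept_id));
CREATE TABLE dim_product    (prod_id INTEGER PRIMARY KEY, prod_name TEXT,
                             cat_id INTEGER REFERENCES dim_category(cat_id));
CREATE TABLE fact_sales     (prod_id INTEGER REFERENCES dim_product(prod_id),
                             amount REAL);
INSERT INTO dim_department VALUES (1, 'Grocery');
INSERT INTO dim_category   VALUES (10, 'Dairy', 1);
INSERT INTO dim_product    VALUES (100, 'Milk', 10);
INSERT INTO fact_sales     VALUES (100, 3.0), (100, 4.0);
""")

# Aggregating by department requires joining down the normalised chain.
row = cur.execute("""
    SELECT d.dept_name, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product    p ON f.prod_id = p.prod_id
    JOIN dim_category   c ON p.cat_id  = c.cat_id
    JOIN dim_department d ON c.dept_id = d.dept_id
    GROUP BY d.dept_name
""").fetchone()
print(row)  # ('Grocery', 7.0)
```

Note how category and department names are stored exactly once, which is what reduces redundancy compared with a denormalised (star) dimension.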
A Source of Truth is the single place where the most accurate and up-to-date version of a dataset is maintained. It gives teams one reliable reference point for trusted data, often delivered as a well-defined data product. This helps everyone stay aligned and confident in the numbers they use.
Tag-Based Access Control uses metadata tags assigned to data assets or users to manage permissions dynamically. It simplifies security management by applying policies based on attributes rather than hardcoded roles.
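A hypothetical sketch of the idea (the tags and user names are made up): both users and assets carry tags, and a single policy rule derives permissions by comparing them, rather than maintaining per-role access lists.

```python
# Tag-based access control sketch: permissions come from metadata tags,
# not hardcoded roles. All tags/users below are illustrative.
ASSET_TAGS = {
    "orders_table": {"pii", "finance"},
    "weather_feed": {"public"},
}
USER_TAGS = {
    "ana": {"finance", "pii"},  # cleared for PII and finance data
    "ben": {"public"},
}

def can_read(user: str, asset: str) -> bool:
    """Allow access only if the user's tags cover every tag on the asset."""
    return ASSET_TAGS[asset] <= USER_TAGS[user]

print(can_read("ana", "orders_table"))  # True
print(can_read("ben", "orders_table"))  # False
```

Adding a new asset only requires tagging it; no role definitions change, which is the dynamic aspect the definition describes.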
Third-Party Data Integration involves bringing external datasets, such as vendor feeds, public APIs, or partner data, into your internal ecosystem.
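A toy sketch of the pattern: an external vendor feed (stubbed here as an in-memory CSV; the columns and values are invented) is joined onto internal records by a shared key to enrich them.

```python
import csv
import io

# Vendor feed stubbed in memory; in practice this would come from a
# file drop or API. All fields here are illustrative.
vendor_feed = io.StringIO("customer_id,credit_score\n42,710\n43,655\n")
external = {row["customer_id"]: int(row["credit_score"])
            for row in csv.DictReader(vendor_feed)}

internal = [{"customer_id": "42", "name": "Acme Ltd"},
            {"customer_id": "44", "name": "Globex"}]

# Enrich internal records; keys missing from the feed stay None.
enriched = [{**rec, "credit_score": external.get(rec["customer_id"])}
            for rec in internal]
print(enriched)
```

The `None` for unmatched keys is deliberate: third-party data rarely covers your internal population completely, and downstream logic should expect gaps.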
A Time-Series DB is designed to handle data indexed by time, making it ideal for storing and querying logs, metrics, and event data. It supports high-write throughput, fast retrieval, and efficient compression.
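The core access pattern can be sketched in a few lines (this is a toy model, not how any particular time-series database is implemented): writes append in time order, and range queries binary-search the sorted timestamp index.

```python
import bisect
from datetime import datetime, timedelta

# Toy time-series store: parallel arrays of timestamps and values.
timestamps: list[datetime] = []
values: list[float] = []

def append(ts: datetime, value: float) -> None:
    # Appends arrive in time order, which is what enables high write throughput.
    timestamps.append(ts)
    values.append(value)

def query_range(start: datetime, end: datetime) -> list[float]:
    # Binary search over the sorted timestamp index for fast retrieval.
    lo = bisect.bisect_left(timestamps, start)
    hi = bisect.bisect_right(timestamps, end)
    return values[lo:hi]

t0 = datetime(2024, 1, 1)
for i in range(5):
    append(t0 + timedelta(minutes=i), float(i))

window = query_range(t0 + timedelta(minutes=1), t0 + timedelta(minutes=3))
print(window)  # [1.0, 2.0, 3.0]
```

Real time-series engines add compression and retention policies on top of this ordered layout.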
Tokenisation replaces sensitive data with non-sensitive placeholders (tokens), allowing systems to store and process information securely.
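A minimal sketch of the mechanism (the `tok_` prefix and vault structure are illustrative): each sensitive value is swapped for a random token, and the mapping back lives only in a protected vault.

```python
import secrets

# Tokenisation sketch: the vault is the only place the real value exists;
# everything else handles the opaque token.
_vault: dict[str, str] = {}

def tokenise(value: str) -> str:
    token = "tok_" + secrets.token_hex(8)  # random, carries no information
    _vault[token] = value
    return token

def detokenise(token: str) -> str:
    return _vault[token]

card = "4111-1111-1111-1111"
token = tokenise(card)
print(token.startswith("tok_"))        # True
print(detokenise(token) == card)       # True
```

Unlike encryption, the token has no mathematical relationship to the original value, so a leaked token reveals nothing without the vault.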
Usage-Based Billing is a pricing model where users are charged based on actual consumption of resources like compute, storage, or API calls. It promotes transparency, cost efficiency, and flexibility, especially in scaling data platforms across variable workloads.
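The arithmetic is simple to sketch (the metric names and unit prices below are entirely made up): the charge is the sum of metered usage multiplied by each metric's unit rate.

```python
# Illustrative unit rates per metric; real platforms publish their own.
RATES = {"compute_hours": 0.25, "storage_gb": 0.02, "api_calls": 0.0005}

def monthly_bill(usage: dict[str, float]) -> float:
    """Charge = sum of metered quantity * unit price, rounded to cents."""
    return round(sum(RATES[metric] * qty for metric, qty in usage.items()), 2)

bill = monthly_bill({"compute_hours": 100, "storage_gb": 500, "api_calls": 20000})
print(bill)  # 0.25*100 + 0.02*500 + 0.0005*20000 = 25 + 10 + 10 = 45.0
```

Because the bill tracks consumption directly, idle workloads cost nothing, which is the cost-efficiency point in the definition.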
Versioned datasets track changes to data over time by storing snapshots of different states. This allows teams to reproduce past results, compare versions, and roll back when needed.
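A small sketch of the idea (not any specific versioning tool): each commit stores an immutable snapshot, so any past state can be checked out to reproduce results or roll back.

```python
import copy

# Dataset versioning sketch: an append-only list of immutable snapshots.
_versions: list[list[dict]] = []

def commit(dataset: list[dict]) -> int:
    """Store a deep copy so later mutations cannot alter history."""
    _versions.append(copy.deepcopy(dataset))
    return len(_versions) - 1  # version number

def checkout(version: int) -> list[dict]:
    """Return a copy of a past state for reproduction or rollback."""
    return copy.deepcopy(_versions[version])

data = [{"id": 1, "price": 10}]
v0 = commit(data)
data[0]["price"] = 12           # change the data and commit a new version
v1 = commit(data)

print(checkout(v0))  # [{'id': 1, 'price': 10}] -- the old state is intact
```

The deep copies are what make reproduction safe here; production systems achieve the same isolation more cheaply with immutable files and metadata pointers.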
A Virtual Data Lake allows users to access and query data across multiple sources without physically moving it into a central repository. It enables unified data access while preserving source system ownership.
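A hypothetical federation sketch (the source classes and data are invented): one query interface fans out to independent sources and reads rows in place, never copying them into a central store.

```python
# Virtual data lake sketch: sources keep their own data; queries federate.
class CsvSource:
    def __init__(self, rows): self.rows = rows
    def scan(self): yield from self.rows

class ApiSource:
    def __init__(self, rows): self.rows = rows
    def scan(self): yield from self.rows

SOURCES = {
    "warehouse": CsvSource([{"region": "EU", "sales": 120}]),
    "partner":   ApiSource([{"region": "US", "sales": 95}]),
}

def federated_query(predicate):
    """Apply one predicate across every registered source, in place."""
    for source in SOURCES.values():
        yield from (row for row in source.scan() if predicate(row))

rows = list(federated_query(lambda r: r["sales"] > 90))
print(rows)  # rows from both sources, without moving either dataset
```

Each source keeps ownership of its data and format; the virtual layer only standardises how queries reach them.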