Medallion Architecture

A layered data design pattern that progressively refines raw data into clean, business-ready insights — moving through Bronze, Silver, and Gold layers.

Design Principles

Medallion architecture is built on a set of practical engineering principles that reflect real-world data team needs:

  • Decoupling — Separating data into layers maintains flexibility. Changes in a source system don’t cascade into your reporting layer.
  • Operational reporting & analytics — Application teams often need near-real-time insights from operational data, which can in turn be used to improve the operational systems themselves.
  • Historical retention — Teams need to keep data for years — for trend analysis, quality improvement, and the ability to reload or reprocess when pipelines change.
  • Integration — Business questions rarely live within a single system. Combining data from multiple sources into integrated perspectives is a core reason the architecture exists.
  • Duplication for performance — Different teams have different read patterns and performance requirements. Duplicating data across layers (e.g. a summarised Gold table alongside a detailed Silver one) is intentional, not wasteful.

Why layer data at all?

The principle mirrors classic data warehouse thinking (Inmon, Kimball, Data Vault). Medallion gives those separation-of-concerns ideas a modern, lakehouse-native home. Teams need historical data for auditing and reloading, integrated views for analytics, and performance-optimized reads for reporting — often simultaneously. Layering makes each concern independently manageable.


The Three Layers

🟤 Bronze — Raw, “as-is” The landing zone. Data is ingested as-is, immutable, and stored in original formats (CSV, JSON, Parquet) before being converted to Delta. Enriched with metadata like ingestion timestamps and data source labels. Supports both batch and streaming, and is valuable for debugging and historic reloads.

⚪ Silver — Cleaned & filtered Data quality rules are applied here: fixing dates, removing special characters, correcting formatting errors. Historization is typically handled with SCD2. The structure remains source-oriented but is queryable and used by operational analytics teams. Columns are renamed for readability and lightweight conformations to enterprise standards are applied.

🟡 Gold — Refined business-level Data is harmonized and integrated across sources, typically modelled using the Kimball dimensional style (facts and dimensions). Complex business rules, calculations, and enrichments live here. The output consists of consumer-ready data products with backwards compatibility and data contracts for downstream users.


Practical Example: Unified Ad ROI Analysis

The business question: A growing e-commerce company wants to understand the impact of marketing spend across channels. The project: a unified ROI analysis for multi-channel ad campaigns.

Data sources: Facebook Ads and Google Ads.

Reporting requirements: Cost, impressions, clicks — broken down by ad, campaign, attribution (1-day window), and demographics (age, gender, device, platform).


Bronze: Load everything

Facebook Insights exposes 134 fields plus additional parameter combinations. Google Ads exposes 329 fields. The recommendation is simple: load everything the APIs return.

  • Downside: More storage used — but storage is cheap.
  • Upside: You avoid the classic “This report is great! Can you just add this one more field?” conversation.

Transformations at this stage are minimal: remove corrupt or unusable records, convert to Delta format, enrich with metadata (data source, ingestion timestamp), and partition by date only if data volume becomes a bottleneck.


Silver: Three options for modelling

This is where the real design decisions happen. Facebook and Google use different field names for the same concepts — cost_micros vs spend, ad_group vs adset, segments.age_range vs age — and some fields simply don’t exist on both platforms.

Option 1 — Two separate source tables Keep a Facebook silver table and a Google silver table, each cleaned and renamed to a common naming convention. Null fields where a platform has no equivalent. Simple to build, preserves source nuance, but requires joining or unioning later.

Option 2 — Single unified ads table Map both sources into one shared table at Silver. Simpler for downstream consumers and avoids repeated union logic. Recommended when platform-specific nuances aren’t needed in reporting. Important: document the mapping decisions clearly — adset and ad_group are not identical concepts.

Option 3 — Data Vault Model entities as Hubs (Hub_Ad, Hub_Campaign, Hub_Ad_Group), relationships as Links (Link_Ad_To_Campaign), and metrics as Satellites (Sat_Ad). More modular and better at tracking changes to relationships over time. Best suited for complex, long-lived enterprise data platforms.

  • Pro: Highly flexible, source-agnostic, great for auditability.
  • Con: More complex to build and query (more joins).

Gold: Two options for serving

Option A — Dimensional model (Kimball) Build a Fact_Ad_Performance table joined to Dim_Campaign, Dim_Ad_Group, Dim_Ad, and Dim_Date. Clean, well-understood by BI tools, and easy to slice. This is the most common approach.

Option B — One Big Table (OBT) Denormalise everything into a single wide table. Easier for analysts and self-service reporting — no joins needed. Works well when the use case is stable and the audience isn’t doing complex multi-fact analysis.

The linking logic (unioning Facebook and Google silver tables into a single Gold fact) can be done as a separate step — which is recommended — or handled directly in joins and unions within the Gold notebook.


Challenges

Data Governance — Storing everything in Bronze can become a GDPR problem. Access rights need clear policies, especially when silver tables contain PII from multiple source systems.

The Human Factor — Medallion is a naming framework, not a rulebook. Without documented team agreements on what belongs in each layer, the architecture degrades into inconsistency and shadow pipelines.


Key Takeaways

  • Load everything to Bronze — storage is cheap, but missing a field later is expensive.
  • Silver should clean and conform; Gold should model and integrate — resist merging these concerns.
  • Data modeling choices at Silver (flat tables vs. Data Vault) have long-lasting downstream consequences.
  • Document your layer contracts. Consistency across teams depends on written, shared agreements.

Medallion architecture looks simple on a diagram but the real design decisions — especially at the Silver layer — can have long-lasting consequences. If you want to think through the right approach for your situation, feel free to book a meeting.

ABOUT THE AUTHORS

Pieter Jansen

Pieter Jansen

Managing Partner Estonia

Discover more content from our blog​

Medallion Architecture

A layered data design pattern that progressively refines raw data into clean, business-ready insights — moving through Bronze, Silver, and Gold layers.

Read more »

Microsoft Fabric Ideation Session

The Microsoft Fabric Ideation Session helps organizations move from curiosity to clarity. In just three hours, our experts guide your team through Fabric’s AI and analytics capabilities to identify high-impact use cases, connect business goals with technology, and build a practical roadmap for value-driven transformation.

Read more »
Scroll to Top