Sparkle Reference Data Architecture

Sparkle’s reference architecture

Sparkle has developed its own pragmatic reference architecture based on years of experience, standards and industry best practices. Our reference architecture serves as a starting point for developing a customer-specific data architecture. We don’t believe that one standard solution will be the best fit for all customers, but our architecture mandates thinking upfront about all the different possibilities. It allows a very conscious decision for each of the potential layers: whether or not it’s relevant (yet). Our goal is to make success predictable and to build on the experience we have gained over the years.

Why use Sparkle's reference architecture?

Sparkle’s reference architecture has several benefits and goals:

  • Facilitate discussions and designs by visualizing the data architecture
  • Align different stakeholders around a common and shared terminology
  • Plan ahead: think from the start about currently known and anticipated future challenges that the architecture needs to cope with
  • Prioritize: allow priorities to be set, and align the architecture with the organization’s priorities
  • Ensure alignment and embedding within the enterprise architecture

It is a layered architecture geared for flexibility, scalability, cost efficiency and performance. By giving each layer a clear purpose and by defining clear rules on which transformations happen where, we get a predictable architecture that is ready to take your organization to the next level of data maturity.

Sparkle's reference architecture in more detail

Data Sources

This layer consists of all data sources that contain potentially interesting information for the organization. While traditionally this layer mainly consists of the organization’s OLTP databases (including ERP, CRM, …), there is a lot of innovation in this area. Nowadays, it can also include open data, data retrieved through APIs or delivered through a service bus, real-time data streams, social media data, IoT data, unstructured text files and more.

Data Lake

A data lake is a storage repository that holds a vast amount of raw data in its native format. A data lake can contain structured, semi-structured and unstructured data. There is typically no schema on write, only schema on read. A data lake is often a cost-effective way to store data.
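The schema-on-read idea can be sketched in a few lines: raw records land in the lake exactly as produced, and structure is only imposed when the data is consumed. This is a minimal illustration, with made-up field names, not a prescription for any particular lake technology:

```python
import json

# Raw events land in the lake as-is: nothing is validated at write time.
raw_lines = [
    '{"id": 1, "amount": "19.99", "channel": "web"}',
    '{"id": 2, "amount": "5.00"}',  # missing field: accepted anyway
]

# Schema on read: types and defaults are applied only when the data is read.
def read_event(line):
    record = json.loads(line)
    return {
        "id": int(record["id"]),
        "amount": float(record["amount"]),
        "channel": record.get("channel", "unknown"),  # default for absent fields
    }

events = [read_event(line) for line in raw_lines]
print(events[1]["channel"])  # -> unknown
```

The write path stays cheap and permissive; the cost of interpreting the data is deferred to each consumer, which is what makes the lake flexible.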

Operational Layer

The operational layer contains detailed information on operational transactions and is typically loaded in a (near) real-time fashion. The goal is to support operational information needs. The data retention in the operational layer is typically limited in time. Historical changes are not maintained.

Foundation Layer

The foundation layer integrates and historizes the data coming from different data sources. The foundation layer is often modeled according to the Data Vault 2.0 methodology. The foundation layer stores a full history of the data at an atomic level. Business rules that are applicable for the entire organization are applied in this layer to ensure consistency across the organization.
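Two of the Data Vault building blocks mentioned above can be sketched as follows: a hub holds a stable business key, while a satellite historizes the descriptive attributes, keeping every version rather than overwriting. The entity and field names here are hypothetical:

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class HubCustomer:           # hub: one row per business key
    customer_key: str        # e.g. the CRM customer number

@dataclass(frozen=True)
class SatCustomerDetails:    # satellite: full history, one row per change
    customer_key: str
    load_date: date          # when this version was loaded
    name: str
    city: str

hub = HubCustomer("C-1001")
history = [
    SatCustomerDetails("C-1001", date(2023, 1, 1), "Acme NV", "Ghent"),
    SatCustomerDetails("C-1001", date(2024, 6, 1), "Acme NV", "Antwerp"),  # the move is kept, not overwritten
]

# The current state is simply the latest satellite row for the business key.
current = max(history, key=lambda s: s.load_date)
print(current.city)  # -> Antwerp
```

Because no row is ever updated in place, the layer can always answer "what did we know on date X", which is what enables downstream layers to be rebuilt at will.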

Presentation Layer

In the presentation layer the data is (virtually) transformed into a data model that is based on the information needs of the organization and is geared towards reporting; these are typically star schemas. When combined with a foundation layer, the presentation layer can always be rebuilt from scratch, which allows fast adaptation to new information needs. Business rules that are only valid for certain departments are also typically defined here.
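A star schema consists of a central fact table whose rows reference surrounding dimension tables, so a report query resolves measures through the dimensions. A toy illustration, with made-up table contents:

```python
# Dimension tables: descriptive context, keyed by surrogate id.
dim_product = {1: {"name": "Widget", "category": "Hardware"},
               2: {"name": "Gadget", "category": "Hardware"}}
dim_date = {20240101: {"year": 2024, "month": 1}}

# Fact table: one row per transaction, with foreign keys into the dimensions.
fact_sales = [
    {"product_id": 1, "date_id": 20240101, "amount": 100.0},
    {"product_id": 2, "date_id": 20240101, "amount": 250.0},
]

# A typical report: revenue per category, resolved via the product dimension.
revenue = {}
for row in fact_sales:
    category = dim_product[row["product_id"]]["category"]
    revenue[category] = revenue.get(category, 0.0) + row["amount"]

print(revenue)  # -> {'Hardware': 350.0}
```

The same fact table can be sliced by any dimension (date, product, …) without restructuring, which is why star schemas suit the varied reporting needs this layer serves.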

Access Layer

The access layer makes the data available for different types of consumption. It mainly consists of the tools that are used by people for analyzing and reporting on all the available information. Typical usage includes strategic scorecards, operational dashboards, pixel-perfect reporting, ad-hoc queries, multidimensional analyses and the different flavors of analytics. A newer type of consumption, which has arisen thanks to the trend towards near real-time data hubs, is making the data available through APIs for other applications in the organization.

Meta Data

Metadata tools are often used in support of data governance programs. A well-governed business glossary helps to align on business terms and definitions, and enables linking the business terms to the data assets documented in the data catalog. It also allows data quality to be monitored. Data lineage documents and visualizes where the data comes from and which transformations were performed on it. Impact analysis allows assessing the impact on downstream systems and reporting environments when source systems change.

Data Lab

The data lab is a designated environment in which a heterogeneous set of data sources can be safely and cost-efficiently explored in order to experiment, find new use cases or generate ad-hoc reports. Usually the data is only stored for a limited time. If the experiment proves to bring value, and would do so on a recurring basis, it is typically incorporated into the industrialized data platform.

Integration Component

Integration components embed and integrate the data architecture within the existing IT landscape. Common examples are a web portal (like (Azure) SharePoint) to share information and reports within the organization, an authentication and authorization provider (like (Azure) Active Directory) or a Geographic Information System (GIS) that visualizes information on a geographical map. Another example is a data flow whose last action calls the API of another application, or publishes to a topic to inform other applications of new information.


Kristof Van de Loock

Managing Partner