Sparkle has developed its own pragmatic reference architecture based on years of experience, standards and industry best practices. Our reference architecture serves as a starting point for developing a customer-specific data architecture. We don’t believe that one standard solution will be the best fit for all customers, but our architecture does require thinking upfront about all the different possibilities. It enables a very conscious decision for each of the potential layers, and on whether or not each is relevant (yet). Our goal is to make success predictable and to build on the experience we have gained over the years.
Why use Sparkle's reference architecture?
Sparkle’s reference architecture has several benefits and goals:
- Facilitate discussions and designs by visualizing the data architecture
- Align different stakeholders around a common and shared terminology
- Plan ahead: from the start, think about currently known and anticipated future challenges that the architecture needs to cope with
- Prioritize: set priorities and align the architecture with the organization’s priorities
- Ensure alignment and embedding within the enterprise architecture
Sparkle's reference architecture in more detail
A data lake is a storage repository that holds a vast amount of raw data in its native format. A data lake can contain structured, semi-structured and unstructured data. There is typically no schema on write, but schema on read. A data lake is often a cost-effective way to store data.
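The schema-on-read idea can be sketched as follows: records land in the lake as-is, and a schema is only applied when the data is read. This is a minimal, hypothetical illustration (the field names and defaults are invented for the example, not part of Sparkle's architecture):

```python
import json

# Hypothetical raw event records, landed in the data lake as JSON lines.
# Nothing is validated on write -- even an incomplete record is accepted.
raw_records = [
    '{"id": 1, "amount": "19.99", "country": "BE"}',
    '{"id": 2, "amount": "5.00"}',  # "country" is missing; the lake stores it anyway
]

def read_with_schema(lines):
    """Schema on read: parse, cast types, and default missing fields."""
    for line in lines:
        record = json.loads(line)
        yield {
            "id": int(record["id"]),
            "amount": float(record["amount"]),
            "country": record.get("country", "UNKNOWN"),
        }

rows = list(read_with_schema(raw_records))
```

The trade-off shown here is typical: writes stay cheap and flexible, while each consumer decides how strictly to interpret the data when reading it.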
In the presentation layer the data is (virtually) transformed to a data model which is based on the information needs of the organization and is geared towards reporting. These are typically star schemas. When combined with a foundation layer, the presentation layer can always be rebuilt from scratch. This allows fast adaptation to new information needs. Business rules that are only valid for certain departments are also typically defined here.
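A star schema can be sketched in a few lines: one fact table surrounded by dimension tables, queried by joining facts to dimension attributes. The table and column names below are illustrative, not taken from any particular customer model:

```python
import sqlite3

# Build a tiny star schema in an in-memory database:
# fact_sales in the middle, dim_product and dim_date around it.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER);
    CREATE TABLE fact_sales (
        product_key INTEGER REFERENCES dim_product(product_key),
        date_key INTEGER REFERENCES dim_date(date_key),
        amount REAL
    );
    INSERT INTO dim_product VALUES (1, 'Widget'), (2, 'Gadget');
    INSERT INTO dim_date VALUES (20240101, 2024), (20250101, 2025);
    INSERT INTO fact_sales VALUES
        (1, 20240101, 10.0), (1, 20250101, 15.0), (2, 20250101, 7.5);
""")

# A typical reporting query: measures aggregated over dimension attributes.
report = conn.execute("""
    SELECT p.name, d.year, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p ON p.product_key = f.product_key
    JOIN dim_date d ON d.date_key = f.date_key
    GROUP BY p.name, d.year
    ORDER BY p.name, d.year
""").fetchall()
```

Because the facts and dimensions are derived from underlying data, a model like this can be dropped and rebuilt whenever reporting needs change.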
The access layer makes the data available for different types of consumption. It mainly consists of the tools that are used by people for analyzing and reporting on all the available information. Typical usage includes strategic scorecards, operational dashboards, pixel-perfect reporting, ad-hoc queries, multidimensional analyses and the different flavors of analytics. A new type of consumption, which has arisen from the trend of building near real-time data hubs, is to make the data available through APIs for other applications in the organization.
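The API-based consumption pattern boils down to serving curated figures as machine-readable responses. The sketch below shows the shape of such a handler, with an in-memory dictionary standing in for the data hub; the KPI names and values are invented for the example:

```python
import json

# Stand-in for the curated store that the data flows keep up to date.
_kpi_store = {"revenue_ytd": 1250000.0, "active_customers": 4821}

def get_kpi(name):
    """Sketch of an API handler body: look up a KPI, return (status, JSON body).

    A real deployment would sit behind a web framework or API gateway;
    only the lookup-and-serialize logic is shown here.
    """
    if name not in _kpi_store:
        return 404, json.dumps({"error": f"unknown KPI '{name}'"})
    return 200, json.dumps({"kpi": name, "value": _kpi_store[name]})

status, body = get_kpi("revenue_ytd")
```

The same curated data thus serves both people (dashboards, reports) and applications (JSON over an API).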
The data lab is a designated environment in which a heterogeneous set of data sources can be safely and cost-efficiently explored in order to experiment, find new use cases or generate ad-hoc reports. Usually the data is only stored for a limited time. If an experiment proves to bring value on a recurring basis, it is typically incorporated into the industrialized data platform.
Integration components embed and integrate the data architecture within the existing IT landscape. Common examples are a web portal (like SharePoint) to share information and reports within the organization, an authentication and authorization provider (like (Azure) Active Directory) or a Geographical Information System (GIS) which allows visualizing information on a geographical map. Another example is a data flow whose last action calls the API of another application or posts a message on a topic to inform other applications of new information.
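That last pattern, notifying downstream applications at the end of a data flow, can be sketched with a publish/subscribe mechanism. A real implementation would use a message broker; here an in-memory registry stands in for it, and all names are illustrative:

```python
# In-memory stand-in for a message broker: topic name -> subscriber callables.
subscribers = {}

def subscribe(topic, handler):
    """Register an application's handler for a topic."""
    subscribers.setdefault(topic, []).append(handler)

def publish(topic, message):
    """Deliver a message to every application subscribed to the topic."""
    for handler in subscribers.get(topic, []):
        handler(message)

# A downstream application listens for refreshed sales data.
received = []
subscribe("sales.refreshed", received.append)

def run_data_flow():
    # ... load and transform the sales data ...
    # Final step: announce the new information on a topic.
    publish("sales.refreshed", {"table": "fact_sales", "rows": 1024})

run_data_flow()
```

Posting to a topic rather than calling each consumer directly keeps the data flow decoupled from the applications that react to it.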