Skip to main content
Category

Scalefree Newsletter

About Non-Functional Requirements

By Scalefree Newsletter 2 Comments
In our trainings and consulting practice, we often pitch the idea of “focusing on the business value” to the audience. Business value in enterprise data warehousing is defined as “something of value to the business” (believe it or not, we believe it should be said sometimes).

Typically, the reason by business setups a budget for the enterprise data warehouse is that they want some reports or dashboards with actionable information.

On the other hand, the most important counter-argument is that the enterprise data warehouse is more than just reports and dashboards. There is actually a lot of more technical components (non-functional requirements) to be built, including but not limited to: Read More

Organization of Information Requirements

By Scalefree Newsletter 5 Comments

Just a Recommendation…how we Organize our Information Requirements

 

Information is required by business users throughout the industry. However, as part of our consulting engagements, we also encounter a lack of proper description as to what the business user actually needs. 

So, we want to use this article to present the way we structure our information requirements internally at Scalefree as well as the way we do so for many of our customers.

What about User Stories?

We all know user stories from Scrum and many business intelligence projects.
Their structure is typically something that looks like:

As a <type of user>, I want <some goal> so that <some reason>.

The following example represents a typical user story we would receive in a project:

As a <marketing user>, I want <to have an overview report with the number of leads from a marketing channel> so that <I can adjust the marketing budget accordingly>.

Now, what should we do with this user story?
Many details are missing, and yes, we all know about product backlog refinement. The problem is that the user story is just not sufficient enough within business intelligence efforts and some structure might be of help.

Information Requirements

Developers in enterprise data warehousing and business intelligence need much more detail than just the user story. On the other hand, the user story is a good starting point for the information requirement. So, it can be treated as a typical introduction. The overall structure looks like this:
Read More

About Information Marts in Data Vault 2.0

By Scalefree Newsletter No Comments
In the Data Vault 2.0 architecture, information marts are used to deliver information to the end-users.

Conceptually, an information mart follows the same definition as a data mart in legacy data warehousing. However, in legacy data warehousing, a data mart is used to deliver useful information, not raw data. This is why the data mart has been renamed in Data Vault 2.0 to better reflect the use case.

But the definition of information marts has more facets. In the book “Building a Scalable Data Warehouse with Data Vault 2.0” we present three types of marts: Read More

Granularities of Business Vault Entities

By Scalefree Newsletter No Comments
The Business Vault is the layer in the Data Vault 2.0 architecture where business logic is implemented to transform, cleanse and modify the data.

The book “Building a Scalable Data Warehouse with Data Vault 2.0” by Scalefree’s founders Dan Linstedt and Michael Olschimke and the Data Vault 2.0 Boot Camp shows how to implement such business logic using various Business Vault entities, such as computed satellites.

However, it is worth to note that this is only half the story, half the knowledge. The book shows computed satellites (and other entities) with a load date in the primary key of the computed satellite. Such satellites are great for capturing the results from business logic that is applied on the incoming deltas. However, there are two different types of granularities for business logic in the Business Vault: Read More

Data Warehousing and why we need it

By Scalefree Newsletter No Comments

A data warehouse is a subject oriented, nonvolatile, integrated, time variant collection of data to support management’s decisions

  • Inmon, W. H. (2005). Building the Data Warehouse. Indianapolis, Ind.: Wiley.
It provides the technical infrastructure needed to run Business Intelligence effectively. Its purpose is to integrate data from different data sources and to provide a historicised database. Through a DWH, consistent and reliable reporting can be ensured. A standardised view of the data can prevent interpretation errors, improved data quality and leads to better decision-making. Furthermore, the historization of data offers additional analysis possibilities and leads to (complete) auditability.  Read More

Handling Validation of Relationships in Data Vault 2.0

By Scalefree Newsletter 4 Comments
In Data Vault 2.0, we differentiate data by keys, relationships and description.
That said, an often underestimated point is the handling of relationships in Data Vault 2.0.
In the following we explain what to consider and how to deal with it:

There are different ways to handle the validation of relationships from source systems depending on how the data is delivered, (full-extract or CDC), and the way a delete is delivered by the source system, such as a soft delete or hard delete.

First, let us explain the different kinds of deletes in source systems:

  1. Hard delete – A record is hard deleted in the source system and no longer appears in the system.
  2. Soft delete – The deleted record still exists in the source systems database and is flagged as deleted.


Secondly, let’s explore  how we find the data in the staging area:

  1. Full-extract – This can be the current status of the source system or a delta/incremental extract.
  2. CDC (Change Data Capture) – Only new, updated or deleted records to load data in an incremental/delta way.

 

To keep the following explanation as simple as possible, our assumption is that we want to mark relationships as deleted as soon as we get the delete information, even if there is no audit trail from the source system (data aging is another topic).

Read More

Salesforce meets Data Vault

By Scalefree Newsletter No Comments

It’s a Match!

Data integration with Salesforce can be tricky and is in dire need of a system of business intelligence to handle that complexity.

Data Vault is capable of decoupling all the necessary business-driven changes, extensions and customizations to the platform while maintaining the ability to become the cornerstone of an integrated architecture. The decoupling is a part of our Data Vault Boot Camp and is summarized in Figure 1.

Scalefree can provide knowledge and implementation assistance in both Data Vault as well as Salesforce therefore creating the optimal partner for your Salesforce integration project.

Read More

Data Quality in the Data Vault Architecture

By Scalefree Newsletter No Comments

The Basis for solid decision-making

In making business decisions whether daily or long term, the quality of data is a critical facet to factor into these decision-making processes.

Thus, the immediate access to the data and certainty on its quality can enhance business performance immensely. But the sad truth is that we see bad data in operational systems due to human-caused errors such as typos, ignoring standards and duplicates, in addition to lack of input-validators in operating systems such as must-fields not being declared as well as references to other entities (primary-foreign-key constraints) not being defined.

Read More

Write-backs in the Enterprise Data Warehouse Architecture

By Scalefree Newsletter No Comments

Many people already know the three layer architecture of data warehouses which is used in Data Vault 2.0. The first layer represents the staging area which holds the raw data from the source systems. The enterprise data warehouse layer, which in this case contains a Data Vault 2.0 model and the third layer with the Information Marts, which deliver the information in various structures (Star Schemas, Snowflake Schemas etc.).

Figure 1. Data Vault 2.0 Architecture

This architecture provides possibilities and benefits for writing back data. Two possibilities are writing back data into the enterprise data warehouse and into the source systems. This issue covers the write back into the enterprise data warehouse, while an upcoming article will cover the write back into the source systems.

Read More

Data Vault Use Cases Beyond Classical Reporting: Part 3

By Scalefree Newsletter No Comments
As we have shown in previous issues, Data Vault 2.0 enables individuals to implement reporting beyond the traditional methods.
In the first part, we demonstrated how to perform data cleansing in Data Vault 2.0.
And the second use case showed how to implement business process automation using Interface Marts.

The scalability and flexibility of Data Vault 2.0 offers a whole variety of use cases that can be realized, e.g. to optimize as well as automate operational processes, predict the future, push data back to operational systems as a new input or trigger events outside the data warehouse, to name a few. Read More

Satellite modeling for any structural changes in the source system

By Scalefree Newsletter No Comments

Modeling a Satellite in the instance of any structural changes within the source system

Over time, most source systems change.
The question is how to absorb these changes into the data warehouse based on Data Vault, especially when considering the satellites?
It is necessary to find a balance between the reengineering effort and performance when the source table structure changes. To better help those who find structural changes in source systems, this article will present our recommendations, based upon our knowledge base,  for various types of changes in a source.

This article describes features embodied in the Data Vault 2.0 model: the foundation of hub, link, and satellite entities can adjust to changes in the source data easily, thus reducing the cost of reengineering the enterprise data warehouse. 

Read More

Splitting a Satellite entity based on the source data

By Scalefree Newsletter No Comments
Satellite splitting criteria plays a vital role in a satellite’s structure. Being such, it is not recommended that the entirety of descriptive data related to a business key should be stored in a single satellite structure. Instead, raw data should preferably be split by certain criteria.

In general, we have defined the following types of satellite splits:

  1. Splitting by source system
  2. Splitting by rate of change

Additionally, we have defined two more types of splits as mentioned below:

  1. Splitting by level of security and by the level of privacy
  2. Business-driven split

A satellite split by source system is strongly recommended to prevent two issues when loading the data into the enterprise data warehouse: First, if two different source systems with different relational structures should be loaded into the same satellite entity, a transformation of the structure might be required. However, structural transformation requires business logic sooner or later and that should be deferred to the information delivery stage to support fully-auditable environments as well as the application of multiple business perspectives. Read More

What to consider for naming conventions in Data Warehousing – Part 2

By Scalefree Newsletter No Comments
In a previous blog post, we discussed the different aspects of a naming standard documentation – from letter case types to the consideration between using prefixes or suffixes in database object names.

Throughout this article, we will continue presenting our suggestions for naming conventions in a data warehouse solution, as well as sharing examples for naming standards, which both our team and our customers utilize internally.

Layer schemas

For layer schema names, we prefer using prefixes.
As discussed in the previous blog post, this convention boosts visibility in data exploration within the Enterprise Data Warehouse for developers and business users by grouping schemas of the same data warehouse layer together.
The following is a list of common Enterprise Data Warehouse layers and our associated recommendations regarding naming conventions: Read More

Delete and Change Handling Approaches in Data Vault 2.0 without a Trail

By Scalefree Newsletter 2 Comments
In January of this year, we published a piece detailing an approach to handle deletes and business key changes of relationships in Data Vault without having an audit trail in place.
This approach is an alternative to the Driving Key structure, which is part of the Data Vault standards and a valid solution.
Though, at times it may be difficult to find the business keys in a relationship which will never change and therefore be used as the anchor keys, Link Driving Key, when querying. The presented method inserts counter records for changed or deleted records, specifically for transactional data, and is a straightforward as well as pragmatic approach. However, the article caused a lot of questions, confusion and disagreements.
That being said, it is the intention of this blogpost to dive deeper into the technical implementation in which we could approve by employing it. Read More

An Efficient Data Lake Structure

By Scalefree Newsletter One Comment
Within a hybrid data warehouse architecture, as promoted in the Data Vault 2.0 Boot Camp training, a data lake is used as a replacement for a relational staging area. Thus, to take full advantage of this architecture, the data lake is best organized in a way that allows efficient access within a persistent staging area pattern and better data virtualization.

Read More

Data Vault Use Cases Beyond Classical Reporting: Part 2

By Scalefree Newsletter No Comments
As we first introduced within the first part of the Data Vault Use Cases article series, the Enterprise Data Warehouse (EDW) can do more than just simple reporting and dashboarding. 

We previously explored how the EDW can help to improve data quality by implementing data cleansing rules. 

This can be applied by write-back operations that affect the source systems directly. Though this was only one example of how to add more value to the EDW.
The scalability and flexibility of Data Vault 2.0 offers a whole variety of use cases that can be realized, e.g. to optimize and automate operational processes, predict the future, push data back to operational systems as a new input or trigger events outside the data warehouse, to name a few.

Read More

Capturing Semi-Structured Descriptive Data

By Scalefree Newsletter No Comments
The previous articles within this series have presented hub and link entities to capture business keys as well as the relationships between business keys. To illustrate, the hub document collection in MongoDB is a distinct list of business keys used to identify customers. 

As to capture the descriptive data, which in this case is the describing factor of the business keys, satellite entities are used in Data Vault. As both business keys and relationships between business keys can be described by user data, satellites may be attached to hub as well as link entities as such:

Read More

Identifying Additional Relationships between Documents

By Scalefree Newsletter No Comments
The last article within our series recently covered the Data Vault hub entity which is used to capture distinct list of business keys in an enterprise data warehouse as most integration will actually occur on these hub entities themselves. However, there are scenarios in which the integration of data solely on these hub entities is not sufficient enough for the necessary end goal in mind. 

Consider this situation in which a sample data set, involving an insurance company, concerning customers signing car and home insurance policies as well as filing claims, each respectively. Though before moving forward with the example, it is important to note that there are relationships between the involved business keys, that of the customer number, the policy identifiers, and the claims.

These relationships are captured by Data Vault link entities and just like hubs, they contain a distinct list of records, as such, they contain no duplicates in terms of stored data. Thus, both will form the skeleton of Data Vault and later be described by descriptive user data stored in satellites.

Read More