

Utilizing Potentials of Data Vault 2.0 – Overcoming Bad Practices – Part 2


What are common mistakes when applying Data Vault 2.0 in enterprise data warehouse projects? Do you have questions about modeling in Data Vault? Is implementing GDPR causing you great difficulty, or is your project stuck because it delivers no business value?

This webinar describes common anti-patterns of Data Vault, their consequences, and solutions for eliminating them from your current or future projects.

Tune in to learn how to avoid bad practices and apply simple solutions.

Watch Webinar Recording

Webinar Agenda

1. How to use Data Vault for modeling business information
2. How to avoid the pitfalls of being unable to deliver business value
3. How to mask Business Keys from Hubs for privacy

Implementing GDPR in Data Warehousing

Solutions

Implementing GDPR

In any data warehouse, whether built with Data Vault 2.0 or with traditional approaches such as Kimball and Inmon, data is stored and processed across multiple layers. Privacy requirements, in particular security measures and the “right to be forgotten,” apply to every layer that holds personal data.

For privacy implementation, the primary objective is to remove Personally Identifiable Information (PII) from each layer while leaving non-PII data intact. Ideally, the amount of consumer data shrinks only in proportion to the PII that was removed.

The General Data Protection Regulation (GDPR) has a significant influence on data warehouse projects because it introduces stringent requirements for data processing and storage. Its impact spans security (who has access to which data) and privacy (the right to be forgotten).
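The removal of PII from every layer can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the layers are plain lists of dictionaries, the attribute names in `PII_ATTRIBUTES` are hypothetical, and nulling the values stands in for whatever erasure mechanism a real platform uses.

```python
from typing import Dict, List

# Hypothetical set of attribute names treated as PII in this sketch.
PII_ATTRIBUTES = {"name", "email", "phone"}


def erase_pii(layer: List[Dict[str, object]], subject_id: str) -> int:
    """Null out PII attributes for one data subject; return the rows touched."""
    touched = 0
    for row in layer:
        if row.get("customer_id") == subject_id:
            for attr in PII_ATTRIBUTES.intersection(row):
                row[attr] = None  # non-PII attributes are left intact
            touched += 1
    return touched


# Two illustrative layers that both hold personal data.
staging = [
    {"customer_id": "C1", "name": "Ada", "email": "ada@example.com", "revenue": 120},
]
raw_vault = [
    {"customer_id": "C1", "name": "Ada", "revenue": 120},
    {"customer_id": "C2", "name": "Bob", "revenue": 80},
]

# The erasure request has to be applied to every layer, not only one.
for layer in (staging, raw_vault):
    erase_pii(layer, "C1")
```

After the loop, the rows for `C1` keep their `revenue` value in both layers, while the rows of other customers are unchanged.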

ACCESS THE SOLUTION

Salesforce Meets Data Vault


It’s a Match!

Data integration with Salesforce can be tricky and needs a business intelligence system that can handle the complexity. Data Vault can decouple all the necessary business-driven changes, extensions and customizations to the platform while still serving as the cornerstone of an integrated architecture. This decoupling is covered in our Data Vault Boot Camp and is summarized in Figure 1. Scalefree can provide knowledge and implementation assistance for both Data Vault and Salesforce, making us an optimal partner for your Salesforce integration project.

Salesforce and Data Vault - decoupling

Figure 1. Data Vault Decoupling

Agile Integration of Salesforce

Salesforce is optimized for transactions, not for analytics; in fact, this is one reason we want to integrate it. More likely than not, your Salesforce system is not “just a CRM” anymore. Over the past decade, Salesforce has evolved into a general-purpose business application platform that offers many levels of functionality if you choose to use them.

Salesforce can provide a Sandbox in which your own developers and third parties can develop any application they want. In fact, our customers often go on to create a variety of application add-ons and customizations within Salesforce. This means that your integration will become more complex over time as more elements are added to the “one” source system. For this reason, Salesforce fits very well with Data Vault.

We can extend the vault, as technical integration and business needs are decoupled by the very idea of Data Vault. This is where all the standards that you used to build your Vault come into play and save your day. Now you can leverage the benefits of having an extensible and agile data warehouse.

In our recent webinar, “Salesforce and Data Vault,” we discussed some of the change drivers around Salesforce, which are summarized in Figure 2. We also talked about how those change drivers can be defused by using Data Vault.

Salesforce and Data Vault

Figure 2. Salesforce Change Drivers

In addition to the Data Vault methodology, project roles such as domain experts can help with communication between the source system operations staff and the data warehousing team. For more information on these roles, see Disciplined Agile Delivery (DAD) by Scott Ambler, which is now also part of the Project Management Institute (PMI).

Of course, we have only scratched the surface here, as there are many topics we have not talked about yet. For example, some Salesforce-related challenges can be solved either with an expensive workaround in the data warehouse or with some simple adjustments in Salesforce.

At a later time, we will also cover how to deal with Salesforce limits, such as API call limits or operational reporting limitations.

What topics are you interested in? 

What challenges are you facing right now?

Conclusion

We have seen that the integration of Salesforce can be handled with Data Vault, as both systems fit together quite well. Data Vault adds the agility your data warehouse needs to cope with changing and complex source systems, which helps you provide the highest possible business value to your organization while saving re-engineering cost in the long run. In this way, you can create your own sustainable Salesforce data pipeline.

Advantages for Virtualization in the Data Vault

Solutions

Virtualization in the Data Vault

In legacy or traditional data warehousing, a common strategy is to materialize data marts, also known as information marts, to enhance performance. However, this approach has a notable disadvantage: it increases the storage requirements of the data warehouse.

Materializing data marts can offer performance benefits, but the trade-off is a higher demand for storage space. This approach has traditionally been employed to optimize query response times and facilitate efficient data access.
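The trade-off between a materialized and a virtual mart can be sketched with Python's built-in `sqlite3` module. The table, view, and column names are hypothetical; the sketch only shows that a materialized mart stores a second copy of the result, whereas a view stores the query definition and is evaluated at read time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sat_customer (customer_hk TEXT, country TEXT, revenue REAL);
    INSERT INTO sat_customer VALUES
        ('a1', 'DE', 100.0), ('b2', 'DE', 50.0), ('c3', 'US', 70.0);

    -- Materialized mart: the result set is copied and stored a second time.
    CREATE TABLE mart_revenue_materialized AS
        SELECT country, SUM(revenue) AS revenue
        FROM sat_customer GROUP BY country;

    -- Virtual mart: only the query definition is stored.
    CREATE VIEW mart_revenue_virtual AS
        SELECT country, SUM(revenue) AS revenue
        FROM sat_customer GROUP BY country;
""")

# A later load changes the underlying data.
conn.execute("INSERT INTO sat_customer VALUES ('d4', 'US', 30.0)")

materialized = dict(conn.execute(
    "SELECT country, revenue FROM mart_revenue_materialized"))
virtual = dict(conn.execute(
    "SELECT country, revenue FROM mart_revenue_virtual"))
```

The materialized table still holds the figures from the time it was built and must be reloaded, while the view reflects the new row without any additional storage or reload step.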


Difference Between Data Vault, Inmon and Kimball Approach

Solutions

Data Vault, Inmon and Kimball

Data Vault 2.0 stands on a robust foundation of four pillars, each shaping its distinct architecture. The Methodology pillar guides the project lifecycle, ensuring standardization. Architecture defines the blueprint, prioritizing scalability. Modeling introduces agile techniques, enhancing adaptability. Implementation brings the design to life, addressing practical considerations.

The Inmon approach to building a data warehouse begins with the corporate data model. This model identifies the key subject areas, and most importantly, the key entities the business operates with. From this model, a detailed logical model is created for each major entity.

The Kimball approach to building the data warehouse starts with identifying the key business processes and the key business questions that the data warehouse needs to answer. The key sources (operational systems) of data for the data warehouse are analyzed and documented.


Batch Loading Strategies for Data Vault 2.0

Solutions

Loading Strategies

In general data warehousing, various loading strategies come into play. One common challenge is that deleted records are absent from a delta. It is therefore crucial to recognize and track deletions from the source system, often referred to as soft deletes.

The distinction lies in the need to not only capture new or modified data (delta) but also to account for records that have been deleted at the source. Soft deletes involve marking records as deleted rather than physically removing them, allowing for a more nuanced and traceable approach to data management.
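A soft delete can be sketched in a few lines of Python. The sketch assumes that the source (or a separate detection step) supplies the keys of deleted records, since a delta by itself does not contain them; the function name `apply_delta` and the `is_deleted` flag are hypothetical.

```python
from typing import Dict, Iterable


def apply_delta(target: Dict[str, dict],
                upserts: Dict[str, dict],
                deleted_keys: Iterable[str]) -> None:
    """Apply new or changed rows and soft-delete rows reported as removed."""
    for key, attrs in upserts.items():
        target[key] = {**attrs, "is_deleted": False}
    for key in deleted_keys:
        if key in target:
            # Mark the record instead of physically removing it,
            # so the deletion stays traceable.
            target[key]["is_deleted"] = True


warehouse = {
    "K1": {"name": "Ada", "is_deleted": False},
    "K2": {"name": "Bob", "is_deleted": False},
}
apply_delta(warehouse, upserts={"K3": {"name": "Cy"}}, deleted_keys=["K2"])
```

After the call, `K2` is still present in `warehouse` but flagged as deleted, and `K3` has been added.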


Data Lake Efficiency: Structural Solutions

Data Lake architecture

Data Lake Structure – Solution

The organization of data within a data lake can significantly impact downstream accessibility. While offloading data into the data lake is a straightforward process, the real challenge arises in efficiently retrieving this data. The efficiency of data retrieval becomes crucial for tasks such as the incremental or initial Enterprise Data Warehouse (EDW) load and for data science practitioners conducting independent queries. In practice, the ease of accessing data downstream depends on how well the data is organized within the data lake. A well-organized structure facilitates smoother retrieval processes, empowering both EDW loads and the independent querying needs of data scientists.

Within a hybrid data warehouse architecture, as promoted in the Data Vault 2.0 Boot Camp training, a data lake is used as a replacement for a relational staging area. Thus, to take full advantage of this architecture, the data lake is best organized in a way that allows efficient access within a persistent staging area pattern and better data virtualization.
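One possible way to organize the lake for this pattern is a folder structure keyed by source system, entity, and load timestamp, so that an incremental load can address exactly one batch. The layout below is an assumption for illustration, not the structure prescribed by the solution; `lake_path` and all folder names are hypothetical.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath


def lake_path(source_system: str, entity: str,
              load_ts: datetime, file_name: str) -> PurePosixPath:
    """Build a hypothetical lake path partitioned by load date and time."""
    return (
        PurePosixPath("lake")
        / source_system
        / entity
        / f"load_date={load_ts:%Y-%m-%d}"
        / f"load_time={load_ts:%H%M%S}"
        / file_name
    )


path = lake_path(
    "salesforce",
    "account",
    datetime(2024, 1, 15, 6, 30, 0, tzinfo=timezone.utc),
    "account.parquet",
)
```

With such a layout, an initial load would read every folder below an entity, whereas an incremental load would read only the folders newer than the last processed load timestamp.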


Requirements and Templates for Hashing

Solutions

Requirements for Hashing

Traditional data warehouses often use sequence numbers to identify records in other tables.

Using sequences comes with some drawbacks. One of the biggest is performance: because the sequence numbers are produced by a generator, this step becomes a bottleneck. In addition, sequence numbers are generated inside the data warehouse rather than being available before the load.

This solution provides a template and the requirements for hashing.
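A minimal sketch of a hash key calculation, using Python's `hashlib`, shows why hashing avoids the generator bottleneck: the key is derived from the business key itself, so it can be computed anywhere before the load. The normalization rules (trimming, upper-casing), the delimiter, and the choice of MD5 are assumptions for this example and are not taken from the template.

```python
import hashlib
from typing import Iterable


def hash_key(business_keys: Iterable[str], delimiter: str = ";") -> str:
    """Derive a deterministic hash key from one or more business key parts."""
    # Normalize each part so that formatting differences between
    # source systems do not produce different keys (assumed rules).
    normalized = [str(part).strip().upper() for part in business_keys]
    # The delimiter keeps ("A", "BC") distinct from ("AB", "C").
    payload = delimiter.join(normalized).encode("utf-8")
    return hashlib.md5(payload).hexdigest().upper()


customer_hk = hash_key([" c-1001 "])
order_line_hk = hash_key(["C-1001", "SO-42", "7"])
```

Because the function has no state, the same input always yields the same key, and no lookup against the data warehouse is needed to resolve it.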


Data Security Concepts in Data Vault 2.0

Solutions

Data Security Concepts

This solution focuses on critical aspects such as security controls, access controls, and the definition of identities. Its primary objective is to safeguard data assets effectively. The approach typically centers on securing data at both the row/document level and the attribute level.

In terms of security controls, the emphasis is on implementing measures that ensure the confidentiality, integrity, and availability of data. Access controls play a pivotal role in governing who can interact with specific data assets, limiting access to authorized individuals or roles. Defining identities involves establishing clear parameters for users and entities accessing the data, contributing to a robust security framework.
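Row-level and attribute-level security can be sketched as two filters applied per role. The roles, regions, and attribute names below are hypothetical, and a real platform would typically enforce such policies in the database rather than in application code.

```python
from typing import Dict, List

# Hypothetical role definitions: which rows (by region) and which
# attributes each role is allowed to see.
ROLES = {
    "analyst_de": {
        "regions": {"DE"},
        "attributes": {"customer_id", "region", "revenue"},
    },
    "auditor": {
        "regions": {"DE", "US"},
        "attributes": {"customer_id", "region", "revenue", "email"},
    },
}


def secure_view(rows: List[Dict[str, object]], role: str) -> List[Dict[str, object]]:
    """Apply row-level, then attribute-level filtering for a role."""
    policy = ROLES[role]
    return [
        {k: v for k, v in row.items() if k in policy["attributes"]}
        for row in rows
        if row["region"] in policy["regions"]
    ]


customers = [
    {"customer_id": "C1", "region": "DE", "revenue": 120, "email": "ada@example.com"},
    {"customer_id": "C2", "region": "US", "revenue": 80, "email": "bob@example.com"},
]

analyst_rows = secure_view(customers, "analyst_de")
auditor_rows = secure_view(customers, "auditor")
```

The analyst role sees only the German customer and no `email` attribute, while the auditor role sees both rows with all attributes.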

In summary, this Data Vault solution prioritizes a comprehensive approach to data security, addressing concerns at different levels to fortify the protection of valuable data assets.


Exemplary Naming Conventions in Data Vault 2.0

Solutions

Naming Conventions

Data Vault modeling is a powerful approach that introduces a multitude of entities to the database. To enhance usability and facilitate effective development, it is highly advisable to implement clear and consistent naming conventions. These conventions play a vital role in grouping entities by concept and conveying crucial information to developers, including the data source, rate of change, privacy levels, security considerations, and more.

Introducing a well-thought-out naming convention not only simplifies the development process but also contributes to a more organized and comprehensible database structure. It acts as a guide for developers, offering insights into the nature and characteristics of each entity.

Conversely, the absence of a naming convention poses challenges in identifying related tables within Data Vault models. This lack of structure can lead to confusion, making it harder for developers to discern the relationships and purpose of different entities.

In conclusion, the implementation of naming conventions is fundamental for the success of Data Vault modeling solutions. It promotes clarity, efficiency, and a systematic approach to database development.
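A naming convention is easiest to keep consistent when it can be checked automatically. The sketch below assumes a hypothetical convention in which the prefixes `hub_`, `lnk_`, and `sat_` identify the entity type and names are lower-case with underscores; the actual convention in a project may differ.

```python
import re
from typing import Optional

# Hypothetical convention: <prefix>_<lower_case_words>
NAME_PATTERN = re.compile(r"^(hub|lnk|sat)_[a-z0-9]+(_[a-z0-9]+)*$")
ENTITY_TYPES = {"hub": "Hub", "lnk": "Link", "sat": "Satellite"}


def classify(table_name: str) -> Optional[str]:
    """Return the entity type encoded in the name, or None if it violates the convention."""
    match = NAME_PATTERN.match(table_name)
    return ENTITY_TYPES[match.group(1)] if match else None


tables = ["hub_customer", "lnk_customer_order", "sat_customer_salesforce", "CustomerTable"]
violations = [name for name in tables if classify(name) is None]
```

A check like this could run as part of a deployment pipeline so that tables that do not follow the convention are reported before they reach the database.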


Test Strategies for Data Vault 2.0 based EDW

Solutions

Test Strategies

Testing is essential to ensure that data warehouse systems work correctly and efficiently. In unit testing, each component is tested separately.

When business logic is tested with unit tests, however, the limited availability of unit-testing tools for data warehouses becomes an issue.

This solution describes test strategies for enterprise data warehouse solutions based on Data Vault 2.0.
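As an illustration of testing one component in isolation, the sketch below unit-tests a single business rule with Python's standard `unittest` module. The rule `customer_tier` and its threshold are hypothetical; the example assumes the business logic can be expressed as a function outside the database, which is not always the case in a data warehouse.

```python
import unittest


def customer_tier(annual_revenue: float) -> str:
    """Hypothetical business rule: classify a customer by annual revenue."""
    if annual_revenue < 0:
        raise ValueError("annual revenue must not be negative")
    return "GOLD" if annual_revenue >= 100_000 else "STANDARD"


class CustomerTierTest(unittest.TestCase):
    def test_threshold_is_inclusive(self):
        self.assertEqual(customer_tier(100_000), "GOLD")
        self.assertEqual(customer_tier(99_999.99), "STANDARD")

    def test_negative_revenue_is_rejected(self):
        with self.assertRaises(ValueError):
            customer_tier(-1)


# Run the suite programmatically so the sketch also works in a notebook.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(CustomerTierTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Each test exercises the rule independently of any load process, which is the defining property of a unit test.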


Detect Deletes – Standard Process without Last Seen Date, Using Descriptive Attribute

Solutions

Descriptive Attribute

Hubs and Links provide a distinct list of all business keys ever recognized by the data warehouse. They are not end-dated and don’t provide any information about the current status of the record (e.g. if the business key is still valid / assigned to a business object).

Two things therefore need to be defined. The first is how to store the deleted flag, for example as a descriptive field next to other descriptive fields in a standard satellite. The second is how to identify the deletes: by finding the subset of business keys in the target hub (or relationships in the link) that no longer exist in the source table in the staging area.

Without this descriptive flag, which indicates the current status of the record, the data in the data warehouse should be considered inconsistent, because hubs and links don’t implement effectivity.

Detecting deletes is an important process to ensure the consistency of the data warehouse. This solution describes how to detect deletes without requiring a Last Seen Date. The state of the record is expressed using a descriptive field in the staging area.
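The detection step described above amounts to a set difference between the keys in the hub and the keys in the current staging load. The following sketch assumes a full extract in staging and uses hypothetical names (`detect_deletes`, `is_deleted`, `load_date`); it produces the satellite rows that would carry the deleted flag.

```python
from datetime import datetime, timezone
from typing import Dict, List, Set


def detect_deletes(hub_keys: Set[str],
                   staging_keys: Set[str],
                   already_deleted: Set[str],
                   load_ts: datetime) -> List[Dict[str, object]]:
    """Return satellite rows flagging keys that vanished from staging."""
    # Keys known to the hub that are missing from the current full
    # extract and not yet flagged as deleted.
    missing = hub_keys - staging_keys - already_deleted
    return [
        {"hub_key": key, "load_date": load_ts, "is_deleted": True}
        for key in sorted(missing)
    ]


load_ts = datetime(2024, 1, 15, tzinfo=timezone.utc)
new_rows = detect_deletes(
    hub_keys={"K1", "K2", "K3", "K4"},
    staging_keys={"K1", "K3"},
    already_deleted={"K4"},
    load_ts=load_ts,
)
```

Only `K2` results in a new satellite row: `K4` was flagged in an earlier load, and `K1` and `K3` are still delivered by the source.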
