Building a scalable Data Platform?

Utilizing Potentials of Data Vault 2.0 – Overcoming Bad Practices – Part 2

What are common mistakes when applying Data Vault 2.0 in enterprise data warehouse projects? Do you have questions about modeling in Data Vault? Is implementing GDPR causing you great difficulty, or is your project stuck because it is delivering no business value?

This webinar describes common Data Vault anti-patterns, their consequences, and solutions for eliminating them from your current or future projects.

Tune in to learn how to avoid bad practices and apply simple solutions.

Watch Webinar Recording

Webinar Agenda

1. How to use Data Vault for modeling business information
2. How to avoid the pitfalls of being unable to deliver business value
3. How to mask Business Keys from Hubs for privacy

Implementing GDPR in Data Warehousing

Implementing GDPR

In data warehousing, whether with Data Vault 2.0 or traditional approaches like Kimball and Inmon, data is stored and processed across multiple layers. Privacy requirements, particularly security measures and the “right to be forgotten,” apply to every layer that holds personal data.

For privacy implementation, the primary objective is to remove Personally Identifiable Information (PII) from each layer while leaving non-PII data intact. Ideally, the data available to consumers is reduced only in proportion to the PII that was removed.

The General Data Protection Regulation (GDPR) has a significant influence on data warehouse projects, introducing stringent requirements for data processing and storage. Its impact spans security, which determines who has access to which data, and privacy, which covers the right to be forgotten.
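
One way to picture removing PII while leaving non-PII data intact is to keep the two kinds of attributes in separate structures, so that an erasure request touches only the PII. The sketch below is illustrative only and is not taken from the solution itself: the attribute names, the `PII_ATTRIBUTES` classification, and the functions `split_record` and `forget` are all assumptions.

```python
PII_ATTRIBUTES = {"name", "email", "phone"}  # hypothetical classification


def split_record(record):
    """Split one record into a PII part and a non-PII part."""
    pii = {k: v for k, v in record.items() if k in PII_ATTRIBUTES}
    non_pii = {k: v for k, v in record.items() if k not in PII_ATTRIBUTES}
    return pii, non_pii


def forget(pii_store, customer_id):
    """Remove the PII for one customer; return True if anything was removed."""
    return pii_store.pop(customer_id, None) is not None


# Two separate stores stand in for separately stored PII and non-PII data.
pii_store, non_pii_store = {}, {}
record = {
    "name": "Jane Doe",
    "email": "jane@example.com",
    "segment": "retail",
    "lifetime_value": 1200,
}
pii_store["C1"], non_pii_store["C1"] = split_record(record)

# A right-to-be-forgotten request removes only the PII part.
forget(pii_store, "C1")
```

After the call to `forget`, the non-PII attributes for `C1` are still available for analytics, while the personal attributes are gone.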

Salesforce Meets Data Vault

It’s a Match!

Data integration with Salesforce can be tricky and needs a business intelligence system that can handle the complexity. Data Vault can decouple all the necessary business-driven changes, extensions, and customizations to the platform while still serving as the cornerstone of an integrated architecture. This decoupling is covered in our Data Vault Boot Camp and is summarized in Figure 1. Scalefree can provide knowledge and implementation assistance for both Data Vault and Salesforce, which makes it the optimal partner for your Salesforce integration project.

In this article:

Agile Integration of Salesforce
Conclusion

Figure 1. Data Vault Decoupling

Agile Integration of Salesforce

Salesforce is optimized for transactions, not for analytics; in fact, this is one reason we want to integrate it. More likely than not, your Salesforce system is not “just a CRM” anymore. Over the past decade, Salesforce has evolved into a general-purpose business application platform that offers many layers of functionality if one chooses to use them.

Salesforce can provide a Sandbox in which your own developers and third parties can develop any application they want. In fact, our customers often go on to create a variety of add-ons and customizations within Salesforce. This means that your integration will become more complex over time as more elements are added to the “one” source system. For this reason, Salesforce fits very well with Data Vault.

We can extend the vault, as technical integration and business needs are decoupled by the very idea of Data Vault. This is where all the standards that you used to build your Vault come into play and save your day. Now you can leverage the benefits of having an extensible and agile data warehouse.

In our recent webinar Salesforce and Data Vault we discussed some of the change drivers around Salesforce, which are summarized in Figure 2. We also talked about how those change drivers can be defused by using Data Vault.

Figure 2. Salesforce Change Drivers

In addition to the Data Vault methodology, project roles such as domain experts can help with communication between the source system operations team and the data warehousing team. For more information on these roles, see Disciplined Agile Delivery (DAD) by Scott Ambler, which is now part of the Project Management Institute (PMI).

Of course, we have only scratched the surface here, as there are many topics we have not yet discussed. For example, some Salesforce-related challenges can be solved either with an expensive workaround in the data warehouse or with a few simple adjustments in Salesforce.

Another topic for a later time is how to deal with Salesforce limits, such as API call limits or operational reporting limitations.

What topics are you interested in?

What challenges are you facing right now?

Conclusion

We have seen that Salesforce integration can be handled with Data Vault, as the two fit together quite well. Data Vault adds the agility your data warehouse requires to cope with changing and complex source systems, which are needed to provide the highest possible business value to your organization, while saving you re-engineering cost in the long run. In this way, you can create your own sustainable Salesforce data pipeline.

Advantages for Virtualization in the Data Vault

Virtualization in the Data Vault

In legacy or traditional data warehousing, a common strategy is to materialize data marts, also known as information marts, to improve performance. However, this approach has a notable disadvantage: it increases storage requirements.

Materializing data marts can improve query response times and make data access more efficient, but the trade-off is a higher demand for storage space.

Difference Between Data Vault, Inmon and Kimball Approach

Data Vault, Inmon and Kimball

Data Vault 2.0 stands on a robust foundation of four pillars, each shaping its distinct architecture. The Methodology pillar guides the project lifecycle, ensuring standardization. Architecture defines the blueprint, prioritizing scalability. Modeling introduces agile techniques, enhancing adaptability. Implementation brings the design to life, addressing practical considerations.

The Inmon approach to building a data warehouse begins with the corporate data model. This model identifies the key subject areas, and most importantly, the key entities the business operates with. From this model, a detailed logical model is created for each major entity.

The Kimball approach to building the data warehouse starts with identifying the key business processes and the key business questions that the data warehouse needs to answer. The key sources (operational systems) of data for the data warehouse are analyzed and documented.

Batch Loading Strategies for Data Vault 2.0

Loading Strategies

Data warehousing uses various loading strategies. One common challenge is that deleted records are absent from a delta. It is therefore crucial to recognize and track deletions in the source system, which are often recorded in the data warehouse as soft deletes.

A load must capture not only new or modified data (the delta) but also records that have been deleted at the source. Soft deletes mark records as deleted rather than physically removing them, which keeps deletions traceable.

Requirements and Templates for Hashing

REQUIREMENTS FOR HASHING

Traditional data warehouses often use sequence numbers to identify records in other tables.

Using sequences comes with some drawbacks. One of the biggest is performance: because the sequence numbers are produced by a generator, this step becomes a bottleneck. In addition, sequence numbers have to be generated inside the data warehouse; they cannot be computed before the data is loaded.

This solution provides a template and the requirements for hashing.
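
As a rough illustration of the alternative to sequences, a hash key can be computed from the business key alone, so it can be derived before the data reaches the data warehouse and without a central generator. The sketch below is not the template from the solution. The function name `hash_key`, the `;` delimiter, the trim-and-uppercase normalization, and the choice of MD5 are all assumptions made for the example.

```python
import hashlib


def hash_key(*business_keys, delimiter=";"):
    """Derive a deterministic hash key from one or more business key parts.

    Assumptions (not from the source): parts are trimmed and upper-cased,
    None becomes an empty string, and the parts are joined with a
    delimiter before being hashed with MD5.
    """
    normalized = [
        "" if part is None else str(part).strip().upper()
        for part in business_keys
    ]
    payload = delimiter.join(normalized).encode("utf-8")
    return hashlib.md5(payload).hexdigest().upper()


# The same business key yields the same hash key on any system,
# so no lookup against a sequence generator is needed.
key_a = hash_key(" cust-001 ")
key_b = hash_key("CUST-001")

# Composite business keys are joined with the delimiter before hashing.
composite = hash_key("CUST-001", "2024")
```

Because the computation depends only on the input values, it can run in parallel in any layer or system, which is what removes the bottleneck of a single generator.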

Data Security Concepts in Data Vault 2.0

Data Security Concepts

This discussion focuses on critical aspects such as security controls, access controls, and the definition of identities. The primary objective of this solution is to safeguard data assets effectively. The approach typically centers on securing data at both the row/document level and the attribute level.

In terms of security controls, the emphasis is on implementing measures that ensure the confidentiality, integrity, and availability of data. Access controls play a pivotal role in governing who can interact with specific data assets, limiting access to authorized individuals or roles. Defining identities involves establishing clear parameters for users and entities accessing the data, contributing to a robust security framework.

In summary, this Data Vault solution prioritizes a comprehensive approach to data security, addressing concerns at different levels to fortify the protection of valuable data assets.

Exemplary Naming Conventions in Data Vault 2.0

Naming Conventions

Data Vault modeling is a powerful approach that introduces a multitude of entities to the database. To enhance usability and facilitate effective development, it is highly advisable to implement clear and consistent naming conventions. These conventions play a vital role in grouping entities by concept and conveying crucial information to developers, including the data source, rate of change, privacy levels, security considerations, and more.

Introducing a well-thought-out naming convention not only simplifies the development process but also contributes to a more organized and comprehensible database structure. It acts as a guide for developers, offering insights into the nature and characteristics of each entity.

Conversely, the absence of a naming convention poses challenges in identifying related tables within Data Vault models. This lack of structure can lead to confusion, making it harder for developers to discern the relationships and purpose of different entities.
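
To make the idea of grouping entities by concept more concrete, the following sketch parses table names against a made-up convention and groups conforming tables by concept. The convention itself (`hub_`, `lnk_`, and `sat_` prefixes, followed by a concept and an optional source system) and the function names are hypothetical, chosen only for illustration; they are not the convention recommended in the solution.

```python
import re

# Hypothetical convention: <type>_<concept>[_<source>], lowercase.
ENTITY_TYPES = {"hub": "Hub", "lnk": "Link", "sat": "Satellite"}
NAME_PATTERN = re.compile(r"^(hub|lnk|sat)_([a-z0-9]+)(?:_([a-z0-9]+))?$")


def parse_entity_name(table_name):
    """Return (entity type, concept, source), or None if the name does not conform."""
    match = NAME_PATTERN.match(table_name)
    if match is None:
        return None
    prefix, concept, source = match.groups()
    return ENTITY_TYPES[prefix], concept, source


def group_by_concept(table_names):
    """Group conforming table names by the concept they describe."""
    groups = {}
    for name in table_names:
        parsed = parse_entity_name(name)
        if parsed is not None:
            groups.setdefault(parsed[1], []).append(name)
    return groups


tables = ["hub_customer", "sat_customer_salesforce", "lnk_order", "CustomerData"]
groups = group_by_concept(tables)
```

With a convention in place, related tables such as `hub_customer` and `sat_customer_salesforce` can be found mechanically, while a name like `CustomerData` gives no hint about its type, concept, or source.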

In conclusion, the implementation of naming conventions is fundamental for the success of Data Vault modeling solutions. It promotes clarity, efficiency, and a systematic approach to database development.

Test Strategies for Data Vault 2.0 based EDW

Test Strategies

Testing is essential for ensuring that data warehouse systems work correctly and efficiently. In unit testing, each component is tested separately.

When business logic is tested with unit tests, one difficulty lies in the tools available for unit testing in data warehouses.

This solution describes test strategies for enterprise data warehouse solutions based on Data Vault 2.0.

Detect Deletes – Standard Process without Last Seen Date, Using Descriptive Attribute

Descriptive Attribute

Hubs and Links provide a distinct list of all business keys ever recognized by the data warehouse. They are not end-dated and don’t provide any information about the current status of the record (e.g. whether the business key is still valid or assigned to a business object).

It is therefore necessary to define two things: how to store the deleted flag (for example, as a descriptive field next to other descriptive fields in a standard satellite), and how to identify deletes, namely by finding the subset of business keys in the target hub (or relationships in the link) that no longer exist in the staging area source table.

Without this descriptive flag, which indicates the current status of the record, the data in the data warehouse should be considered inconsistent, because hubs and links don’t implement effectivity.

Detecting deletes is an important process to ensure the consistency of the data warehouse. This solution describes how to detect deletes without requiring a Last Seen Date. The state of the record is expressed using a descriptive field in the staging area.
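
The comparison between hub and staging can be pictured as a set difference between the business keys in the target hub and those in the current staging load. This is a minimal sketch under stated assumptions, not the process from the solution: it assumes staging holds a full extract of the source table, and the names `detect_deletes`, `is_deleted`, and `load_date` are hypothetical.

```python
from datetime import datetime, timezone


def detect_deletes(hub_keys, staging_keys, already_deleted=frozenset(), load_date=None):
    """Return descriptive rows flagging hub keys that are missing from staging.

    Assumes staging holds a full extract of the source table, so a key
    absent from staging no longer exists in the source.
    """
    load_date = load_date or datetime.now(timezone.utc)
    missing = set(hub_keys) - set(staging_keys) - set(already_deleted)
    # One descriptive row per newly missing key; the hub itself is untouched.
    return [
        {"business_key": key, "load_date": load_date, "is_deleted": True}
        for key in sorted(missing)
    ]


hub = {"CUST-001", "CUST-002", "CUST-003"}
staging = {"CUST-001", "CUST-003"}

# CUST-002 is known to the hub but absent from staging, so it is flagged.
rows = detect_deletes(hub, staging)
```

Passing the keys that are already flagged as `already_deleted` keeps the same delete from being recorded again on every load.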
