Managed Self-Service BI: Success in Spite of Stringent Regulation

The latest addition to the growing number of successful implementations of Scalefree's services, and of Data Vault 2.0 as a whole, comes from a sector known for its strict regulatory bodies and for high volumes of data that demand the utmost in privacy and security.

The events that unfolded during the various financial crises at the start of the century left governments the world over seeking to impose stricter regulations on the financial sector. Banks within the sector were faced with a new task: continuing to operate with expansion in mind while still staying well within the defined standards.

The Latest Innovations of Data Vault 2.0


Focus on trends: Data Lake and NoSQL, DWH architecture, Self-Service BI, modeling, and GDPR

In the past, we wrote about topics we were confronted with when consulting our clients, or about discussions we saw occurring widely on the web.

All of these topics are already covered in Data Vault 2.0, and most of them have moved into higher focus within the last months. Following the trends in the private sector, NoSQL databases now play an important role in storing data quickly from different source systems. This brings new opportunities to analyze the data, but also new challenges, e.g. how to query this "semi-structured" and "unstructured" data quickly, for instance using Massive Parallel Processing (MPP). Furthermore, there is an abundance of tools to store, transport, transform, and analyze the data, which often results in time- and cost-intensive research.

Knowledge about "Schema on Write" and "Schema on Read" (and their differences) has become very important when building a data warehouse. A schema has been, and still is, mandatory for business analysts when they have to tie the data to business objects for analytical purposes. Storing your data in NoSQL platforms only (let's call it a "Data Lake") is a good approach to capture all of your company's data, but it becomes much more difficult for business users to get the data out of those platforms again. A good and recommended approach is to have both: a Data Lake AND a Data Warehouse, combined in a hybrid architecture.
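
To make the distinction concrete, here is a minimal sketch in PySpark (our choice of tooling for illustration; the article itself prescribes none). The file paths and column names are hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("schema-on-read-demo").getOrCreate()

# Schema on Read: the raw JSON lands in the lake as-is; a structure is
# inferred only at query time, when the data is actually needed.
raw = spark.read.json("/data/lake/crm/customers/")  # hypothetical path
raw.createOrReplaceTempView("customers_raw")
spark.sql("SELECT customer_id, email FROM customers_raw").show()

# Schema on Write: the structure is declared up front and enforced while
# loading, as a relational data warehouse would do it.
schema = StructType([
    StructField("customer_id", StringType(), nullable=False),
    StructField("email", StringType(), nullable=True),
    StructField("lifetime_value", DoubleType(), nullable=True),
])
typed = spark.read.schema(schema).json("/data/lake/crm/customers/")
typed.write.mode("overwrite").parquet("/data/dwh/staging/customers/")
```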


How to Scale in a Disciplined Agile Manner?

Looking beyond Scrum to learn how to increase the value of Data Vault 2.0 projects

Earlier this year we talked about Managed Self-Service BI to explain how business users can benefit from this approach in Data Vault 2.0. Now we want to show you how to get there from a project management perspective, even in large companies where the standard Scrum approach often does not work with the agreed deployment/release regulations and where other approaches, such as the Disciplined Agile framework, are the better fit.

Agile transformation is hard because cultural change is hard. It is not one problem that needs to be solved, but a series of hundreds of decisions affecting many people over a long period of time, touching relationships, processes, and even the state of mind of those working within the change.

There are two fundamental visions of what it means to scale agile: tailoring agile strategies to address the scaling challenges faced by development teams, such as geographic distribution, regulatory compliance, and large team size; and adopting agility across your organization. Both visions are important, but if you cannot successfully achieve the former, there is little hope that you will succeed at the latter.


Still Struggling with GDPR?


The General Data Protection Regulation (GDPR) is a law by the European Union (EU) that became effective on May 25, 2018. The regulation is designed to provide a high level of protection for the personal data of European citizens, which means that companies around the world have to establish transparency and give individuals ownership of their data, and need to obtain a clear declaration of consent from them before saving and processing their personal data. Though the laws of countries outside the EU (especially the USA) tend to favor business over the consumer, GDPR affects all companies worldwide that hold personal data of EU citizens in their databases.

What is new in GDPR?

Being careful with personal data is nothing new, especially not in the EU. The key change in collecting and processing personal data is that the data is now completely under the control of its owner, who can force companies to delete or anonymize it, or request copies of all of the owner's personal data stored in the system. Personal data, or Personally Identifiable Information (PII), means data with which an individual can be identified, e.g. a name, phone number, or email address.
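
As a small illustration of one common compliance technique (not prescribed by the article itself), the sketch below pseudonymizes a PII attribute by replacing it with a keyed hash, so analytical joins still work while the clear-text value can be removed. The key handling and field names are assumptions.

```python
import hashlib
import hmac

# Hypothetical secret kept outside the data warehouse; deleting the key
# effectively anonymizes every value hashed with it ("crypto-shredding").
PSEUDONYMIZATION_KEY = b"rotate-and-store-me-in-a-secret-store"

def pseudonymize(pii_value: str) -> str:
    """Replace a clear-text PII value with a stable keyed hash."""
    return hmac.new(PSEUDONYMIZATION_KEY, pii_value.encode("utf-8"),
                    hashlib.sha256).hexdigest()

record = {"customer_id": "C-1001", "email": "jane.doe@example.com"}
record["email"] = pseudonymize(record["email"])
print(record)  # the email can no longer be read, but stays joinable
```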

Data Warehouse and Data Lake: Do We Still Need a Data Warehouse?


“Big Data”, “Data Lake”, “Data Swamp”, “Hybrid Architecture”, “NoSQL”, “Hadoop” … terms you are confronted with very often these days when dealing with data. Furthermore, the question comes up whether you really need a data warehouse at all nowadays, given the high variety and volume of data. We want to talk about what a data lake is, whether we still need a data warehouse when using NoSQL platforms like Hadoop, and how this combines with Data Vault.

WHAT IS A DATA LAKE?

There is a fitting definition from Tamara Dull (SAS): “A data lake is a storage repository that holds a vast amount of raw data in its native format, including structured, semi-structured, and unstructured data. The data structure and requirements are not defined until the data is needed.” 1

How to Combine Managed Self-Service BI with Data Vault 2.0?


This article explores how combining Managed Self-Service BI with Data Vault 2.0 enables organizations to balance data governance and agility, ensuring both control and flexibility in their analytics processes. Last month we talked about the hybrid architecture in Data Vault 2.0, where we explained how to combine structured and unstructured data. To follow up on this topic, we now want to explain how your business users (especially power users) can benefit from it with the managed Self-Service Business Intelligence (mSSBI) approach in Data Vault 2.0.

About Self-Service BI

Self-service BI allows end-users to completely circumvent IT, often in response to the perceived unresponsiveness of IT. In this approach, business users are left on their own with the whole process of sourcing the data from operational systems and integrating and consolidating the raw data. A self-service approach without the involvement of IT brings many problems:

In many cases, end-users, even power users with knowledge of SQL, MDX, and other techniques, don't have the right tools available to solve their tasks. Instead, much of the work is done manually and is error-prone. But from our experience, it is not possible to completely prevent such power users from obtaining data from source systems, preparing it, and eventually reporting it to upper management. What organizations need is a compromise between IT agility and data management that allows power users to obtain the data they need quickly, in a usable quality. To overcome these problems, the Data Vault 2.0 standard allows experienced or advanced business users to perform their own data analysis tasks on the raw data of the data warehouse.

About the Managed Self-Service BI Approach

In fact, a Data Vault 2.0 powered IT welcomes business users to take the data that is available in the enterprise data warehouse (either in the Raw Data Vault or in the Business Vault) and create local information marts using specialized tools. These tools retrieve the data from the enterprise data warehouse, apply a set of user-defined business rules, and present the output to the end-user. IT might also create structures where organization-wide business rules are applied, to provide a consolidated view on parts of the model or to pre-calculate KPIs and thus ensure consistency among such calculations. Because both types of data (raw data and data with business rules applied) are already integrated, the business user can also join consolidated data with raw data from specific source systems. This approach is called Managed Self-Service BI: IT evolves into a service organization that provides those power users with the data they want, in the timeframe they need it. The data is integrated by its business key and can be consolidated as well as quality-checked.
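
As a minimal sketch of what such a local information mart could look like (the article names no concrete tooling; the table and column names here are hypothetical), a power user might join Raw Data Vault data with a pre-calculated Business Vault KPI and apply their own rule:

```python
import pandas as pd

# Hypothetical extracts from the EDW: a raw satellite and a Business Vault
# table with an organization-wide, pre-calculated attribute.
raw_orders = pd.DataFrame({
    "hub_customer_hk": ["a1", "b2", "b2"],
    "order_amount": [120.0, 80.0, 300.0],
})
bv_customer = pd.DataFrame({
    "hub_customer_hk": ["a1", "b2"],
    "customer_segment": ["retail", "wholesale"],
})

# The power user's own business rule: flag large orders.
mart = raw_orders.merge(bv_customer, on="hub_customer_hk")
mart["is_large_order"] = mart["order_amount"] > 100.0

print(mart)  # a local information mart, built without re-engineering the EDW
```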

Implementing mSSBI in the Data Vault 2.0 Architecture

The Data Vault 2.0 architecture provides self-service capabilities for power users in the organization:

Figure 1: mSSBI Architecture

In this case, power users who build their own custom solutions can write data and information back into the Enterprise Data Warehouse by leveraging a dedicated user space for this purpose. The write-back can later be re-used in the solution for information delivery. Furthermore, business users can manage their own master data using an MDM application. This enables authorized business users to change parameter values and thereby influence the results of the business rules.

The difference between "managed" Self-Service BI and the standard Self-Service BI approach in the general industry is that Data Vault 2.0 provides a managed environment in which data and information are provided in a controlled and secure manner. Power users can only query the data they are allowed to see from a data security perspective. Another advantage is that this approach enables organizations in the security and banking industries to provide a fully auditable and traceable environment that meets the highest security requirements.

Managed Self-Service BI does require a write-back capability in the enterprise data warehouse architecture; otherwise, it is just a plain old BI solution. Without write-back, there are no differentiators. Beyond that, write-back is necessary for enriching or enhancing the quality of the data being put forward. Data scientists, for example, do this all the time: when using Hadoop, they create a new target file as the result of their processing output. This is a direct write-back in the Hadoop space. We have required write-back in Self-Service BI for years; otherwise, master data and hierarchy management don't work properly.
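
A minimal PySpark sketch of such a write-back, assuming the warehouse exposes a dedicated user-space schema (the schema, table, and column names are hypothetical):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("mssbi-write-back").getOrCreate()

# A power user's enriched result, e.g. scored customers from a local analysis.
scored = spark.createDataFrame(
    [("a1", 0.92), ("b2", 0.37)],
    ["hub_customer_hk", "churn_score"],
)

# Write back into a dedicated user space, separated from the governed
# Raw Data Vault and Business Vault, so it can be re-used for delivery.
scored.write.mode("append").saveAsTable("user_space.customer_churn_scores")
```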


Conclusion

Combining Managed Self-Service BI with Data Vault 2.0 empowers organizations to strike a balance between governance and agility in their data ecosystems. By leveraging Data Vault’s structured, auditable architecture alongside self-service BI’s flexibility, businesses can ensure data accuracy, security, and scalability while enabling users to gain faster insights. This approach enables collaboration between IT and business users, driving more informed decision-making and accelerating data-driven innovation.

Hybrid Architecture in Data Vault 2.0


Business users expect their data warehouse systems to load and prepare more and more data, in terms of variety, volume, and velocity. The workload that is put on typical data warehouse environments is also increasing, especially once the initial version of the warehouse has become a success with its first users. Scalability therefore has multiple dimensions. Last month we talked about satellites, which play an important role in scalability. Now we explain how to combine structured and unstructured data in a hybrid architecture.

Logical Data Vault 2.0 Architecture

The Data Vault 2.0 architecture is based on three layers: the staging area, which collects the raw data from the source systems; the enterprise data warehouse layer, modeled as a Data Vault 2.0 model; and the information delivery layer, with information marts as star schemas and other structures. The architecture supports both batch loading of source systems and real-time loading from an enterprise service bus (ESB) or any other service-oriented architecture (SOA).

The following diagram shows the most basic logical Data Vault 2.0 architecture:

Figure 1: Logical Data Vault 2.0 Architecture

In this case, structured data from source systems is first loaded into the staging area to reduce the operational and performance burden on the operational source systems. It is then loaded unmodified into the Raw Data Vault, which represents the Enterprise Data Warehouse layer. After the data has been loaded into this Data Vault model (with hubs, links, and satellites), business rules are applied in the Business Vault on top of the data in the Raw Data Vault. Once the business logic has been applied, both the Raw Data Vault and the Business Vault are joined and restructured into the business model for information delivery in the information marts. Business users access the information in the information marts using dashboard or reporting applications.
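
To make the loading flow tangible, here is a minimal, pure-Python sketch of one step: loading new business keys from the staging area into a hub, unmodified apart from the calculated hash key. The table structures are simplified assumptions; real implementations typically run as set-based SQL or ELT jobs.

```python
import hashlib
from datetime import datetime, timezone

def hash_key(business_key: str) -> str:
    """Data Vault style hash key, calculated from the business key."""
    return hashlib.md5(business_key.strip().upper().encode("utf-8")).hexdigest()

# Staging area: raw customer numbers extracted from a source system.
staging_keys = {"CUST-100", "CUST-101", "CUST-102"}

# Hub: distinct list of all business keys ever seen (insert-only).
hub_customer = {hash_key("CUST-100"): "CUST-100"}

load_date = datetime.now(timezone.utc)
for bk in staging_keys:
    hk = hash_key(bk)
    if hk not in hub_customer:  # only genuinely new keys are inserted
        hub_customer[hk] = bk
        print(f"insert hub row: {hk} {bk} {load_date} source=crm")
```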

The architecture allows implementation of the business rules in the Business Vault using a mix of various technologies, such as SQL-based virtualization (typically using SQL views), and external tools, such as business rule management systems (BRMS).

However, it is also possible to integrate NoSQL database systems in a hybrid architecture. Due to the platform independence of Data Vault 2.0, NoSQL can be used for every data warehouse layer, including the staging area, the enterprise data warehouse layer, and information delivery. The NoSQL database could therefore be used as a staging area that loads data into the relational Data Vault layer. It could, however, also be integrated both ways with the Data Vault layer via a hashed business key. In this case, the result is a hybrid architecture solution in which information marts consume data from both environments.

Hybrid Architecture

The standard Data Vault 2.0 architecture in Figure 1 focuses on structured data. Because more and more enterprise data is semi-structured or unstructured, the recommended best practice for a new enterprise data warehouse is to use a hybrid architecture based on a Hadoop cluster, as shown in the next figure:

Figure 2: Hybrid Data Vault 2.0 Architecture

In this hybrid architecture modification, the relational staging area is replaced by an HDFS-based staging area that captures all unstructured and structured data. While capturing structured data on HDFS appears to be overhead at first glance, this strategy actually reduces the burden on the source system by making sure that the source data can always be extracted, regardless of any structural changes. The data is then extracted using Apache Drill, Hive external tables, or similar technologies.

It is also possible to store the Raw Data Vault and the Business Vault (the structured data in the Data Vault model) on Hive Internal.

Conclusion

Integrating a hybrid architecture within Data Vault 2.0 enables organizations to effectively manage both structured and unstructured data by leveraging platforms like Hadoop. This approach enhances scalability and flexibility, allowing for efficient data processing and storage. By replacing traditional relational staging areas with HDFS-based systems, businesses can reduce the burden on source systems and ensure seamless data extraction.

Detect Deletes – Standard Process without Last Seen Date, Using Descriptive Attribute


Hubs and links provide a distinct list of all business keys ever recognized by the data warehouse. They are not end-dated and don't provide any information about the current status of a record (e.g. whether the business key is still valid or assigned to a business object).

It is therefore necessary to define how to store the deleted flag (for example, as a descriptive field next to other descriptive fields in a standard satellite) and how to identify deletes: by finding the sub-set of business keys from the target hub (or relationships from the link) that no longer exist in the staging area source table.

Without this descriptive flag indicating the current status of a record, the data in the data warehouse should be considered inconsistent, because hubs and links don't implement effectivity.

Detecting deletes is an important process for ensuring the consistency of the data warehouse. This solution describes how to detect deletes without requiring a Last Seen Date; the state of the record is expressed using a descriptive field in the staging area.
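
A minimal, pure-Python sketch of the detection step, under simplified assumptions (full loads of the source table; real implementations would compare hub and staging tables in set-based SQL): business keys present in the hub but missing from the latest staging load get a satellite entry with a deleted flag.

```python
from datetime import datetime, timezone

# Hub: every business key ever recognized by the data warehouse.
hub_customer_keys = {"CUST-100", "CUST-101", "CUST-102"}

# Staging area: the keys delivered by the latest full extract.
staging_keys = {"CUST-100", "CUST-102"}

# Keys in the hub that no longer exist in the source are treated as deleted.
deleted_keys = hub_customer_keys - staging_keys

load_date = datetime.now(timezone.utc)
satellite_rows = [
    {"business_key": bk, "load_date": load_date, "is_deleted": True}
    for bk in sorted(deleted_keys)
]
print(satellite_rows)  # [{'business_key': 'CUST-101', ...}]
```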


Visual Data Vault by Example: Links Modeling in the Banking Industry


The following article describes how to represent links using a Visual Data Vault example. With the advent of Data Vault 2.0, which adds architecture and process definitions to the Data Vault 1.0 standard, Dan Linstedt standardized the Data Vault symbols used in modeling. Based on these standardized symbols, the Visual Data Vault (VDV) modeling language was developed, which can be used by EDW architects to build Data Vault models.

When our founders wrote the book, they needed a visual approach for modeling the concepts of Data Vault. For this purpose, they developed the graphical modeling language, which focuses on the logical aspects of Data Vault. The Microsoft Visio stencils and a detailed white paper are available as a free download at www.visualdatavault.com.

Links in Visual Data Vault

We previously published another newsletter about how hubs are modeled in the accounting industry. In this newsletter, we explain the function of standard links and how their modeling works in the banking industry.

Links connect individual hubs in a Data Vault model and represent either transactions or relationships between business objects. Business objects are connected in business: no business object is entirely separate from the others. Instead, they are connected to each other through the operational business processes that use them in the execution of their tasks. The following diagram shows a link that connects two hubs (a standard link has to have at least two connections):

Figure 1: A standard link connects two hubs

The link in the image above references two hubs: Account and Customer. The connector (the arrow) should be read as "(the hub) Customer is used by (the link) Account to Customer." The second reference is a little different, because the name of the connection between the Account hub and the link is overwritten with the meaning of a credit or a debit account. This is necessary in cases where the model requires more meaning or when multiple connections to the same hub are required. The hash keys of each hub, which identify each business object uniquely by one calculated attribute, are replicated into the link entity using the same attribute name.

A link represents a many-to-many relationship and therefore provides flexibility: changes to the business rules don't require re-engineering, and the granularity is expressed by the number of referenced hubs and is thus well documented.

Figure 2: Link table with its attributes

The link contains all hash keys of the related hubs (logical foreign keys), a load date capturing when the relationship arrived in the data warehouse for the first time, the record source stating where the data comes from, and the link hash key (logical primary key), which is calculated from the business keys of the hubs (not from their hash keys: never hash a hash!). Links follow an insert-only loading pattern.
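
A minimal sketch of the link hash key calculation following the rule above (the delimiter and the choice of MD5 are common Data Vault conventions, shown here as assumptions):

```python
import hashlib

def link_hash_key(*business_keys: str, delimiter: str = "||") -> str:
    """Calculate a link hash key from the hubs' business keys.

    The business keys themselves are hashed, never the hubs' hash keys
    ("never hash a hash").
    """
    normalized = [bk.strip().upper() for bk in business_keys]
    return hashlib.md5(delimiter.join(normalized).encode("utf-8")).hexdigest()

# Link "Account to Customer": one hash key per relationship.
print(link_hash_key("ACC-4711", "CUST-100"))
```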

Links greatly improve the flexibility of the Data Vault model, because it is easy to add links or modify the relationship type of existing links. This makes it possible to respond to changes in the business in less time: to add new functionality, you only need to add new hubs and connect them to existing hubs via links. Usually, a standard satellite is attached to the link, containing the descriptive data of the relationship between the hubs.

Another common kind of link is the Non-Historized Link (also known as the Transactional Link), which contains only transactions and does not need a satellite; this means the loading pattern is a pure insert-only approach. Read more about the value of NH-Links in an earlier newsletter from this year.

Conclusion

In conclusion, the Visual Data Vault modeling language offers a standardized and effective approach for representing complex relationships and transactions within the banking industry. By connecting business objects through links, it ensures a clear and logical structure that mirrors real-world processes. This methodology not only enhances the clarity of data models but also facilitates efficient data management and retrieval, making it an invaluable tool for enterprise data warehouse architects.
