Alternative to the Driving Key Implementation in Data Vault 2.0

Alternative to the Driving Key

There is a special case in which part of the hub references stored in a link can change without describing a different relation. This has a great impact on the link satellites. Back in 2017, we introduced the link structure with an example of a Data Vault model in the banking industry. We showed what the model looks like when a link represents either a relationship or a transaction between two business objects. A link can also connect more than two hubs. So what is the alternative to the Driving Key implementation in Data Vault 2.0?

The Driving Key

A relation or transaction is often identified by a combination of business keys in one source system. In Data Vault 2.0 this is modeled as a normal link connecting multiple hubs, each containing a business key. A link also contains its own hash key, which is calculated over the combination of all parents’ business keys. So when the link connects four hubs and one business key changes, the new record will show a new link hash key. A problem arises when four business keys describe the relation, but only three of them identify it uniquely: we cannot identify the business object by using only the hash key of the link. This is not a modeling error, but we have to identify the correct record in the related satellite when querying the data. In Data Vault 2.0, the key (or combination of keys) that uniquely identifies the relation is called the driving key. It is a consistent key in the relationship and often the primary key in the source system.
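The effect of a changing business key on the link hash key can be illustrated with a minimal Python sketch. The hash function (MD5), the "||" delimiter, the trim/upper-case normalization, and the sample keys are all assumptions made for illustration; the article does not prescribe them.

```python
import hashlib

def hash_key(*business_keys: str, delimiter: str = "||") -> str:
    """Hash normalized business keys into one hex digest (assumed convention)."""
    normalized = [key.strip().upper() for key in business_keys]
    return hashlib.md5(delimiter.join(normalized).encode("utf-8")).hexdigest()

# Hypothetical employee and department business keys.
before = hash_key("E-1001", "SALES")
after = hash_key("E-1001", "MARKETING")

# The employee (the driving key) is unchanged, yet the link hash key differs.
link_key_changed = before != after

# The hub hash key of the employee stays the same across both link records.
employee_hub_key = hash_key("E-1001")
```

Because the link hash key changes with every department switch, it cannot serve on its own to find "the" current record for an employee; the stable employee key has to be used for that.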

The following tables demonstrate the relationship between an employee and a department from a source system.

Table 1: employee-department relationship

The following Data Vault model can be derived from this source structure.

Figure 1: Data Vault model

The link table “Empl_Dep” is derived from the table “Employee” in the source system. The Driving Key in this example is the Employee_Number, as it is the primary key in the source table and an employee can work in only one department at a time. This means that the true Driving Key “lives” in the satellite of the employee. If an employee switches departments, no additional record is written to the employee’s satellite table, but a new one is written to the link table, which is legitimate.

Table 2: link data

To query the most recent delta you have to query it from the link table, grouped by the driving key.
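As a rough sketch of such a query, the following Python snippet keeps the most recently loaded link row per driving key. The row layout, the sample values, and the assumption that every department change arrives as a new link row with its own load date are illustrative only.

```python
from datetime import date

# Hypothetical rows of the Empl_Dep link:
# (link hash key, employee number, department, load date)
link_rows = [
    ("hk1", "E-1001", "SALES", date(2018, 1, 1)),
    ("hk2", "E-1002", "FINANCE", date(2018, 1, 1)),
    ("hk3", "E-1001", "MARKETING", date(2018, 6, 1)),
]

def current_relationships(rows):
    """Keep the most recently loaded link row per driving key (the employee)."""
    latest = {}
    for row in rows:
        _, employee, _, load_date = row
        if employee not in latest or load_date > latest[employee][3]:
            latest[employee] = row
    return latest

current = current_relationships(link_rows)
```

Grouping by the employee number alone is the important part: adding the department to the grouping would return one "current" row per employee and department instead of one per employee.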

To sum up, you will always get a new link hash key when a business key changes in a relation. The challenge is to identify the driving key, which is a unique business key (or a combination of business keys) for the relationship between the connected hubs. Sometimes you have to add an additional attribute to get a unique identifier.

Both situations present an issue for power users with access to the Data Vault model. Without naming conventions, there is a risk that a GROUP BY statement is performed on more attributes than just the driving key, which would lead to unexpected and incorrect aggregate values, even though the data itself is correctly modeled.

When dealing with relationship data there is a better solution available than the driving key: we typically prefer to model such data as a non-historized link and insert technical counter-transactions to the data when a hub reference changes.

In the case of a modified record in the source, we insert two records into the non-historized link: one for the new version of the modified record in the source and one for the old version that still exists in the target (the non-historized link) but now needs to be countered, the technical counter record. To distinguish the records from the source from the counter transactions, a new column is added, often called “Counter”.

The standard value for this counter attribute is 1 for records from the source and -1 for the technical counter transactions. Important: we do not perform any update statements; we only ever insert the new counter records. When querying the measures from the link, you simply multiply the measures by the counter value.

Table 3: LinkConnection with counter attribute

Table 3 shows a link with a counter attribute. When a record changes in the source system, the original value is inserted again into the link table of the data warehouse with a counter value of -1. The changed value gets a new link hash key, which is also calculated over the descriptive attribute ‘Salary’. The counter value of the new record is 1.
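The counter-transaction pattern can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the link is an in-memory list, the row layout (employee, department, salary, counter) and the `load` helper are hypothetical, and hash keys are left out to keep the example short.

```python
# Insert-only, non-historized link; each row: (employee, department, salary, counter)
link = []

def load(link, employee, department, salary):
    """Append rows only: counter the active version (if any), then add the new one."""
    net = {}
    for emp, dep, sal, counter in link:
        if emp == employee:
            net[(dep, sal)] = net.get((dep, sal), 0) + counter
    active = [version for version, total in net.items() if total > 0]
    if active == [(department, salary)]:
        return  # nothing changed in the source, so nothing is inserted
    for dep, sal in active:
        link.append((employee, dep, sal, -1))  # technical counter record
    link.append((employee, department, salary, 1))  # new version from the source

def salary_by_department(link):
    """Multiply each measure by its counter so countered rows cancel out."""
    totals = {}
    for _, dep, sal, counter in link:
        totals[dep] = totals.get(dep, 0) + sal * counter
    return totals

load(link, "E-1001", "SALES", 5000)
load(link, "E-1001", "MARKETING", 5500)
totals = salary_by_department(link)
```

After the second load the link holds three rows: the original record, its counter record, and the new version. No row was ever updated, and the old department nets to zero in the aggregate.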

Conclusion

Because identifying the driving key of a relation can be a problem in some situations, you can use an alternative solution that avoids the driving key. All changes and deletes are tracked using a counter attribute in the non-historized link table. The link also stores the descriptive attributes, and the link hash key is calculated over all attributes.

Test Strategies for Data Vault 2.0 based EDW

Testing is very important for data warehouse systems to make them work correctly and efficiently. In unit testing, each component is separately tested.

When testing business logic with unit tests, one issue is the limited availability of unit-testing tools for data warehouses.

This solution describes test strategies for enterprise data warehouse solutions based on Data Vault 2.0.

How to Implement Insert Only in Data Vault 2.0?

Insert Only in Data Vault 2.0

Skilled modeling is important to harness the full potential of Data Vault 2.0. To get the most out of the system in terms of scalability and performance, it also has to be built on an architecture that is completely insert-only. On the way into the Data Vault, all update operations can be eliminated and the loading processes simplified.

The common implementation in Data Vault 2.0

In the common loading patterns, there are two important technical timestamps in Data Vault 2.0. The first is the load date timestamp (LDTS). This timestamp does not represent a business date that comes from the source system. Instead, it provides information about when the data was first loaded into the data warehouse, usually the staging area.

Therefore, it is completely different from the various business dates that come from the source systems including a business meaning. For this reason, it must be generated for a whole batch-loading process. Business dates, for example, validation dates, are stored in effectivity satellites, which are mostly found connected with link entities. They provide information about the relationship of business objects with begin and end date of a relationship.

The second technical timestamp is the load end date timestamp (LEDTS). Like the LDTS, the LEDTS is system-generated and occurs in satellite entities only. As those satellites are delta-driven, there is always one record that represents the most recent delta. The value of the LEDTS on those records is usually ‘9999-12-31’ (end of time) or NULL. The following figure shows the whole end-dating process that comes with the usage of the LEDTS attribute. It is executed after the loading process of the satellite (not in the loading process):

Figure 1: End dating process for satellites

The figure shows that we have to update the satellite with the new LEDTS value, which costs performance. As mentioned at the beginning, we want to remove the LEDTS updates to gain performance with a 100% insert-only Data Vault 2.0 architecture.

At this point, a typical question is how to query the most recent delta in a satellite when we don’t have the LEDTS anymore. Using max(LDTS)? For sure not.

The advantage of PIT tables in Data Vault 2.0

The answer is to use window functions to load your point in time (PIT) tables. We covered the topic of PIT tables with an example from the insurance industry in our newsletter from October 2018. The purpose of PIT tables is to improve query performance by eliminating outer joins and allowing inner joins with equi-join conditions. We highly recommend building a PIT table as the better alternative to the LEDTS. The PIT table is built using window functions to find the most recent delta in the satellite. Once it is created with snapshots of the current data, we don’t have to query on the LDTS with BETWEEN conditions. The temporal history is stored as snapshots and can be queried with equi-join conditions on the Hash Key and the LDTS to the related satellites. Because PIT tables grow, it is recommended to partition them on the snapshot date. In the end, the (virtualized) Information Mart dimensions can easily be queried directly from the PIT table and the related satellites.

By using window functions on the Hash Key (partition) and the LDTS (order) you can identify the most recent delta, which is dynamically calculated. There are some window functions that can be used for finding the most recent delta. The following table shows some examples for window functions.

Table 1: Examples of window functions
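To make the partition-and-order idea concrete, here is a small Python sketch that mimics what SQL expressions such as ROW_NUMBER() OVER (PARTITION BY hash key ORDER BY LDTS DESC) and LEAD(LDTS) OVER (PARTITION BY hash key ORDER BY LDTS) would return. The satellite rows and column layout are hypothetical, and the plain-Python implementation stands in for the database engine.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical satellite rows: (hash key, load date timestamp, attribute value)
satellite = [
    ("hk_a", "2018-10-01", "Berlin"),
    ("hk_a", "2018-10-08", "Hamburg"),
    ("hk_b", "2018-10-03", "Munich"),
]

def with_row_number_and_lead(rows):
    """Per hash key, order by LDTS; add row number (newest = 1) and the next LDTS."""
    result = []
    ordered = sorted(rows, key=itemgetter(0, 1))
    for _, group in groupby(ordered, key=itemgetter(0)):
        partition = list(group)
        for i, (hk, ldts, value) in enumerate(partition):
            next_ldts = partition[i + 1][1] if i + 1 < len(partition) else None
            result.append((hk, ldts, value, len(partition) - i, next_ldts))
    return result

windowed = with_row_number_and_lead(satellite)

# The most recent delta per hash key is the row with row number 1.
most_recent = [row for row in windowed if row[3] == 1]
```

The "next LDTS" column is computed at query time and plays the role an updated LEDTS would otherwise play, so the satellite itself never has to be updated.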

One reason for the existence of the LEDTS in Data Vault is that many databases in the early 21st century did not support window functions or were not fast enough.

As already mentioned in the previous newsletter of October 2018, the purpose of PIT tables is to allow inner joins with equi join conditions. But they are also the key to get to an insert-only implementation of Data Vault 2.0, which allows more efficient loading processes.

Conclusion

Implementing an insert-only architecture in Data Vault 2.0 enhances scalability and performance by eliminating update operations during data loading. This approach simplifies the loading process and ensures that all data changes are captured as new records, preserving historical accuracy. By utilizing Point-in-Time (PIT) tables, organizations can efficiently query the most recent data without relying on end-dating techniques, further streamlining data retrieval and analysis.

How to Use Point in Time Tables (PIT) in the Insurance Industry?

Point in Time Tables

Point in time tables are useful when querying data from the Raw Vault that has multiple satellites on a hub or a link:

Figure 1: Data Vault model including PIT (logical)

About Point in Time Tables

In the above example, there are multiple satellites on the hub Customer and link included in the diagram. This is a very common situation for data warehouse solutions because they integrate data from multiple source systems. However, this situation increases the complexity when querying the data out of the Raw Data Vault. The problem arises because the changes to the business objects stored in the source systems don’t happen at the same time. Instead, a business object, such as a customer (an assured person), is updated in one of the many source systems at a given time, then updated in another system at another time, etc. Note that the Point-in-time table (PIT) is already attached to the hub, as indicated by the ribbon.

Changes came in at various times, not related to each other. Most updates would be added when insurance is concluded, but they did not affect all operational systems at the same time. As a consequence, the change did not affect all satellites. Instead, it affected only the satellite that was supposed to cover the change (which is an advantage).

When building a data mart from this raw data, querying the customer data on a given date becomes complicated: the query should return the customer data as it was active according to the data warehouse delta process on the selected date. Achieving this requires outer join queries with complex time range handling. With more than three satellites on a hub or link, this becomes complicated and slow. The better approach is to use equi-join queries for retrieving the data from the Raw Data Vault. To achieve this, a special entity type is used in Data Vault 2.0 modeling: point in time tables (PIT). This entity is introduced to a Data Vault 2.0 model whenever the query performance is too low for a given hub or link and its surrounding satellites.

Figure 2: PIT table structure

Because the data in a PIT table is system-computed and does not originate from a source system, it does not need to be audited and is not part of the Raw Vault, so the structure can be modified to include computed columns.

Point in time tables serve two purposes:

Simplify the combination of multiple deltas at different points in time

A PIT table creates snapshots of data for dates specified by the data consumers upstream. For example, it is common to report the current state of data each day. To accommodate these requirements, the PIT table includes the date and time of the snapshot, in combination with the business key, as a unique key of the entity (a hashed key including these two attributes, named CustomerKey in Figure 2). For each of these combinations, the PIT table contains the load dates and the corresponding hash keys from each satellite that correspond best with the snapshot date.
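A minimal sketch of how such snapshot rows could be derived is shown below. The satellite names, hash keys, and dates are hypothetical, "corresponds best" is interpreted here as the newest load date that is not after the snapshot date, and the snapshot hash key (CustomerKey) is omitted for brevity.

```python
from bisect import bisect_right

# Hypothetical satellite load dates per customer hash key, sorted ascending.
sat_car = {"hk_cust": ["2018-10-01", "2018-10-09"]}
sat_legal = {"hk_cust": ["2018-10-09"]}

def latest_load_date(load_dates, snapshot):
    """Return the newest load date that is not after the snapshot, or None."""
    position = bisect_right(load_dates, snapshot)
    return load_dates[position - 1] if position else None

def build_pit(hub_keys, snapshots, satellites):
    """Create one PIT row per snapshot date and hub key."""
    pit = []
    for snapshot in snapshots:
        for hk in hub_keys:
            row = {"hash_key": hk, "snapshot": snapshot}
            for name, sat in satellites.items():
                row[name + "_ldts"] = latest_load_date(sat.get(hk, []), snapshot)
            pit.append(row)
    return pit

pit = build_pit(
    ["hk_cust"],
    ["2018-10-08", "2018-10-09"],
    {"car": sat_car, "legal": sat_legal},
)
```

Each PIT row then carries, per satellite, the exact load date to join on, so consumers no longer need range conditions on the load date.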

Reduce the complexity of joins for performance reasons with point in time tables

The point in time table is like an index used by the query and provides information about the active satellite entries per snapshot date. The goal is to materialize as much of the join logic as possible and end up with an inner join with equi-join conditions only. This join type is the most performant version of joining on most (if not all) relational database servers. In order to maximize the performance of the PIT table while maintaining low storage requirements, only one ghost record is required in each satellite used by the point in time table. This ghost record is used when no record is active in the referenced satellite and serves as the unknown or NULL case. By using the ghost record, it is possible to avoid NULL checks in general, because the join condition will always point to an active record in the satellite table: either an actual record that is active at the given snapshot date or the ghost record.

Table 1: Example of PIT table

The table above (Table 1) shows an assured person with frozen data states, one from the 8th and one from the 9th of October 2018. On the 8th, there was no record for this customer in the legal expenses insurance satellite. For that reason, both the hash key and the load date timestamp are NULL. For better query performance, these NULL values are replaced by references to the ghost record in the related satellite table, to avoid searching for a record that does not exist.
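The ghost record mechanism can be sketched as follows. The all-zero ghost hash key, the ghost load date, the column names, and the sample satellite content are assumptions chosen for illustration, not values taken from the article.

```python
GHOST_HK = "0" * 32        # assumed ghost hash key (all zeros)
GHOST_LDTS = "0001-01-01"  # assumed ghost load date

# Hypothetical satellite keyed by (hash key, load date); the ghost row is loaded once.
sat_legal = {
    (GHOST_HK, GHOST_LDTS): {"coverage": None},
    ("hk_cust", "2018-10-09"): {"coverage": "basic"},
}

# PIT rows before ghost handling: None where no satellite record was active.
pit = [
    {"snapshot": "2018-10-08", "legal_hk": None, "legal_ldts": None},
    {"snapshot": "2018-10-09", "legal_hk": "hk_cust", "legal_ldts": "2018-10-09"},
]

def point_to_ghost(row):
    """Replace missing satellite references with the ghost record's key."""
    if row["legal_hk"] is None:
        return {**row, "legal_hk": GHOST_HK, "legal_ldts": GHOST_LDTS}
    return row

# Equi-join: every PIT row finds exactly one satellite row, without NULL checks.
joined = [
    (row["snapshot"], sat_legal[(row["legal_hk"], row["legal_ldts"])]["coverage"])
    for row in map(point_to_ghost, pit)
]
```

The lookup on (hash key, load date) always succeeds, which corresponds to an inner join with equi-join conditions only.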

When customer data must be deleted for one business only and PII is used as the Business Key, just the Link entry and the descriptive attributes in the specific Satellite have to be deleted. The activity history is still available, can be used for analytical purposes, and is not traceable to the customer. An additional advantage of this “business split” appears when only one business is affected by the deletion of customer data, for example when each business belongs to a different subsidiary and only the car insurance data must be deleted. Furthermore, keep in mind that deleting only the Business Key (and keeping the Hash Key) does not result in GDPR compliance (and does not meet the Data Vault 2.0 standard anyway, as the Business Key is used in link tables). The Hash Key in Data Vault 2.0 is not used to encrypt data but for performance reasons. The key in the Links and the business-driven Hubs discussed here cannot be calculated back, as it is a complete surrogate key. As soon as a customer wants to be deleted completely because he or she is no longer a customer in any of your businesses, you delete the record from the main Hub as well.

Otherwise, if there is no additional artificial key for the customer, you cannot tie your data back to an object (an anchor point) after deleting the PII data, which makes the data useless in many cases.

Conclusion

The purposes of point in time tables are to improve the query performance by eliminating outer joins and allowing inner joins with equi join conditions (best performance). Additionally, point in time tables enhance partitioning and enable full scalability of star schemas (which should be completely virtualized) on top of the Data Vault. Furthermore, end users don’t have to join through all satellite tables, but join just one table for one business object which reduces the query complexity for ad-hoc queries.

Managed Self-Service BI: Success in Spite of Stringent Regulation

The latest story to find itself added to the growing number of successful implementations of Scalefree’s services, and Data Vault 2.0 as a whole, centers around a sector known for its strict regulatory bodies in addition to high volume of data that demands the utmost in terms of privacy and security.

The events that unfolded during the various financial crises at the start of the century left governments the world over seeking to impose stricter regulations on the financial sector. Banks within the sector were faced with a new task as they sought to continue operating with expansion in mind while still falling well within the defined standards.

The Latest Innovations of Data Vault 2.0

Focus on Trends: Data Lake and NoSQL, DWH Architecture, Self-Service BI, Modeling and GDPR

In the past, we wrote about topics we were confronted with when consulting our clients, or that we simply recognized as widely discussed on the web.

All these topics were already covered in Data Vault 2.0, and most of them have moved into sharper focus within the last months. Following the trends in the private sector, NoSQL databases now play an important role in quickly storing data from different source systems. This brings new opportunities to analyze the data, but also new challenges, e.g. how to query such semi-structured and unstructured data quickly, for instance with Massively Parallel Processing (MPP). Furthermore, there is an abundance of tools to store, transport, transform, and analyze the data, which often results in time-consuming and costly research. Knowledge about “Schema on Write” and “Schema on Read” (and their differences) has become very important for building a Data “Warehouse”. A schema has been and still is mandatory for Business Analysts when they have to tie the data to business objects for analytical purposes. Storing your data in NoSQL platforms only (let’s call it a “Data Lake”) is a good approach to capture all your company’s data, but it has become much more difficult for Business Users to get the data out of those platforms. A good and recommended approach is to have both a Data Lake AND a Data Warehouse, combined in a Hybrid Architecture.

How to Scale in a Disciplined Agile Manner?

Look beyond Scrum and learn how to increase the value of Data Vault 2.0 projects

Earlier this year we talked about Managed Self-Service BI to explain how business users can benefit from this approach in Data Vault 2.0. Now we want to show you how to get there from a project management perspective, even in large companies where the standard Scrum approach often does not work with the agreed deployment/release regulations and other approaches, like the Disciplined Agile framework, are the better fit.

Agile transformation is hard because cultural change is hard. It’s not one problem that needs to be solved, but a series of hundreds of decisions affecting lots of people over a long period of time that affects relationships, processes, and even the state of mind of those working within the change.

There are two fundamental visions about what it means to scale agile: Tailoring agile strategies to address the scaling challenges – such as geographic distribution, regulatory compliance, and large team size – faced by development teams and adopting agility across your organization. Both visions are important, but if you can’t successfully perform the former then there is little hope that you’ll be successful at the latter.

Still Struggling with GDPR?

GDPR

The new General Data Protection Regulation (GDPR) is a regulation of the European Union (EU) that became effective on May 25, 2018. It is designed to provide a high level of protection for the personal data of European citizens, which means that companies around the world have to establish transparency about and ownership of individuals’ data and need to obtain a clear declaration of consent before storing and processing their personal data. Though laws in countries outside the EU (especially the USA) tend to favor business over the consumer, GDPR affects all companies around the world that hold personal data of EU citizens in their databases.

What is new in GDPR?

Being careful with personal data is nothing new, especially not in the EU. The key change in collecting and processing personal data is that the data is now completely under the control of its owner, who can force companies to delete or anonymize the data or request copies of all of the owner’s personal data stored in the system. Personal data, or Personally Identifiable Information (PII), means data by which an individual can be identified, e.g. name, phone number, or email address.

Data Warehouse and Data Lake: Do We Still Need a Data Warehouse?

“Big Data”, “Data Lake”, “Data Swamp”, “Hybrid Architecture”, “NoSQL”, “Hadoop” … terms you are confronted with very often these days when you are dealing with data. Furthermore, the question comes up if you really need a data warehouse nowadays when you deal with a high variety and volume of data. We want to talk about what a data lake is, if we need a data warehouse when using NoSQL platforms like Hadoop, and how it is combined with Data Vault.

WHAT IS A DATA LAKE?

There is a proper definition from Tamara Dull (SAS): “A data lake is a storage repository that holds a vast amount of raw data in its native format, including structured, semi-structured, and unstructured data. The data structure and requirements are not defined until the data is needed.” 1

How to Combine Managed Self-Service BI with Data Vault 2.0?

Combining Managed Self-Service BI with Data Vault 2.0

This article explores how combining Managed Self-Service BI with Data Vault 2.0 enables organizations to balance data governance and agility, ensuring both control and flexibility in their analytics processes. Last month we talked about a hybrid architecture in Data Vault 2.0 and explained how to combine structured and unstructured data. To follow up on this topic, we now want to explain how your business users (especially power users) can benefit from it with the managed Self-Service Business Intelligence (mSSBI) approach in Data Vault 2.0.

About Self-Service BI

Self-service BI allows end-users to completely circumvent IT because of the unresponsiveness of IT. In this approach, business users are left on their own with the whole process of sourcing the data from operational systems and integrating and consolidating the raw data. There are many problems with this self-service approach without the involvement of IT:

In many cases, end-users, even if they are power users with knowledge of SQL, MDX, and other techniques, don’t have the right tools available to solve the tasks. Instead, much work is done manually and is error-prone. But from our experience, it is not possible to completely prevent such power users from obtaining data from source systems, preparing it, and eventually reporting the data to upper management. What organizations need is a compromise between IT agility and data management that allows power users to obtain the data they need quickly, in a usable quality. To overcome these problems, the Data Vault 2.0 standard allows experienced or advanced business users to perform their own data analysis tasks on the raw data of the data warehouse.

About the Managed Self-Service BI Approach

In fact, a Data Vault 2.0 powered IT welcomes business users to take the data that is available in the enterprise data warehouse (either in the Raw Data Vault or in the Business Vault) to create local information marts using specialized tools. These tools retrieve the data from the enterprise data warehouse, apply a set of user-defined business rules, and present the output to the end-user. IT might also create structures where organization-wide business rules are applied to provide a consolidated view on parts of the model, or pre-calculate KPIs to ensure consistency among such calculations. Because both types of data (raw data and business rule applied data) are already integrated, the business user can also join consolidated data with raw data from specific source systems. This approach is called Managed Self-Service BI, where IT evolves into a service organization that provides those power users with the data they want, in the timeframe they need. The data is integrated by its business key and can be consolidated as well as quality checked.

Implement mSSBI in the Data Vault 2.0 Architecture

The Data Vault 2.0 architecture provides self-service capabilities for power users in the organization:

Figure 1: mSSBI Architecture

In this case, power users who build their own, custom solutions can write back data and information into the Enterprise Data Warehouse by leveraging a dedicated user space for this purpose. The write-back can then later be re-used in the solution for information delivery. Furthermore, the business users can manage their own master data by using an MDM application. It enables authorized business users to change the parameter values and therefore influence the results of the business rules.

The difference between “managed” Self-Service BI and the standard Self-Service BI approach of the general industry is that Data Vault 2.0 provides a managed environment where data and information are provided in a controlled and secure manner. Power users can only query the data they are allowed to see from a data security perspective.
Another advantage is that this approach enables organizations in security and banking industries to provide a fully auditable and traceable environment that meets the highest security requirements.

Managed Self-Service BI does require a write-back capability in the enterprise data warehouse architecture; otherwise, it’s just a plain old BI solution. Without write-back, there are no differentiators. Beyond that, write-back is necessary in order to enrich or enhance the quality of the data being put forward. Data scientists, for example, do this all the time: when using Hadoop, they create a new target file as a result of their processing output. This is a direct write-back in the Hadoop space. We have required write-back in Self-Service BI for years; otherwise, master data and hierarchy management don’t work properly.

In this modification of the architecture from the previous section, the relational staging area is replaced by an HDFS-based staging area which captures all unstructured and structured data. While capturing structured data on the HDFS appears as overhead at first glance, this strategy actually reduces the burden on the source system by making sure that the source data is always being extracted, regardless of any structural changes. The data is then extracted using Apache Drill, Hive External, or similar technologies. It is also possible to store the Raw Data Vault and the Business Vault (the structured data in the Data Vault model) on Hive Internal.

Conclusion

Combining Managed Self-Service BI with Data Vault 2.0 empowers organizations to strike a balance between governance and agility in their data ecosystems. By leveraging Data Vault’s structured, auditable architecture alongside self-service BI’s flexibility, businesses can ensure data accuracy, security, and scalability while enabling users to gain faster insights. This approach enables collaboration between IT and business users, driving more informed decision-making and accelerating data-driven innovation.

Hybrid Architecture in Data Vault 2.0

Hybrid Architecture in Data Vault 2.0

Business users expect their data warehouse systems to load and prepare more and more data, regarding the variety, volume, and velocity of data. Also, the workload that is put on typical data warehouse environments is increasing more and more, especially if the initial version of the warehouse has become a success with its first users. Therefore, scalability has multiple dimensions. Last month we talked about Satellites, which play an important role in scalability. Now we explain how to combine structured and unstructured data with a hybrid architecture.

Logical Data Vault 2.0 Architecture

The Data Vault 2.0 architecture is based on three layers: the staging area which collects the raw data from the source systems, the enterprise data warehouse layer, modeled as a Data Vault 2.0 model, and the information delivery layer with information marts as star schemas and other structures. The architecture supports both batch loading of source systems and real-time loading from the enterprise service bus (ESB) or any other service-oriented architecture (SOA).

The following diagram shows the most basic logical Data Vault 2.0 architecture:

Figure 1: Logical Data Vault 2.0 Architecture

In this case, structured data from source systems is first loaded into the staging area to reduce the operational and performance burden on the operational source systems. It is then loaded unmodified into the Raw Data Vault, which represents the Enterprise Data Warehouse layer. After the data has been loaded into this Data Vault model (with hubs, links, and satellites), business rules are applied in the Business Vault on top of the data in the Raw Data Vault. Once the business logic is applied, both the Raw Data Vault and the Business Vault are joined and restructured into the business model for information delivery in the information marts. The business user uses dashboard applications (or reporting applications) to access the information in the information marts.

The architecture allows implementation of the business rules in the Business Vault using a mix of various technologies, such as SQL-based virtualization (typically using SQL views), and external tools, such as business rule management systems (BRMS).

However, it is also possible to integrate unstructured NoSQL database systems using a hybrid architecture. Due to the platform independence of Data Vault 2.0, NoSQL can be used for every data warehouse layer, including the staging area, the enterprise data warehouse layer, and information delivery. Therefore, the NoSQL database could be used as a staging area and load data into the relational Data Vault layer. However, it could also be integrated both ways with the Data Vault layer via a hashed business key. In this case, it would become a hybrid architecture solution, and information marts would consume data from both environments.

Hybrid Architecture

The standard Data Vault 2.0 architecture in Figure 1 focuses on structured data. Because more and more enterprise data is semi-structured or unstructured, the recommended best practice for a new enterprise data warehouse is to use a hybrid architecture based on a Hadoop cluster, as shown in the next figure:

Figure 2: Hybrid Data Vault 2.0 Architecture

In this hybrid architecture modification, the relational staging area is replaced by an HDFS-based staging area that captures all unstructured and structured data. While capturing structured data on the HDFS appears as overhead at first glance, this strategy actually reduces the burden on the source system by making sure that the source data is always being extracted, regardless of any structural changes. The data is then extracted using Apache Drill, Hive External, or similar technologies.

It is also possible to store the Raw Data Vault and the Business Vault (the structured data in the Data Vault model) on Hive Internal.

Conclusion

Integrating a hybrid architecture within Data Vault 2.0 enables organizations to effectively manage both structured and unstructured data by leveraging platforms like Hadoop. This approach enhances scalability and flexibility, allowing for efficient data processing and storage. By replacing traditional relational staging areas with HDFS-based systems, businesses can reduce the burden on source systems and ensure seamless data extraction.

Visual Data Vault by Example: Satellites Modeling in the Health Care Industry

Data Vault 2.0 is a concept for data warehousing, invented by Dan Linstedt. It brings many new features that help anyone who is concerned with Business Intelligence enter a new age of data warehousing. Data Vault 2.0 is a Big Data concept that integrates relational data warehousing with unstructured data warehousing in real-time. It is an extensible data model where new data sources are easy to add. When our founders wrote the book, they required a visual approach to model the concepts of Data Vault in the book. For this purpose, they developed a graphical modeling language, which focuses on the logical aspects of Data Vault. The Microsoft Visio stencils and a detailed white paper are available on www.visualdatavault.com as a free download.

This year we already wrote about the modeling of hubs and links in Data Vault 2.0. Now, we want to introduce you to the third standard entity, the Satellite.

SATELLITES IN VISUAL DATA VAULT

Satellites add descriptive data to hubs and links. Descriptive data is stored in attributes that are added to the satellite. The individual attributes are added to the satellite one at a time. A satellite might be attached to any hub or link. However, it is only possible to attach the satellite to one parent.
