The Data Vault 2.0 concept consists of an extensible, fully auditable data method in which new data sources are easy to add and the system scales endlessly.
An Agile methodology based on Scrum enables a rapid build-out of the data warehouse concept.
The architecture supports not only structured data sources, but also semi-structured and unstructured sources such as Hadoop and real-time systems.
Standardized loading patterns optimize loading performance and support ETL patterns that can load massive amounts of data in a short period of time.
Industries profiting from Data Vault 2.0
Implemented by Scalefree
Global brands have typically been using some kind of data warehouse for a long time. These companies, automotive manufacturers for example, usually look into Data Vault 2.0 because the amount of data they store and access has grown so large that the data warehouse can no longer handle it properly. The sensors of millions of cars transmit their data to the cloud, which adds up quickly.
In this case, the company implemented Data Vault 2.0 and used PIT and Bridge tables to increase query speed. This is recommended since it significantly reduces the time needed to access and analyze the data, saving costs.
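To illustrate the idea behind a PIT (point-in-time) table, the following minimal Python sketch pre-resolves, for each snapshot date, the applicable load date of every satellite, so that queries can use simple equi-joins instead of expensive "latest row before date" subqueries. The table names, keys and dates are illustrative, not taken from the actual project:

```python
from datetime import date

# Hypothetical load dates of two satellites for one customer hub key.
sat_address_loads = [date(2023, 1, 1), date(2023, 3, 15)]
sat_contact_loads = [date(2023, 2, 1)]

def latest_load(loads, snapshot):
    """Most recent satellite load date at or before the snapshot date."""
    valid = [d for d in loads if d <= snapshot]
    return max(valid) if valid else None

def pit_row(hub_key, snapshot):
    # One PIT row per hub key and snapshot date: the pre-resolved load
    # date of each satellite. Queries then join satellites on these
    # exact dates instead of scanning for the latest row at query time.
    return {
        "hub_key": hub_key,
        "snapshot_date": snapshot,
        "sat_address_load": latest_load(sat_address_loads, snapshot),
        "sat_contact_load": latest_load(sat_contact_loads, snapshot),
    }

row = pit_row("CUST-1001", date(2023, 4, 1))
```

In a real warehouse this resolution happens once, when the PIT table is (re)built, rather than in every analytical query, which is where the speed-up comes from.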
Data Vault 2.0 is also used by a large German insurance corporation. Linking customer data, for example, saves time on retyping standard information when a new claim is added or coverage changes.
Other important aspects for insurance corporations are data privacy and codes of conduct. With Data Vault 2.0, individual attributes can be deleted from the data warehouse independently for privacy reasons.
The insurance company also implemented data virtualization within Data Vault 2.0, which significantly reduces storage requirements. In addition, a framework for incoming data was created to automate data ingestion, again saving time wherever possible.
Business to Business
Business-to-business companies also benefit from using the Data Vault 2.0 methodology. One small B2B team of around five people decided to implement Data Vault 2.0 to be ready for the Big Data needs of the near future: very large amounts of data will soon need to be transferred from source systems to the data warehouse, which is something their current data warehouse models cannot handle.
Additionally, the company also wants to analyze unstructured data such as videos and emails, which requires a lot of space in a data warehouse. The benefits of the complete Data Vault 2.0 concept are parallel loading of structured and unstructured data, and support for NoSQL platforms. This is something conventional data warehouse models and architectures cannot do.
Proof of concept
Test-drive Data Vault 2.0
These case studies illustrate, to some extent, the reasons for implementing Data Vault 2.0. Developments in Big Data – such as the sheer volume, the unstructured nature and the increased importance of data analysis – require a system that can handle and analyze data fast. Scalefree sets up that complete concept with Data Vault 2.0, which saves you a lot of valuable time and money in the medium and long term. To test the Data Vault 2.0 methodology, we offer a proof of concept in which we implement the data warehouse method in one part of your company. Once the benefits have proven themselves, you can decide to implement Data Vault 2.0 across the entire company.
The departmental results are available at relatively low cost, while giving you all the information you need to decide on a company-wide implementation of Data Vault 2.0. As part of the proof of concept, we offer on-site consulting and share our knowledge with your employees. This way, the foundations of a self-sustaining data warehouse concept are readily available, allowing your employees to educate others in the company. We also provide on-site evaluation of the proof of concept as well as long-term, reliable local support.
Do you have more questions?
The ones we hear a lot.
How to implement Data Vault 2.0?
We’re not just modeling a data warehouse. We’re building one. Building an effective data warehouse and business intelligence solution requires more than just a model. The goals of the complete data warehouse concept include getting all the data into the system as fast as possible, fully auditable and in parallel; then, enabling you to derive valuable information from the data. The data model is a key component for achieving these goals. The goal is to build an enterprise data warehouse concept capable of sourcing data from all organizational sources, both internal and external.
For that reason, we developed the Data Vault 2.0 Method of Business Intelligence consisting of the following components:
- Data Vault 2.0 concept consisting of an extensible, fully auditable data method where new data sources are easy to add;
- Agile methodology based on Scrum for a rapid build-out of the data warehouse concept;
- Architecture that not only supports structured data sources, but also semi-structured and unstructured such as Hadoop and real-time systems;
- Standardized loading patterns that optimize loading performance and support ETL patterns capable of loading massive amounts of data in a short period of time.
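As an illustration of such a standardized, repeatable loading pattern, here is a minimal Python sketch of an insert-only hub load; an in-memory dict stands in for the hub table, and the key names and record source are hypothetical. Because every hub, link and satellite follows the same pattern, many of these loads can run in parallel:

```python
from datetime import date

def load_hub(hub, staged_keys, load_date, record_source):
    """Insert-only hub load: add business keys not yet present and
    never update or delete existing rows (keeping the history fully
    auditable). Re-running the same batch is a no-op, so loads are
    restartable after a failure."""
    for key in staged_keys:
        if key not in hub:
            hub[key] = {"load_date": load_date, "record_source": record_source}
    return hub

hub_customer = {}
load_hub(hub_customer, ["CUST-1", "CUST-2"], date(2024, 1, 1), "CRM")
# A second run with an overlapping batch only adds the new key;
# CUST-2 keeps its original load date.
load_hub(hub_customer, ["CUST-2", "CUST-3"], date(2024, 1, 2), "CRM")
```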
Switching from Data Vault 1.0 or another data warehouse to Data Vault 2.0 is an investment in the future. Clients not only save considerable costs in the long run, but can also process data that was previously hard to integrate into the data warehouse. Structured, semi-structured and unstructured data is delivered in nightly batches or in real time (using Apache Storm, Apache Spark or similar technologies).
The enterprise data warehouse is not limited to your relational on-site data warehouse infrastructure, but can be extended to the cloud and to NoSQL environments (such as Hadoop). That way it also supports Internet of Things networks with millions of sensors. All components of the Data Vault 2.0 System of Business Intelligence are designed to be scale-free.
Some more questions
Why Big Data?
Many organizations today struggle with ever-increasing volumes of data, in varying structures, at high velocity. Our experienced and certified consultants help you understand and meet these challenges. Data Vault 2.0 helps you extract value from this type of data, or even allow you to build a business model around Big Data. In any case, using Big Data for corporate success is made easy with the Data Vault 2.0 System of Business Intelligence.
Data Vault 2.0 was developed for the U.S. government and has proven to handle large amounts of data, up to 3 petabytes (PB). Data Vault 2.0 supports cloud computing and real-time data, and is designed for massively parallel processing (MPP) architectures.
Integrating structured data in a relational data warehouse with semi- or unstructured data on a NoSQL environment, such as Apache Hadoop or MongoDB, is easy with Data Vault 2.0. It is also possible to store a full Data Vault 2.0 system on a Hadoop Cluster.
A key challenge when analyzing data is so-called predictive analytics: gaining a predictive view of the future rather than merely witnessing "what happened in the past" and "what is happening at the moment". Data Vault 2.0 includes smart Big Data algorithms (as part of data mining) that learn efficiently to make precise predictions. These algorithms follow a simple thesis: the more data they receive to learn from, the more precisely they predict what will happen in the future.
What is Agile Methodology?
Co-founder Dan Linstedt invented Data Vault 2.0 for the U.S. government. The task at hand was to create a method to load data from all internal source systems, extract useful information, and deliver it to information consumers with varying requirements. The end result? A data warehouse with more than 3 PB (or 255 trillion document pages) of data available to business analysts.
Implementing Data Vault 2.0 successfully on such a scale cannot be done overnight. Sourcing massive amounts of data requires steady effort. To avoid any unwanted surprises, we apply an agile methodology based on the frequent delivery of new information to information consumers. Additionally, we extend the data warehouse with new functions (such as reports) in iterations. This helps our clients redirect the combined efforts when new requirements arise from their daily business.
Key to our agile methodology are transparent project progress, a focus on deliverable information artifacts, and the ability to introduce new requirements – all while adhering to proven organizational standards. We empower our clients to understand these concepts and apply them successfully to their projects. Our consultants help align the whole organization in an agile fashion, setting the data warehouse up for success.
How are databases with different architectures connected?
The integration of data across multiple platforms is based on industry-standard hash keys. This means that, for example, any relational database can be connected with NoSQL environments, whether the systems run on-site or in the cloud, and even when they run in different network segments or on separate physical networks.
The hash keys are used to identify individual records in the Data Vault 2.0 model and documents in NoSQL environments, consistently across platforms.
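As a rough sketch of how such a cross-platform hash key can be computed, the following Python snippet normalizes the business key parts and hashes them; the normalization rules (trim, upper-case, delimiter) and the choice of MD5 are one common convention, not the only option:

```python
import hashlib

def hash_key(*business_keys, delimiter=";"):
    """Deterministic hash key from one or more business key parts:
    trim and upper-case each part, join with a delimiter, then hash.
    The same business key therefore yields the same key on any
    platform, relational or NoSQL."""
    normalized = delimiter.join(str(k).strip().upper() for k in business_keys)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()

# The same customer number produces the same key in the relational
# warehouse and in a NoSQL document store, so records can be matched
# without a central key-lookup service.
assert hash_key("  cust-1001 ") == hash_key("CUST-1001")
```

Because the key is computed from the data itself, independent systems can generate it locally and still join their records later.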
What are common mistakes when implementing Data Vault 2.0?
Data Vault 2.0 has been used in a variety of governmental projects and commercial industries. It has been implemented in projects for companies in fields ranging from automotive and finance to security and telecommunications.
When implementing Data Vault 2.0, it is important to hire a specialist specifically trained in Data Vault 2.0 rather than in other data warehousing models. We have come across faulty implementations of Data Vault 2.0 by generic data warehouse specialists. This was the main reason we founded Scalefree: to provide a consistent application of Data Vault 2.0.
We offer our knowledge and deliver projects independently or in cooperation with other consulting companies and vendors. Preparing companies for the Big Data future is our main goal. Dan Linstedt, the inventor of Data Vault, works with us exclusively in Europe and lends his knowledge to all our consultants as co-founder of Scalefree.
The reference for Data Vault 2.0
“Today, the world is trying to create and educate data scientists because of the phenomenon of Big Data. And everyone is looking deeply into this technology. But no one is looking at the larger architectural picture of how Big Data needs to fit within the existing systems (data warehousing systems). Data Architecture a Primer for the Data Scientist addresses the larger architectural picture of how Big Data fits with the existing information infrastructure, an essential topic for the data scientist.”
“The Data Vault was invented by Dan Linstedt at the U.S. Department of Defense, and the standard has been successfully applied to data warehousing projects at organizations of different sizes, from small to large-size corporations. Due to its simplified design, which is adapted from nature, the Data Vault 2.0 standard helps prevent typical data warehousing failures.”