Focus on trends: Data Lake and NoSQL, DWH architecture, self-service BI, modeling, and GDPR
In the past, we wrote about topics we encountered while consulting our clients, or that we recognized as recurring discussions on the web.
All of these topics are already covered in Data Vault 2.0, and most of them have moved into sharper focus over the last months. Following trends in the private sector, NoSQL databases now play an important role in quickly storing data from different source systems. This brings new opportunities to analyze the data, but also new challenges, such as how to query this "semi-structured" and "unstructured" data fast, e.g. by using Massively Parallel Processing (MPP). Furthermore, there is an abundance of tools to store, transport, transform, and analyze the data, which often results in time- and cost-intensive research. Understanding "Schema on Write" and "Schema on Read" (and their differences) has become very important for building a Data "Warehouse". A schema has been and still is mandatory for business analysts when they have to tie the data to business objects for analytical purposes. Storing your data in NoSQL platforms only (let's call it a "Data Lake") is a good approach to capture all of your company's data, but it makes it much harder for business users to get the data back out of those platforms. A good and recommended approach is to have both: a Data Lake AND a Data Warehouse, combined in a hybrid architecture.
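The difference between the two schema strategies can be sketched in a few lines. This is a minimal illustration with invented field names and records, not production code: "Schema on Read" maps raw, inconsistent records to a target structure at query time, while "Schema on Write" rejects non-conforming rows before they are stored.

```python
import json

# Two hypothetical records as they might land in a data lake:
# semi-structured JSON from two different source systems.
raw_records = [
    '{"cust_no": "1001", "name": "Jane Doe", "country": "US"}',
    '{"customer_id": 1002, "full_name": "John Smith"}',  # other layout, field missing
]

# Schema on Read: the schema is applied only when the data is queried.
# The field mappings below are illustrative assumptions.
def read_customer(raw: str) -> dict:
    rec = json.loads(raw)
    return {
        "customer_id": str(rec.get("cust_no") or rec.get("customer_id")),
        "name": rec.get("name") or rec.get("full_name"),
        "country": rec.get("country", "unknown"),  # default for missing data
    }

customers = [read_customer(r) for r in raw_records]

# Schema on Write: the schema is enforced before storing;
# records that do not conform are rejected up front.
REQUIRED = {"customer_id", "name", "country"}

def write_customer(rec: dict, table: list) -> None:
    missing = REQUIRED - rec.keys()
    if missing:
        raise ValueError(f"record rejected, missing fields: {missing}")
    table.append(rec)

warehouse_table: list = []
for c in customers:
    write_customer(c, warehouse_table)  # only schema-conforming rows are stored
```

The hybrid architecture mentioned above essentially chains the two: the lake keeps the raw records, and the warehouse only receives rows that passed the write-time schema.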
On the 25th of May 2018, a new regulation raised a lot of question marks over many heads: the GDPR. What exactly is changing? Which internal processes in marketing, sales, and general customer communication do we have to refactor? Does it affect our company even though it is based in the United States and this is a European regulation? All this with the knowledge that the penalties are really high. The short answer is that customers now have control over their own data in your company. Data Vault 2.0 delivers architecture, modeling, and implementation solutions for handling delete requests for personal data across your Data Warehouse tiers quickly and reliably, which makes data lineage very important.
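One common modeling pattern behind this is to isolate personal data in its own satellite, keyed by the hub's hash key, so a delete request only touches that one table while the non-personal history stays intact. The sketch below uses in-memory dictionaries to stand in for tables and invented attribute names; it is an assumption-laden illustration of the idea, not official Data Vault 2.0 code.

```python
import hashlib

def hash_key(business_key: str) -> str:
    """Hash the business key, as Data Vault 2.0 does for hub keys."""
    return hashlib.md5(business_key.upper().encode("utf-8")).hexdigest()

# Hub: business keys only, no descriptive data.
hub_customer = {hash_key("1001"), hash_key("1002")}

# Non-personal, analytics-relevant attributes stay in a regular satellite...
sat_customer = {
    hash_key("1001"): {"segment": "retail", "since": "2015-03-01"},
    hash_key("1002"): {"segment": "corporate", "since": "2017-11-20"},
}

# ...while personal data lives in a dedicated PII satellite.
sat_customer_pii = {
    hash_key("1001"): {"name": "Jane Doe", "email": "jane@example.com"},
    hash_key("1002"): {"name": "John Smith", "email": "john@example.com"},
}

def handle_delete_request(business_key: str) -> None:
    """GDPR delete: remove personal data only; structure and history remain."""
    sat_customer_pii.pop(hash_key(business_key), None)

handle_delete_request("1001")
```

Because the personal attributes never leak into the other satellites, propagating such a delete through the warehouse tiers stays a targeted operation, which is exactly why data lineage matters here.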
Everything today has to be agile. Data Vault 2.0 has followed these agile approaches from the beginning. Ask yourself for whom you are building a Data Warehouse, a Data Lake, or any other data provisioning solution. It is for the people who have to work with that data: your end users, e.g. business analysts and data scientists. They want to access their data in a comfortable, fast, and structured way (for example via a star schema). Starting a project with a fixed release date (like building a physical warehouse) will most likely end up in isolated, homegrown solutions in the meantime, far away from data governance, consolidation, and cross-divisional interpretations of the data. Projects run in an agile manner form collaborative teams and focus on your business users' needs. Additionally, Managed Self-Service BI empowers your end users by giving them access to the Enterprise Data Warehouse to build and provide their own solutions in a controlled manner.
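To make the star schema mentioned above concrete, here is a tiny sketch with invented table and column names: a central fact table references small dimension tables, which is the comfortable, structured shape business analysts typically query.

```python
# Dimension tables: descriptive context, one row per key.
dim_customer = {
    1: {"name": "Jane Doe", "country": "US"},
    2: {"name": "John Smith", "country": "DE"},
}
dim_product = {
    10: {"product": "Widget", "category": "Hardware"},
}

# Fact table: measurable events, referencing the dimensions by key.
fact_sales = [
    {"customer_key": 1, "product_key": 10, "quantity": 3, "revenue": 29.97},
    {"customer_key": 2, "product_key": 10, "quantity": 1, "revenue": 9.99},
]

# A typical analytical question: revenue per country.
revenue_by_country: dict = {}
for row in fact_sales:
    country = dim_customer[row["customer_key"]]["country"]
    revenue_by_country[country] = revenue_by_country.get(country, 0.0) + row["revenue"]
```

In a Data Vault 2.0 architecture, such star schemas are typically derived from the Raw/Business Vault as information marts for exactly this kind of end-user query.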
GET THE DATA VAULT 2.0 UPGRADE
Discover the latest Data Vault 2.0 innovations in the Data Vault 2.0 Upgrade Class 2018 online. A special feature of this one-day training is the open discussion.
NEWLY STRUCTURED DATA VAULT 2.0 BOOT CAMP
The agenda of the Data Vault 2.0 Boot Camp has been restructured and extended to dive deeper into the current and future-driven trends explained above.
Besides a new hands-on workshop and an online training, the class now places additional focus on topics such as NoSQL databases, how to query them using Massively Parallel Processing (MPP), and the difference between "Schema on Write" and "Schema on Read". A schema was and is mandatory for business analysts when they have to tie the data to business objects for analytical purposes. Furthermore, the Business Vault has moved more into focus, as has the topic "Data as an Asset". On the last day, the Data Warehouse is compared with a Data Lake, or rather, we show how to "combine" them.
The online training consists of 8 hours of videos by Dan Linstedt, in which he introduces you to the Data Vault 2.0 System of Business Intelligence, including the entire methodology. This computer-based training has to be completed before the on-site training starts. Although the agenda was extended, there are still 3 in-person training days, taught by an authorized trainer. The first day is filled with new topics such as Big Data and NoSQL databases, Massively Parallel Processing (MPP), the value of data as an asset, success stories, and real-world business cases, among other things.
In the hands-on workshop at the end of the second day, you will be confronted with real-world scenarios to put your newly learned modeling knowledge to the test. The goal is to model an enterprise data warehouse with Data Vault, where data comes from several source systems. Decisions such as "What is the business key?", "Reference table or hub (business object)?", "How do we model transactional data?", and more have to be made. You will be divided into teams, and each team is given a set of guidelines, source system models, assumptions, and a workshop guide. You are responsible for assigning roles such as modeler, note-taker, and scrum master/sprint leader. The following morning, you and your team present an entity-relationship diagram consisting of a consolidated Data Vault model, your assumptions, your reasoning, and your questions.
The objective is to provide you with a hands-on team-building experience. The end goal is to enhance your skill set so that you become a qualified practitioner/expert in enterprise Business Intelligence projects.
Visual Data Vault
To support the creation of Visual Data Vault drawings in Microsoft Visio, a stencil has been implemented that can be used to draw Data Vault models. The stencil is available at www.visualdatavault.com.