AI Act Insight: Ensuring Responsible AI for Your Business

AI Act Business Intelligence Architecture graphic

AI Act

The Artificial Intelligence (AI) Act entered into force on August 1, 2024, and its provisions will apply in stages over the following years. As a new legal milestone, the AI Act sets requirements for how companies use artificial intelligence, with the aim of promoting the responsible development and use of AI in the EU. The risk-based approach of the AI Act does not only create hurdles, however; it also opens up a wide range of opportunities for a future-oriented AI strategy.

In our newsletter, we take a first look at the new legal framework and its significance for your company. We also show how companies can use legal compliance strategically to gain a competitive edge and promote innovative business models.

The EU’s AI Act is here! Learn how this groundbreaking regulation impacts your business. We’ll break down the risk-based approach to AI systems, focusing on high-risk applications and compliance requirements. Discover practical steps to ensure transparency and leverage tools like AI-Marts for effective AI governance.

Watch Webinar Recording

What is the AI Act and Why Should You Care?

The AI Act aims to make the use of artificial intelligence within the EU safer and more trustworthy by creating clear rules for the development and use of AI systems. The focus here is on the protection of fundamental rights, health, and safety. The legal framework is based on a risk-based approach: AI applications are divided into four different categories, from minimal to unacceptable risks, depending on the potential threat to society. Specifically, the AI Act provides for the following categories:

  1. Unacceptable Risk: AI systems that pose a threat to human rights or safety, such as those used for social scoring or manipulative practices, are prohibited.
  2. High Risk: These systems are heavily regulated and include applications in critical areas such as biometric identification, healthcare, transportation, education, and employment. Businesses using high-risk AI must meet strict compliance standards.
  3. Limited Risk: These systems face fewer restrictions but must still adhere to transparency requirements. For example, chatbots need to inform users that they are interacting with AI.
  4. Minimal or No Risk: The least regulated category includes AI applications such as spam filters or AI-driven video games.

The AI Act is of great importance for companies, as the new requirements not only entail compliance obligations, but also open up opportunities to gain competitive advantages.

By adapting to the legal requirements at an early stage, companies can strengthen trust among customers and partners, minimize risks, and promote innovation responsibly. A sound understanding of the regulation enables them to make strategic use of the legal framework and to strengthen their position relative to international competitors.

Hence, for businesses operating in or with the EU, compliance with the AI Act will be a decisive factor. Failure to comply could result in significant penalties: for the most serious violations, up to €35 million or 7% of global annual turnover, whichever is higher.
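The ceiling quoted above is simple arithmetic. The following is a minimal sketch, assuming only the two figures stated in the text; the function name `max_fine_eur` is hypothetical, and the actual amount of any fine is decided case by case by the competent authorities.

```python
from decimal import Decimal

FIXED_CAP_EUR = Decimal("35000000")  # EUR 35 million, as quoted above
TURNOVER_SHARE = Decimal("0.07")     # 7% of global annual turnover


def max_fine_eur(global_annual_turnover_eur: Decimal) -> Decimal:
    """Return the upper bound of the fine: whichever of the two figures is higher."""
    return max(FIXED_CAP_EUR, TURNOVER_SHARE * global_annual_turnover_eur)
```

For a company with a global annual turnover of EUR 1 billion, the bound is EUR 70 million. Below a turnover of EUR 500 million, the fixed EUR 35 million figure is the higher of the two.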

As the provisions take effect in stages, companies should, as a first step, review their AI tools and determine how each system is classified and regulated under the new framework, so that they can implement the obligations that apply.
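Such a review usually starts with a plain inventory. The sketch below is one possible way to record it, not a classification tool: the tier for each system is assigned by a human reviewer, and the class names, field names, and example entries are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class RiskTier(Enum):
    """The four risk categories described above."""
    UNACCEPTABLE = "unacceptable"
    HIGH = "high"
    LIMITED = "limited"
    MINIMAL = "minimal"


@dataclass(frozen=True)
class AiSystem:
    name: str
    purpose: str
    tier: RiskTier  # assigned by a human reviewer, not derived by code


def group_by_tier(systems):
    """Group inventoried systems by their assessed risk tier."""
    grouped = defaultdict(list)
    for system in systems:
        grouped[system.tier].append(system.name)
    return dict(grouped)


# Hypothetical inventory entries, for illustration only.
inventory = [
    AiSystem("support-chatbot", "customer service chat", RiskTier.LIMITED),
    AiSystem("cv-screening", "ranking job applicants", RiskTier.HIGH),
    AiSystem("spam-filter", "email filtering", RiskTier.MINIMAL),
]
```

Calling `group_by_tier(inventory)` gives a quick view of which systems fall under the strictest obligations and should be looked at first.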

Comply With the AI Act Today!

Preparing for the AI Act requires a proactive and comprehensive approach. Ensure compliance, mitigate risks, and foster trust in your AI applications.

Get My Free Checklist

From a Data Warehousing perspective: How can an AI-Mart help?

As businesses prepare to comply with the European Union’s AI Act, ensuring that their data and AI systems meet the new regulations is critical. Central to this is the concept of data governance and traceability, especially for AI models classified as high-risk. A modern data warehouse (DWH), particularly one built on Data Vault 2.0 and combined with a specialized AI-Mart, can provide the technical foundation for compliance by managing the data lifecycle, ensuring transparent operations, and logging AI model activities. Data Vault 2.0 offers several advantages for this purpose: it supports agile development, so that changes in business requirements can be absorbed quickly; it scales, so that businesses can handle growing data volumes; and its architecture provides strong historical tracking, which makes auditing and compliance verification easier.

In the context of AI, the AI-Mart is a specialized data mart within a DWH, focused solely on managing AI training data. Its purpose is to provide a structured and compliant environment for storing and curating datasets that will be used to train, validate, and test AI models. Unlike a traditional data mart, the AI-Mart is designed with features tailored for AI, such as enhanced metadata, tracking, and model training documentation.

Key Features of an AI-Mart

  1. Data Curation for AI Training: The AI-Mart stores data specifically curated for training AI models, ensuring that all datasets are clean, unbiased, and high-quality. Built-in data governance rules ensure that only validated data enters the mart. This ensures compliance with the AI Act’s requirements for high-risk AI systems, where data must be trustworthy, accurate, and free of bias.
  2. Metadata and Documentation: The AI-Mart stores metadata about each dataset, including its source, transformations applied, and its use in specific AI models. This metadata is essential for traceability, ensuring that every data point used in an AI model can be traced back to its origin and all changes can be documented.
  3. Data Versioning and Lineage: In AI applications, ensuring that models use up-to-date and reliable data is critical. The AI-Mart supports data versioning, allowing teams to maintain multiple versions of datasets and trace changes over time. Data lineage tracking ensures that the lifecycle of the data—from ingestion to usage in AI models—can be fully traced, providing a comprehensive audit trail required for compliance with the AI Act.
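The metadata, versioning, and lineage features listed above can be pictured as one record per dataset version. This is a minimal sketch under assumed names: `DatasetVersion`, its fields, and the `lineage` helper are hypothetical and do not describe a specific AI-Mart schema.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class DatasetVersion:
    dataset: str
    version: int
    source: str                           # where the data came from
    transformations: tuple = ()           # ordered steps applied to the source
    used_by_models: tuple = ()            # models trained on this version
    parent_version: Optional[int] = None  # lineage: the version this one derives from

    def fingerprint(self) -> str:
        """Stable hash of the metadata, usable as an audit reference."""
        payload = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def lineage(versions, dataset, version):
    """Walk the parent links from one version back to the first version."""
    index = {(v.dataset, v.version): v for v in versions}
    chain = []
    current = index.get((dataset, version))
    while current is not None:
        chain.append(current.version)
        current = index.get((dataset, current.parent_version))
    return chain
```

Keeping the parent link on every version is what makes the audit trail walkable: starting from the dataset version a model was trained on, `lineage` returns every earlier version it was derived from.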

This is why a robust data governance framework is crucial for ensuring compliance with the AI Act. By integrating a data warehouse (DWH) with an AI-Mart, businesses can implement stringent governance measures that ensure the quality and reliability of AI training data. For example, automated validation pipelines within the DWH verify that only data meeting predefined quality standards is used, minimizing errors, biases, and missing information. This is particularly important for high-risk AI applications, such as those in biometric identification or healthcare, where poor data quality could lead to harmful or inaccurate outcomes.
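A validation step of this kind can be sketched as a simple quality gate. The example below only checks for missing required fields; the function name, parameters, and threshold are assumptions for illustration, and checks for bias or accuracy would need their own, domain-specific rules.

```python
def validate_rows(rows, required_fields, max_missing_ratio=0.0):
    """Split rows into accepted and rejected based on missing required fields.

    Returns (accepted, rejected, passed), where `passed` tells whether the
    share of rejected rows stays within `max_missing_ratio`.
    """
    accepted, rejected = [], []
    for row in rows:
        complete = all(row.get(name) not in (None, "") for name in required_fields)
        (accepted if complete else rejected).append(row)
    ratio = len(rejected) / len(rows) if rows else 0.0
    return accepted, rejected, ratio <= max_missing_ratio
```

Only the accepted rows would be loaded into the AI-Mart, while the rejected rows and the pass/fail flag can be logged as evidence that the gate was applied.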

To comply with the AI Act, businesses must ensure traceability in their AI systems by tracking and documenting key stages of the AI process, from data preparation to model usage. Integrating AI model logs into a data warehouse (DWH) plays a crucial role in this, providing a centralized system to monitor and store critical information about how AI models operate and interact with data.

AI Act Business Intelligence Architecture graphic

Logging AI Decisions and Outputs: Each time an AI model processes data, logs should be automatically generated and stored in the DWH. These logs capture essential details, including input data, feature transformations, model parameters, decision thresholds, and output probabilities. By loading these logs into the DWH, businesses create a detailed audit trail of AI activity, ensuring that key aspects of the model’s operations are documented.

Log Aggregation and Storage: Logs from AI models, whether during training or production, can be continuously fed into the DWH as part of the AI-Mart infrastructure. These logs may include:

  • Model training logs: Documenting how the model was trained, the datasets used, and the parameters adjusted during training.
  • Model inference logs: Recording the input data, features generated, and each prediction made by the model.
  • Performance metrics: Storing evaluations like accuracy, precision, and recall, which help track the model’s performance over time and detect any model drift.

By storing these logs in the DWH, businesses can establish detailed records of AI model operations for regulatory purposes.
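An inference log record of the kind described above could look like the following sketch. The field names and the example model are hypothetical; the point is that each prediction produces one self-contained, serializable record that can be loaded into the DWH.

```python
import json
from datetime import datetime, timezone


def build_inference_log(model_name, model_version, features, probability, threshold):
    """Build one JSON-serializable inference log record."""
    return {
        "logged_at": datetime.now(timezone.utc).isoformat(),
        "model_name": model_name,
        "model_version": model_version,
        "features": features,                  # input after feature transformation
        "output_probability": probability,
        "decision_threshold": threshold,
        "decision": probability >= threshold,  # the outcome that was acted on
    }


# Hypothetical prediction, for illustration only.
record = build_inference_log(
    "credit_model", "1.3.0", {"income": 52000, "age": 41}, 0.81, 0.5
)
line = json.dumps(record, sort_keys=True)  # one line per prediction
```

Storing the threshold next to the probability matters for audits: it lets a reviewer reconstruct why a given output led to a given decision, even if the threshold is changed later.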

Querying and Auditing Logs: The DWH’s querying tools allow compliance teams to generate reports that show how models operate, what data was used, and how the AI model has evolved. This simplifies the process of responding to regulatory audits and demonstrates adherence to the AI Act’s requirements.
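As an illustration of such a report, the sketch below uses an in-memory SQLite database as a stand-in for the DWH. The table `model_metrics`, its columns, the sample values, and the accuracy floor are all assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the DWH
conn.execute(
    "CREATE TABLE model_metrics (model_name TEXT, model_version TEXT, "
    "evaluated_on TEXT, accuracy REAL)"
)
# Hypothetical evaluation results, for illustration only.
conn.executemany(
    "INSERT INTO model_metrics VALUES (?, ?, ?, ?)",
    [
        ("credit_model", "1.3.0", "2025-01-31", 0.91),
        ("credit_model", "1.3.0", "2025-02-28", 0.90),
        ("credit_model", "1.3.0", "2025-03-31", 0.84),
    ],
)


def evaluations_below(conn, model_name, min_accuracy):
    """Return the evaluations on which accuracy fell below the agreed floor."""
    rows = conn.execute(
        "SELECT evaluated_on, accuracy FROM model_metrics "
        "WHERE model_name = ? AND accuracy < ? ORDER BY evaluated_on",
        (model_name, min_accuracy),
    )
    return rows.fetchall()
```

A compliance team could run `evaluations_below(conn, "credit_model", 0.88)` to list the evaluation dates that point to possible model drift.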

By combining a DWH with an AI-Mart for AI training data and loading AI model logs into the same infrastructure, businesses can build a comprehensive framework for compliance with the AI Act. This approach supports data governance, ensuring high-quality data for AI models, and ensures traceability, allowing businesses to track and audit every aspect of their AI systems. This not only meets regulatory requirements but also fosters trust and accountability in the use of AI technology.

Upcoming Resources and Events

For more information, contact our team at [email protected].

Final Remarks from the Authors

The AI Act should not only be seen as a regulatory challenge but also as an opportunity for businesses to differentiate themselves by adopting trustworthy and ethical AI practices. As AI continues to evolve, businesses that prioritize compliance, transparency, and human oversight will be better positioned to thrive in the coming years. By taking proactive steps now to ensure compliance, businesses can turn AI regulation into a strategic advantage, building trust with customers, partners, and regulators alike.

Utilizing Potentials of Data Vault 2.0 – Overcoming Bad Practices – Part 2

Watch the Webinar

What are common mistakes when applying Data Vault 2.0 in enterprise data warehouse projects? Do you have questions about modeling in Data Vault? Is implementing GDPR causing you great difficulty, or is your project stuck because it is not delivering business value?

This webinar describes common anti-patterns of Data Vault, their consequences, and the solutions for eliminating them from your current and future projects.

Tune in and learn more to avoid bad practices and apply simple solutions.

Watch Webinar Recording

Webinar Agenda

1. How to use Data Vault for modeling business information
2. How to avoid the pitfalls of being unable to deliver business value
3. How to mask Business Keys from Hubs for privacy

Implementing GDPR in Data Warehousing

Solutions

Implementing GDPR

In the realm of data warehousing, whether it be Data Vault 2.0 or traditional approaches like Kimball and Inmon, data is stored and processed across multiple layers. The intricacies of privacy, particularly the application of security measures and the concept of the “right to be forgotten,” permeate every layer housing personal data.

For privacy implementation, the primary objective is the removal of Personally Identifiable Information (PII) from each layer. This process aims to remove the PII while leaving non-PII data intact. In the ideal scenario, the amount of consumer data shrinks only in proportion to the PII that was removed.
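One way to make this kind of targeted removal possible is to keep PII and non-PII attributes in separate tables that share a key. The sketch below assumes such a split; the table names `sat_customer` and `sat_customer_pii`, the columns, and the sample rows are hypothetical, and the question of whether the key itself is personal data is not handled here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sat_customer     (customer_hk TEXT, segment TEXT, revenue REAL);
    CREATE TABLE sat_customer_pii (customer_hk TEXT, full_name TEXT, email TEXT);
    INSERT INTO sat_customer     VALUES ('hk1', 'retail', 120.0), ('hk2', 'b2b', 900.0);
    INSERT INTO sat_customer_pii VALUES ('hk1', 'Ada Example', 'ada@example.com'),
                                        ('hk2', 'Bob Example', 'bob@example.com');
    """
)


def forget_customer(conn, customer_hk):
    """Delete the PII rows for one customer; non-PII rows stay untouched."""
    with conn:  # commit on success, roll back on error
        cursor = conn.execute(
            "DELETE FROM sat_customer_pii WHERE customer_hk = ?", (customer_hk,)
        )
    return cursor.rowcount
```

After `forget_customer(conn, "hk1")`, the segment and revenue history for that key is still available for analytics, but it can no longer be tied to a name or email address through this table.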

The General Data Protection Regulation (GDPR) has a significant influence on data warehouse projects because it introduces stringent requirements for data processing and storage. Its impact covers both security, which determines who has access to what data, and privacy, which includes the right to be forgotten.

ACCESS THE SOLUTION

Salesforce Meets Data Vault

Salesforce and Data Vault - decoupling

It’s a Match!

Data integration with Salesforce can be tricky and needs a business intelligence system that can handle the complexity. Data Vault is capable of decoupling all the necessary business-driven changes, extensions, and customizations to the platform while still serving as the cornerstone of an integrated architecture. This decoupling is covered in our Data Vault Boot Camp and is summarized in Figure 1. Scalefree can provide knowledge and implementation assistance for both Data Vault and Salesforce, which makes us an optimal partner for your Salesforce integration project.

Salesforce and Data Vault - decoupling

Figure 1. Data Vault Decoupling

Agile Integration of Salesforce

Salesforce is optimized for transactions and not for analytics; in fact, this is one reason we want to integrate it. More likely than not, your Salesforce system is not “just a CRM” anymore. Over the past decade, Salesforce has evolved into a general-purpose business application platform that offers many levels of functionality if one chooses to use them.

Salesforce can deliver a Sandbox for your own developers and third parties to develop any application they want. In fact, our customers often go on to create a variety of application add-ons and customizations that are made within Salesforce. This means that your integration will become more complex over time as more elements are added to the “one” source system. For this reason, it fits very well with Data Vault.

We can extend the vault because technical integration and business needs are decoupled by the very idea of Data Vault. This is where all the standards that you used to build your vault come into play and save the day. Now you can leverage the benefits of having an extensible and agile data warehouse.

In our recent webinar Salesforce and Data Vault we discussed some of the change drivers around Salesforce, which are summarized in Figure 2. We also talked about how those change drivers can be defused by using Data Vault.

Salesforce and Data Vault

Figure 2. Salesforce Change Drivers

In addition to the Data Vault methodology, project roles such as domain experts can help with communication between the source system operations team and the data warehousing team. For more information on these roles, see Disciplined Agile Delivery (DAD) by Scott Ambler, which is now part of the Project Management Institute (PMI).

Of course, we have only scratched the surface here, and there are many topics we have not covered yet. For example, some Salesforce-related challenges can be solved either with an expensive workaround in the data warehouse or with some simple adjustments in Salesforce.

Another topic for a later time is how to deal with Salesforce limits, such as API call limits or operational reporting limitations.

What topics are you interested in? 

What challenges are you facing right now?

Conclusion

We have seen that the integration of Salesforce can be handled with Data Vault, as both systems fit together quite well. Data Vault adds the agility your data warehouse requires when dealing with changing and complex source systems, so that it can provide the highest possible business value to your organization while saving you re-engineering costs in the long run. In this way, you can create your own sustainable Salesforce data pipeline.

Advantages for Virtualization in the Data Vault

Solutions

Virtualization in the Data Vault

In legacy or traditional data warehousing, a common strategy involves materializing data marts, also known as information marts, to enhance performance. However, this approach comes with a notable disadvantage – an increase in storage requirements within traditional data warehousing systems.

Materializing data marts can offer performance benefits, but the trade-off is a higher demand for storage space. This approach has traditionally been employed to optimize query response times and facilitate efficient data access.
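The trade-off between a materialized and a virtualized mart can be shown with a small sketch. It uses an in-memory SQLite database and hypothetical table and view names; it is an illustration of the general idea, not the approach described in the linked solution.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sat_order (order_hk TEXT, customer TEXT, amount REAL);
    INSERT INTO sat_order VALUES ('o1', 'acme', 100.0), ('o2', 'acme', 50.0);

    -- Materialized mart: a physical copy that occupies storage and must be reloaded.
    CREATE TABLE mart_revenue_materialized AS
        SELECT customer, SUM(amount) AS revenue FROM sat_order GROUP BY customer;

    -- Virtual mart: only the query is stored; results are computed on read.
    CREATE VIEW mart_revenue_virtual AS
        SELECT customer, SUM(amount) AS revenue FROM sat_order GROUP BY customer;
    """
)

# A new order arrives after the mart was built.
conn.execute("INSERT INTO sat_order VALUES ('o3', 'acme', 25.0)")

materialized = conn.execute(
    "SELECT revenue FROM mart_revenue_materialized"
).fetchone()[0]
virtual = conn.execute("SELECT revenue FROM mart_revenue_virtual").fetchone()[0]
```

Here `materialized` still holds the value from the last load, while `virtual` already reflects the new order without any reload and without storing a second copy of the data. The cost is that the aggregation runs every time the view is queried.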

ACCESS THE SOLUTION

Difference Between Data Vault, Inmon and Kimball Approach

Solutions

Data Vault, Inmon and Kimball

Data Vault 2.0 stands on a robust foundation of four pillars, each shaping its distinct architecture. The Methodology pillar guides the project lifecycle, ensuring standardization. Architecture defines the blueprint, prioritizing scalability. Modeling introduces agile techniques, enhancing adaptability. Implementation brings the design to life, addressing practical considerations.

The Inmon approach to building a data warehouse begins with the corporate data model. This model identifies the key subject areas, and most importantly, the key entities the business operates with. From this model, a detailed logical model is created for each major entity.

The Kimball approach to building the data warehouse starts with identifying the key business processes and the key business questions that the data warehouse needs to answer. The key sources (operational systems) of data for the data warehouse are analyzed and documented.

ACCESS THE SOLUTION

Batch Loading Strategies for Data Vault 2.0

Solutions

Loading Strategies

In the realm of general data warehousing, various loading strategies come into play. One prevalent challenge often encountered is the absence of deleted records within a delta. In typical data warehousing scenarios, it becomes crucial to recognize and track deletions from the source system, often referred to as soft deletes.

The distinction lies in the need to not only capture new or modified data (delta) but also to account for records that have been deleted at the source. Soft deletes involve marking records as deleted rather than physically removing them, allowing for a more nuanced and traceable approach to data management.
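Because a delta that contains only new and changed rows cannot reveal deletions, one common workaround is to compare the keys known to the warehouse with a full list of keys from the source. The sketch below assumes that such a full key list is available; the function names and the record layout are hypothetical.

```python
def detect_deleted_keys(known_active_keys, full_extract_keys):
    """Keys that are active in the warehouse but absent from a full source extract."""
    return set(known_active_keys) - set(full_extract_keys)


def apply_soft_deletes(records, deleted_keys, load_date):
    """Mark records as deleted instead of physically removing them."""
    for record in records:
        if record["key"] in deleted_keys and record["deleted_at"] is None:
            record["deleted_at"] = load_date
    return records
```

The record stays in the warehouse with a deletion timestamp, so history remains queryable and the point in time at which the source dropped the record stays traceable.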

ACCESS THE SOLUTION

Data Lake Efficiency: Structural Solutions

Data Lake architecture

Data Lake Structure – Solution

The organization of data within a data lake can significantly impact downstream accessibility. While offloading data into the data lake is a straightforward process, the real challenge arises in efficiently retrieving this data. The efficiency of data retrieval becomes crucial for tasks such as the incremental or initial Enterprise Data Warehouse (EDW) load and for data science practitioners conducting independent queries. In practice, the ease of accessing data downstream depends on how well the data is organized within the data lake. A well-organized structure facilitates smoother retrieval processes, empowering both EDW loads and the independent querying needs of data scientists.

ACCESS THE SOLUTION

Within a hybrid data warehouse architecture, as promoted in the Data Vault 2.0 Boot Camp training, a data lake is used as a replacement for a relational staging area. Thus, to take full advantage of this architecture, the data lake is best organized in a way that allows efficient access within a persistent staging area pattern and better data virtualization.
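To make the idea of an organized lake concrete, here is a minimal sketch of a path builder. The layout (source system, then entity, then load date and time) and the function name `lake_path` are assumptions for illustration and are not taken from the linked solution.

```python
from datetime import datetime
from pathlib import PurePosixPath


def lake_path(source_system, entity, load_ts, file_name):
    """Build a partitioned object path: source / entity / load date / load time / file."""
    return PurePosixPath(
        source_system,
        entity,
        f"load_date={load_ts:%Y-%m-%d}",
        f"load_time={load_ts:%H%M%S}",
        file_name,
    )
```

With a layout like this, an incremental EDW load can read only the folders of a single load date, while an initial load or a data scientist's ad hoc query can scan a whole entity folder.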

Continue Reading

Requirements and Templates for Hashing

Solutions

REQUIREMENTS FOR HASHING

Traditional data warehouses often use sequence numbers to identify records in other tables.

Sequences come with some drawbacks, and one of the biggest is performance. Because the sequence numbers are produced by a generator, that step becomes a bottleneck. In addition, sequence numbers can only be generated inside the data warehouse during loading; they cannot be computed beforehand.

This solution provides a template and the requirements for hashing.
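As a rough illustration of the alternative, a hash key can be computed from the business key alone, before the data reaches the warehouse. The sketch below is not the template from the linked solution: the normalization rules (trim, upper case), the `||` delimiter, and the choice of SHA-256 are assumptions made for this example.

```python
import hashlib


def hash_key(*business_keys, delimiter="||"):
    """Derive a deterministic hash key from one or more business key parts."""
    normalized = [
        str(part).strip().upper() if part is not None else ""
        for part in business_keys
    ]
    joined = delimiter.join(normalized)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest().upper()
```

Because the result depends only on the input values, the same business key always yields the same hash key, so it can be calculated independently in any process without a central generator. The delimiter keeps composite keys such as `("a", "b")` distinct from a single key `"ab"`.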

ACCESS THE SOLUTION

Data Security Concepts in Data Vault 2.0

Solutions

Data Security Concepts

This solution focuses on critical aspects such as security controls, access controls, and the definition of identities. Its primary objective is to safeguard data assets effectively. The approach is typically centered on securing data at both the row/document level and the attribute level.

In terms of security controls, the emphasis is on implementing measures that ensure the confidentiality, integrity, and availability of data. Access controls play a pivotal role in governing who can interact with specific data assets, limiting access to authorized individuals or roles. Defining identities involves establishing clear parameters for users and entities accessing the data, contributing to a robust security framework.
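The difference between row-level and attribute-level security can be sketched in a few lines. In practice these controls are usually enforced in the database or platform itself; the function below, with its hypothetical name and `region` field, only illustrates the two levels.

```python
def apply_security(rows, allowed_regions, allowed_attributes):
    """Filter rows (row level) and columns (attribute level) for one role."""
    visible = []
    for row in rows:
        if row["region"] not in allowed_regions:
            continue  # row-level control: the whole record is hidden
        # attribute-level control: only permitted columns are returned
        visible.append({k: v for k, v in row.items() if k in allowed_attributes})
    return visible
```

A role that is limited to EU customers and may not see credit limits would receive only the EU rows, and each of those rows would arrive without the `credit_limit` attribute.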

In summary, this Data Vault solution prioritizes a comprehensive approach to data security, addressing concerns at different levels to fortify the protection of valuable data assets.

ACCESS THE SOLUTION

Exemplary Naming Conventions in Data Vault 2.0

Solutions

Naming Conventions

Data Vault modeling is a powerful approach that introduces a multitude of entities to the database. To enhance usability and facilitate effective development, it is highly advisable to implement clear and consistent naming conventions. These conventions play a vital role in grouping entities by concept and conveying crucial information to developers, including the data source, rate of change, privacy levels, security considerations, and more.

Introducing a well-thought-out naming convention not only simplifies the development process but also contributes to a more organized and comprehensible database structure. It acts as a guide for developers, offering insights into the nature and characteristics of each entity.

Conversely, the absence of a naming convention poses challenges in identifying related tables within Data Vault models. This lack of structure can lead to confusion, making it harder for developers to discern the relationships and purpose of different entities.
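A convention becomes most useful when it can be checked automatically. The sketch below assumes a purely hypothetical convention (`hub_<entity>`, `lnk_<entity>`, `sat_<source>_<entity>`); it is not the convention from the linked solution.

```python
import re

# Hypothetical convention, for illustration only.
PATTERNS = {
    "hub": re.compile(r"^hub_(?P<entity>[a-z0-9_]+)$"),
    "link": re.compile(r"^lnk_(?P<entity>[a-z0-9_]+)$"),
    "satellite": re.compile(r"^sat_(?P<source>[a-z0-9]+)_(?P<entity>[a-z0-9_]+)$"),
}


def classify(name):
    """Return the parts encoded in a table name, or None if it breaks the convention."""
    for entity_type, pattern in PATTERNS.items():
        match = pattern.match(name)
        if match:
            return {"type": entity_type, **match.groupdict()}
    return None
```

A check like this can run in a deployment pipeline, so that a table such as `CustomerTable`, which carries no information about its type or source, is flagged before it reaches the model.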

In conclusion, the implementation of naming conventions is fundamental for the success of Data Vault modeling solutions. It promotes clarity, efficiency, and a systematic approach to database development.

ACCESS THE SOLUTION

Test Strategies for Data Vault 2.0 based EDW

Solutions

Test Strategies

Testing is essential for making sure that data warehouse systems work correctly and efficiently. In unit testing, each component is tested separately.

When business logic is to be tested with unit tests, however, the limited availability of unit testing tools for data warehouses becomes an issue.
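Where a business rule can be isolated as a function, a general-purpose test framework is one way around the tooling gap. The following is a minimal sketch using Python's built-in `unittest`; the rule `classify_customer` and its threshold are hypothetical and are not taken from the linked solution.

```python
import unittest


def classify_customer(total_revenue):
    """Hypothetical business rule: segment a customer by total revenue."""
    if total_revenue < 0:
        raise ValueError("revenue must not be negative")
    return "key_account" if total_revenue >= 100_000 else "standard"


class ClassifyCustomerTest(unittest.TestCase):
    def test_threshold_is_inclusive(self):
        self.assertEqual(classify_customer(100_000), "key_account")

    def test_below_threshold(self):
        self.assertEqual(classify_customer(99_999.99), "standard")

    def test_negative_revenue_is_rejected(self):
        with self.assertRaises(ValueError):
            classify_customer(-1)
```

The tests can be run with `python -m unittest` against the module that contains them. Each test covers one component in isolation, here a single rule at its boundary values.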

This solution describes test strategies for enterprise data warehouse solutions based on Data Vault 2.0.

ACCESS THE SOLUTION