Business Analyst and Data Modeler Collaboration in Data Vault Projects
One of the most common sources of friction in Data Vault projects isn’t technical — it’s organizational. The collaboration between Business Analysts and Data Modelers is arguably the most important working relationship in the entire delivery chain, yet it’s also one of the least clearly defined. Who does what? Where does one role end and the other begin? What information needs to change hands, and in what format? This post walks through a practical approach to structuring that collaboration, drawn from real project experience.
In this article:
- Why the Business Analyst and Data Modeler Collaboration Is So Critical
- Forget the Line — Work Together
- Starting with Concept Classification
- The Collaboration Spreadsheet: Simple and Effective
- From Spreadsheet to Automation Tool Metadata
- Where AI Is Starting to Help
- The Information Requirement: Starting from the End
- Making Collaboration Work in Practice
- Watch the Video
Why the Business Analyst and Data Modeler Collaboration Is So Critical
When Business Analysts and Data Modelers don’t collaborate effectively, the symptoms show up in the Raw Data Vault. Surrogate keys get nominated as Business Keys. Source system logic bleeds into what should be a raw, business-concept-driven model. Gaps in the information provided to modelers lead to design decisions based on assumptions rather than actual business understanding.
It’s worth clarifying one important point here: the Raw Data Vault is not where business perspectives live. Business logic, business rules, and the way the organization interprets its data — all of that belongs in the Business Vault. The Raw Data Vault should reflect the raw data as it comes from the source, structured around business concepts and Business Keys. Keeping that distinction clear is fundamental to a healthy collaboration between the two roles.
Forget the Line — Work Together
A common instinct is to draw a clean boundary: the Business Analyst works until a certain point, then hands off to the Data Modeler. In practice, this handoff model is where projects run into trouble. Information gets lost in translation. The Data Modeler receives documentation that makes sense from a business perspective but leaves key modeling questions unanswered. The Business Analyst doesn’t know what the Data Modeler actually needs.
A better approach: put everyone in the same room. Business Analysts, Data Modelers, Data Engineers, and dashboard designers all working toward the same deliverable — a report, a KPI, a business process automation. The business user doesn’t care about Data Vault; they care about the output. Build toward that output together.
This doesn’t mean everyone needs to be available full-time. But especially at the start of a project, physical or virtual co-location matters. When the Data Modeler hits a question the Business Analyst’s documentation doesn’t answer, the answer needs to be one conversation away — not a ticket in a queue.
Two additional roles are particularly valuable to have accessible during this phase: a source system specialist who knows the source data structure deeply, and a business user who can validate what’s being built against actual reporting needs. They’re typically time-constrained, so plan interactions with them carefully and make the most of the time you have.
The Data Vault Handbook:
Core Concepts and Modern Applications
Build Your Path to a Scalable and Resilient Data Platform
The Data Vault Handbook is an accessible introduction to Data Vault. Designed for data practitioners, this guide provides a clear and cohesive overview of Data Vault principles.
Starting with Concept Classification
Before diving into source tables and column mappings, it pays to start at a higher level. A concept classification session — sometimes called a concept analysis — asks a deceptively simple question: what is your business model?
In a meeting with stakeholders from different departments, you map out the core business objects: customers, products, purchases, factories, whatever is central to how the business operates. You’re not focused on relationships at this stage — you’re building a vocabulary. A taxonomy of the concepts that matter to the business.
The second part of this conversation — often in the same meeting or the next one — asks: how do you identify each of these concepts? This is where it gets interesting. If you have people from finance, production, and sales in the room, you’ll typically get different answers. Finance uses an Oracle ID. Sales uses a Salesforce account key. Production uses an SAP number. Different systems, different keys, all referring to the same underlying concept.
This gives you a set of Business Key candidates. From there, you can examine the actual source data: do these keys exist in the dataset? Are they unique? Do any of them appear across multiple source systems in a way that could serve as a shared integration key? That analysis — even if limited to the data you have in front of you — is enough to identify a strong candidate and move forward. It won’t be perfect. A full analysis of every source system across the enterprise is rarely funded. But a well-reasoned candidate key is enough to start building, and it can be refined as the project progresses.
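That candidate-key check can be done with very little tooling. The sketch below is a hypothetical illustration, assuming two source extracts loaded as lists of dictionaries; the table and column names (`crm_rows`, `erp_rows`, `customer_no`) are invented for the example:

```python
# Hypothetical sketch: profiling a Business Key candidate across source extracts.
# Source names, columns, and sample rows are invented for illustration.

def profile_candidate(rows, key_column):
    """Check a candidate Business Key for coverage and uniqueness in one extract."""
    values = [row.get(key_column) for row in rows]
    present = [v for v in values if v not in (None, "")]
    return {
        "coverage": len(present) / len(values) if values else 0.0,  # share of rows carrying a key
        "unique": len(set(present)) == len(present),                # no duplicate key values?
    }

# Two source extracts that both carry a customer number (assumed shared key)
crm_rows = [{"customer_no": "C-100"}, {"customer_no": "C-101"}]
erp_rows = [{"customer_no": "C-100"}, {"customer_no": "C-102"}, {"customer_no": None}]

crm = profile_candidate(crm_rows, "customer_no")
erp = profile_candidate(erp_rows, "customer_no")

# Values appearing in both systems hint at a potential shared integration key
shared = ({r["customer_no"] for r in crm_rows}
          & {r["customer_no"] for r in erp_rows if r["customer_no"]})
```

Even this limited profiling answers the three questions above (does the key exist, is it unique, is it shared) for the data you actually have in front of you.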
The Collaboration Spreadsheet: Simple and Effective
Once you’ve identified your concepts and Business Key candidates, the next step is mapping source tables to those concepts and classifying every column. The tool for this doesn’t need to be sophisticated — a spreadsheet works well, and works well precisely because everyone can use it.
The process looks like this: before the meeting, a developer imports the source system metadata into the sheet — column names, data types, lengths, source table. One row per column. Then, in the meeting with the business user and source system specialist, you go through each column and answer a simple question: what is this?
The annotations don’t need to be elaborate. Common classifications include:
- Business Key — the identified key for this concept
- Descriptive attribute — goes into a Satellite
- Link reference — indicates a relationship to another Hub, requires a Link
- Surrogate Key — captured as descriptive, not used as the Business Key
- Ignore — not needed for this model
Additional classification dimensions — rate of change, security classification, privacy flags — can be added as columns in the same sheet. Satellite split decisions (which attributes group together into which Satellite) can be noted in comments. The goal is to give the developer enough context to build the metadata for the automation tool without needing another round of meetings.
The key discipline here is consistency. Keep comments patterned: the same type of note should look the same every time. A free-form comment field is useful; a completely unstructured one becomes noise.
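One way to keep the sheet disciplined is to validate it after each session. The sketch below is a hypothetical example, assuming the sheet has been exported to structured rows; the SAP-style column names and the `sat:` comment convention are illustrative, not prescribed:

```python
# Hypothetical sketch: validating the collaboration sheet after export.
# Classification vocabulary mirrors the list above; comment patterns are invented.

ALLOWED = {"business_key", "descriptive", "link_reference", "surrogate_key", "ignore"}

sheet = [
    {"table": "KNA1", "column": "KUNNR", "classification": "business_key", "comment": ""},
    {"table": "KNA1", "column": "NAME1", "classification": "descriptive", "comment": "sat:customer_details"},
    {"table": "KNA1", "column": "MANDT", "classification": "ignore", "comment": ""},
]

def invalid_rows(rows):
    """Return rows whose classification is not in the agreed vocabulary."""
    return [r for r in rows if r["classification"] not in ALLOWED]

bad = invalid_rows(sheet)  # empty list means the sheet is consistently annotated
```

A check like this costs minutes to write and catches the free-form drift that otherwise turns the comment column into noise.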
From Spreadsheet to Automation Tool Metadata
Once the spreadsheet is complete, the developer translates it into the metadata format required by the automation tool — whether that’s Data Vault Builder, VaultSpeed, Datavault4dbt, or another platform. This translation step takes time and precision: automation tools produce exactly what their metadata specifies. Bad metadata produces bad results. But with a well-annotated spreadsheet as the source, the developer has a clear reference and can resolve most questions independently.
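The translation step can be sketched as a simple grouping of the annotated rows. The output format below is invented for illustration; real tools such as VaultSpeed or Datavault4dbt each define their own metadata schema, so this is a shape of the idea, not any tool's actual format:

```python
# Hypothetical sketch: grouping annotated sheet rows into Hub/Satellite metadata.
# The output structure is invented; each automation tool defines its own schema.

def to_metadata(concept, rows):
    hub_keys = [r["column"] for r in rows if r["classification"] == "business_key"]
    # Surrogate keys are captured as descriptive attributes, never as the Business Key
    sat_attrs = [r["column"] for r in rows
                 if r["classification"] in ("descriptive", "surrogate_key")]
    return {
        "hub": {"name": f"hub_{concept}", "business_keys": hub_keys},
        "satellite": {"name": f"sat_{concept}_details", "attributes": sat_attrs},
    }

rows = [
    {"column": "KUNNR", "classification": "business_key"},
    {"column": "NAME1", "classification": "descriptive"},
    {"column": "CRM_ID", "classification": "surrogate_key"},
    {"column": "MANDT", "classification": "ignore"},   # dropped from the model
]
meta = to_metadata("customer", rows)
```

Note how the classification rules from the sheet carry straight through: the Surrogate Key lands in the Satellite, and ignored columns never reach the metadata at all.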
Some projects also require terminology translation at this stage. Source systems — especially SAP — often use abbreviated, language-specific field names that don’t belong in a Data Vault intended for a broader audience. The spreadsheet can include an English translation column, which the business user or source system specialist can complete asynchronously, keeping the meeting time focused on classification rather than translation.
Where AI Is Starting to Help
The concept classification and Business Key identification process described above is time-intensive, and it’s largely limited by how much source system analysis you can afford to fund. This is one area where AI tooling is beginning to make a difference.
Tools like FLOW.BI — developed at Scalefree — can attach to source systems, profile the data automatically, classify attributes, and identify Business Key candidates that appear across multiple systems as potential shared integration keys. The manual process described in this post becomes a validation and refinement step rather than a ground-up analysis. The fundamentals are the same; the speed is different.
The Information Requirement: Starting from the End
One final principle worth emphasizing: start with the target. Before analyzing source systems, ask what needs to be produced. What KPI needs to be calculated? What report needs to be built? What data does that require, and where does it come from?
An information requirement document — a structured template that captures what the business user wants, what they need, and where the data lives — is the ideal starting point for any new delivery. It won’t always be complete. Business users often know what they want but not where the data comes from. That’s fine. The Business Analyst and Data Modeler work together to fill in the gaps. But having even a partial information requirement is better than starting from raw source tables and working backwards.
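The structure of such a document can be sketched in code. The field names below are invented for illustration and may differ from Scalefree's published template; the point is that partial source knowledge is expected and the gaps are explicit:

```python
# Hypothetical sketch of an information requirement as structured data.
# Field names are invented; the published template may differ.
from dataclasses import dataclass, field

@dataclass
class InformationRequirement:
    kpi: str                       # what needs to be produced
    business_owner: str            # who validates the output
    required_attributes: list      # what data the KPI needs
    known_sources: dict = field(default_factory=dict)  # attribute -> source, may be partial

req = InformationRequirement(
    kpi="Monthly churn rate",
    business_owner="Head of Sales",
    required_attributes=["customer_no", "contract_end_date", "cancellation_flag"],
    known_sources={"customer_no": "CRM"},  # gaps are fine; filled in collaboratively
)

# Attributes the Business Analyst and Data Modeler still need to trace to a source
open_items = [a for a in req.required_attributes if a not in req.known_sources]
```

The open items list is exactly the gap-filling work the two roles do together; an incomplete requirement is a starting point, not a blocker.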
Scalefree has published a template for information requirements on their blog — searching for “information requirement Scalefree” will bring it up — which can serve as a starting point for teams building this practice.
Making Collaboration Work in Practice
There’s no single formula for Business Analyst and Data Modeler collaboration that works across every project and every team. But a few principles hold consistently: work toward the same deliverable together, use simple tools that everyone can engage with, start from the business concept before diving into source data, and keep the meeting time focused on decisions — not documentation.
The spreadsheet approach is unglamorous. It’s also fast, inclusive, and produces the output the developer actually needs. Sometimes the best collaboration tool is the one everybody already knows how to use.
To learn more about Data Vault modeling practices, Business Key identification, and the full Raw and Business Vault methodology, explore our Data Vault 2.1 Training & Certification. And for a concise introduction to the core concepts, the free Data Vault handbook is available as a physical copy or digital download.