Understanding the Nature of Address Data
In many systems, address data doesn’t come in a uniform format. Some systems may embed it directly in a contact or customer table—think “billing address” and “shipping address” fields in Salesforce—while others provide a separate table for addresses and even a relationship table showing links between addresses and business entities.
Let’s look at how to tackle both of these situations using Data Vault best practices.
In this article:
- Case 1: Addresses as Attributes Inside Another Table
- Case 2: Addresses in a Separate Table
- Tracking Changes in Reference Data
- Handling Many-to-Many Relationships Between Addresses and Entities
- Resolving Ambiguity with Conditional Relationships
- Guiding Principles for Modeling Address Data
- Conclusion
- Watch the Video
- Meet the Speaker
Case 1: Addresses as Attributes Inside Another Table
If your source delivers addresses as part of another table (e.g., contact data with billing and shipping fields), the Raw Data Vault should model the data exactly as it comes. For example:
- Create a Contact Hub with a business key for contacts.
- Attach a Satellite containing billing and shipping address fields like city, street, and ZIP code.
Even if multiple contacts share the same address, duplication is acceptable in the Raw Vault—it’s a reflection of how the source system delivers the data. Optimization or deduplication can happen in the Business Vault or information marts.
Case 2: Addresses in a Separate Table
When your source system contains a dedicated address table, you have two main modeling options:
Option A: Treat Address as a Hub
- Create an Address Hub using a business key. If no natural key exists, use a surrogate/technical key.
- Attach a Satellite to store descriptive fields (e.g., street, city, ZIP).
- Use a Link to relate addresses to other Hubs like Contact, Customer, or Lead.
This pattern is especially useful in industries like insurance, where addresses are treated as critical business objects (e.g., accident location).
Option B: Treat Address as Reference Data
- Store addresses in a flat reference table.
- Use an ID (like address_id = 55) as a code in a descriptive Satellite on related Hubs.
This is simpler but comes with limitations—it doesn’t track historical changes. For example, if a street name changes, the system won’t retain that history.
Tracking Changes in Reference Data
To address the historical limitation, consider modeling reference data with:
- A Reference Hub (no hashed keys, just the code, load date, and record source).
- A Reference Satellite to track changes over time using load date as part of the primary key.
This design allows you to maintain a history of changes in descriptive reference data without violating Data Vault principles.
Handling Many-to-Many Relationships Between Addresses and Entities
Things get more interesting when addresses are shared across multiple business objects (e.g., one address used by both a Lead and a Customer). If your source includes a bridge table (many-to-many), follow this approach:
- Create an Address Hub.
- Create individual Hubs for each business object (Lead, Customer, Contact).
- Establish a Link using the bridge table to represent relationships.
- Track relationship history using Effectivity Satellites.
If the bridge table uses a generalized object like “Customer” to reference multiple types (Contact, Account, Lead), you’ll need:
- A Generic Customer Hub.
- Use raw business keys or technical IDs (UUIDs).
- Create Links between the Customer Hub and the Address Hub.
- Use Satellites to track effectiveness (i.e., from when to when an address is associated).
Resolving Ambiguity with Conditional Relationships
Sometimes, the source system generalizes business objects (e.g., Microsoft CRM’s Customer entity could be an Account, Lead, or Contact). In these cases, build a generic Customer Hub first. Then, in the Business Vault, apply conditional logic to determine if a Customer is actually a Lead, Contact, or Account—but only in the Business Vault.
This conditional logic would take the form of queries that check whether a UUID from the generic Customer Hub exists in the Lead Hub. If so, you can establish a Business Vault Link between the generic Customer and the Lead.
Guiding Principles for Modeling Address Data
- Stay Data-Driven: Model what you see, not what you think should be there.
- Don’t Add Conditional Logic to the Raw Vault. It belongs in the Business Vault.
- Use Hubs for real business objects like Address when they are shared across systems or have standalone value.
- Use Reference Data when addresses are just descriptive codes without relationships.
- Track History with Effectivity or Reference Satellites if needed.
Ultimately, the choice between treating address data as a business object or reference data depends on your use case. If you’re dealing with complex, shared addresses with historical importance, model them as Hubs. If not, use reference tables or Satellites. But always be consistent and avoid conditional logic in the Raw Vault.
Conclusion
Modeling address data in a Data Vault architecture isn’t one-size-fits-all. Whether it’s Salesforce, SAP, or Microsoft CRM, the goal is to be faithful to your data, follow the architecture’s guiding principles, and maintain flexibility. By doing so, you ensure scalability, auditability, and long-term maintainability of your data warehouse solution.
Watch the Video
Meet the Speaker

Michael Olschimke
Michael has more than 15 years of experience in Information Technology. During the last eight years he has specialized in Business Intelligence topics such as OLAP, Dimensional Modelling, and Data Mining. Challenge him with your questions!