What to Consider for Naming Conventions in Data Warehousing

Naming Conventions in Data Warehousing

An initial decision of critical importance within Data Vault development relates to the definition of naming conventions for database objects. As part of the development standardization, these conventions are mandatory to maintain a well-structured and consistent Data Vault model. It is important to note that proper naming conventions boost the usability of the data warehouse, not only for solution developers but also for power users within data exploration.

Throughout this article, we will present the most vital considerations within our standard book, the process of defining naming conventions.

In this article:

Naming Convention Documentation
Naming Conventions: Letter case
Naming Conventions: Usage of underscores “_”, hyphens “-”
Naming Conventions: Abbreviation, acronyms
Naming Conventions: Singular vs. plural object names
Naming Conventions: Prefix vs. Suffix
Conclusion

Naming Convention Documentation

It is one aspect to simply define naming conventions utilized within the development of your data warehouse, but it is completely another to establish consistency to create defined naming conventions that are to become standards. That said, it is a good practice to document a guideline for naming Data Warehouse objects. To that end, the next sections will discuss several considerations to take into account when defining the naming conventions for a data warehouse solution.

Naming Conventions: Letter case

There are several options when it comes to considering letter cases for names: all uppercase, all lowercase, Camel Case, and Pascal Case. Though the variances may be slight, each option does have its own advantages and disadvantages regarding legibility as well as type-ability, to briefly touch upon the differences.

Ultimately, the decision regarding letter case falls to the decision of which database management system is used as some, such as PostgreSQL, support case-sensitive object name that requires the use of quoted names. Therefore, users often prefer the lower-case by default in PostgreSQL as having lower-cased object names can reduce the amount of code that is to be generated while simultaneously improving the usability for ad-hoc queries by power users. Nevertheless, it is imperative to maintain one consistent letter case for both entity and column names.

Figure 1: Naming Convention - Letter Case

Build Better Data Platforms

Practical architecture insights for modern data teams. Join 8,000+ data professionals.

Get Free Insights

Naming Conventions: Usage of underscores “_”, hyphens “-”

To improve readability, word separators like underscores “_” or, depending on use cases, hyphens/dashes “-” are desirable. However, it is important to remember that in many systems, hyphens are interpreted as minus signs. That said, hyphens are commonly used in XML or JSON data format though they can be easily replaced by underscores, should the latter be used as separators.

Naming Conventions: Abbreviation, acronyms

Some systems enforce character limits on object names, e.g. Oracle 12.1 and below only allows for a maximum object name length of 30 bytes. Therefore, abbreviations and acronyms may be taken into consideration during the object naming process, even though they can often lead to misinterpretation. To combat this, it is suggested to compile a document containing a list of the abbreviations being used with a detailed description of their meanings. However, to limit any possible confusion avoid excessive use of abbreviations and acronyms.

In logical models, it is advisable, that object names are as self-explanatory as possible, i.e. most words should be fully spelled out, except common abbreviations for longer words such as “dept” for “department” or “org” for “organization”. However, abbreviations and acronyms are typically used in physical models, to keep object names short.

Naming Conventions: Singular vs. plural object names

It is a common practice to utilize nouns or noun phrases, in their singular form, as object names. This is done to avoid the necessity of dealing with irregular pluralization in English, e.g. man/men, person/people, which would unnecessarily add a whole new level of complexity within the data model.

Build Better Data Platforms

Practical architecture insights for modern data teams. Join 8,000+ data professionals.

Get Free Insights

Naming Conventions: Prefix vs. Suffix

Whether objects are named using prefixes or suffixes, is not of much importance within the development. That said, internally at Scalefree, we prefer table names with suffixes such as “customer_h”, and “transaction_l” instead of utilizing prefixes. The advantage of this method is that, given most database tools sort tables alphabetically, all tables, that are related to a business object, will be grouped. For example, all contact hubs, satellites, and links that have names beginning with “contact_…” will, therefore, be found together in the browser. This supports data exploration by power users and developers.

Figure 2: Naming Convention - Prefix

Nevertheless, prefixes can be meaningful within some use cases, e.g. using prefix in layer schema names helps keeping them neatly listed together in the database browser.

Figure 3: Naming Convention - Schemas

Conclusion

Naming conventions are partially a matter of personal preference and organizational guidelines. Regardless, the more systematically consistent the naming conventions become when defined, the more benefits will ultimately be seen within the development and implementation of your Data Vault solution. To illustrate this point, we encourage Data Vault development teams to write and implement a simple SQL function that inspects the entire database while checking for naming convention inconsistencies. This further ensures that standards are being followed.

Interested in how we standardize naming conventions here at Scalefree? In an upcoming article, we will share several concrete suggestions for naming conventions, most of which both our customers and our team regularly utilize internally.

Now let’s open this up for discussion in the comment box below: How do you implement naming conventions within your Data Vault development? What conventions do you follow?

Join the discussion 5 Comments

Karina Nabutovskij says:
May 15, 2019 at 10:22 am
Thank you for your comment Arne!

Kind regards,
Your Scalefree Team

Arne Bleijs says:
May 15, 2019 at 12:30 pm
At ChipSoft we use an ORM to generate the tables for our data warehouse product. Definition is done in C#, so casing follows C# conventions (=PascalCase).

We use database schema’s as datastores (being security boundaries). We have not enough of them yet to prescribe a naming standard.

For the table names in Data Vault we also prefer postfixing, for the same grouping argument you also use.
The format of a Hub-table name would be {module}{entity}Hub, e.g. ConfigUserHub, e.g. ConfigUserHub.
The corresponding Satellite-tables would have names like {module}{entity}{optional subdivision}Sat, e.g. ConfigUserSat or ConfigUserPersonalDataSat.
The Link-tables would have names like ConfigUserLink and ConfigUserOrganisationalLink.

In this way, tables will first be grouped per module, and next per entity, which we think is convenient.

For Information marts (dimensional modelled) we use prefixing, because Dimension-tables are not necessarily bound to a single Fact-table.
The format of Dimension-table names is: Dim{optional subject area}[entity}.
The format of Fact-table names is: Fact{subject area}{measure group}.

For Fact-tables (as well as Data Vault Bridge-tables) we use often plural forms. All other table names use singular forms.

- Karina Nabutovskij says:
  May 15, 2019 at 3:01 pm
  Thank you for your comment Arne!
  
Daniel Fritzler says:
November 3, 2021 at 11:56 am
Assume you have a source system in english, but your users speak german. Additionaly the column names from the source system are highly abbreviated, while your users require speaking column names in their reports. What is the better strategy to name the columns in the raw vault? On one hand you could follow the data driven approach and therefore use the table and column names from the source system. The advantage here is to deliver results quickly. On the other hand you could follow the business centered approach and have german and speaking column names in the raw vault. The advantage would be that you find requested data faster and don’t have to rename columns on the way to reporting. What is the better approach from your experience?

- Trung Ta says:
  March 10, 2022 at 9:11 am
  Hello Daniel,
  
  Thank you for reaching out with your question! Generally it is a good idea to position translation work further downstream (i.e. Business Vault, preferably even Information Delivery), in order to achieve the advantage that you mentioned – to deliver results rapidly. Moreover, if an attribute is renamed in the source, you won’t have to propagate the column rename through the Raw/Business Vault.
  
  However, if for example you have to process a source system that is in a completely foreign language to your organisation, you might want to have translation work done earlier in the pipeline, either in the Raw Vault, or in the Staging area. This ensures that you get needed data into your data warehouse in a comprehensible state for all users – both developers and data consumers.
  
  Thank you kindly,
  Trung Ta

What to Consider for Naming Conventions in Data Warehousing – Part 1

Naming Conventions in Data Warehousing

Naming Convention Documentation

Naming Conventions: Letter case

Build Better Data Platforms

Naming Conventions: Usage of underscores “_”, hyphens “-”

Naming Conventions: Abbreviation, acronyms

Naming Conventions: Singular vs. plural object names

Build Better Data Platforms

Naming Conventions: Prefix vs. Suffix

Conclusion

Join the discussion 5 Comments

Leave a Reply Cancel Reply

Build Better Data Platforms

SOLUTIONS

TRAINING

EVENTS

KNOWLEDGE HUB

CAREERS

COMPANY

Make Better Salesforce Decisions

Build Better Data Platforms

What to Consider for Naming Conventions in Data Warehousing – Part 1

Naming Conventions in Data Warehousing

Naming Convention Documentation

Naming Conventions: Letter case

Build Better Data Platforms

Naming Conventions: Usage of underscores “_”, hyphens “-”

Naming Conventions: Abbreviation, acronyms

Naming Conventions: Singular vs. plural object names

Build Better Data Platforms

Naming Conventions: Prefix vs. Suffix

Conclusion

You May Also Like

AI in Data Warehousing: Fundamental Principles and Applications

Data Vault in a Data Mesh Approach

Choosing the Right Tech Stack for an Open-Source Powered EDW

Join the discussion 5 Comments

Leave a Reply Cancel Reply

Build Better Data Platforms

SOLUTIONS

TRAINING

EVENTS

KNOWLEDGE HUB

CAREERS

COMPANY