An initial decision of critical importance within Data Vault development relates to the definition of naming conventions for database objects. As part of the development standardization, these conventions are mandatory as to maintain a well-structured and consistent Data Vault model. It is important to note that proper naming conventions boost usability of the data warehouse, not only for solution developers but also for power users within data exploration.
Throughout this article, we will present the most vital considerations within our standard book, the process of defining naming conventions.
Naming convention documentation
It is one aspect to simply define naming conventions utilized within the development of your data warehouse, but it is completely another to establish consistency as to create defined naming conventions that are to become standards. That said, it is a good practice to document a guideline for naming Data Warehouse objects. To that end, the next sections will discuss several considerations to take account of when defining the naming conventions for a data warehouse solution.
There are several options when it comes to considering letter case for names: all uppercase, all lowercase, Camel Case, Pascal Case. Though the variances may be slight, each option does have its own advantages and disadvantages regarding legibility as well as type-ability, to briefly touch upon the differences.
Ultimately, the decision regarding letter case falls to the decision of which database management system is used as some, such as PostgreSQL, support case-sensitive object name which requires the use of quoted names. Therefore, users often prefer the lower-case by default in PostgreSQL as having lower-cased object names can reduce the amount of code that is to be generated while simultaneously improving the usability for ad-hoc queries by power users. Nevertheless, it is imperative to maintain one consistent letter case for both entity and column names.
Usage of underscores “_”, hyphens “-”
To improve readability, word separators like underscores “_” or, depending on use-cases, hyphens/dashes “-” are desirable. Though it is important to remember that in many systems, hyphens are interpreted as minus signs. That said, hyphens are commonly used in XML or JSON data format though they can be easily replaced by underscores, should the latter be used as separators.
Some systems enforce character limit on object names, e.g. Oracle 12.1 and below only allows for a maximum object name length of 30 bytes. Therefore, abbreviations and acronyms may be taken into consideration during the object naming process, despite the fact that they can often lead to misinterpretation. To combat this, it is suggested to compile a document containing a list of the abbreviations being used with detailed description of their meanings. However, to limit any possible confusion avoid excessive use of abbreviations and acronyms.
In logical models, it is advisable, that object names are as self-explanatory as possible, i.e. most words should be fully spelled out, except common abbreviations for longer words such as “dept” for “department” or “org” for “organisation”. However, abbreviations and acronyms are typically used in physical models, to keep object names short.
Singular vs. plural object names
It is a common practice to utilize nouns or noun phrases, in their singular form, as object name. This is done to avoid the necessity of dealing with irregular pluralization in English, e.g. man/men, person/people, which would unnecessarily add a whole new level of complexity within the data model.
Prefix vs. Suffix
Whether objects are named using prefixes or suffixes, is not of much importance within the development. That said, internally at Scalefree, we prefer table names with suffixes such as “customer_h”, “transaction_l” instead of utilizing prefixes. The advantage of this method is, given most database tools sort tables alphabetically, all tables, which are related to a business object, will be grouped together. For example all contact hubs, satellites, and links that have names beginning with “contact_…” will, therefore, be found together in the browser. This supports data exploration by power users and developers.
Nevertheless, prefixes can be meaningful within some use cases, e.g. using prefix in layer schema names helps keeping them neatly listed together in the database browser.
Naming conventions are partially a matter of personal preference and of organizational guidelines. Regardless, the more systematically consistent the naming conventions become when defined, the more benefits will ultimately be seen within the development and implementation of your Data Vault solution. To illustrate this point, we encourage Data Vault development teams to write and implement a simple SQL function that inspects the entire database while checking for naming convention inconsistencies. This further ensures that standards are being followed.
Interested in how we standardize naming conventions here at Scalefree? In an upcoming article, we will share several concrete suggestions for naming conventions, most of which both our customers and our team regularly utilize internally.
Now let’s open this up for discussion in the comment box below: How do you implement naming conventions within your Data Vault development? What conventions do you follow?
Get Updates and Support
Please send inquiries and feature requests to [email protected].
To support the creation of Visual Data Vault drawings in Microsoft Visio, a stencil is implemented that can be used to draw Data Vault models. The stencil is available at www.visualdatavault.com.