If you have worked a desk job, you know the importance of good file names. They can be as simple as starting with “YYYY-MM-DD,” or as complex as a five-part, underscore-delimited system.
When you have them, you take them for granted. When you don’t have them, you kick yourself when someone asks you to track down an old document. File naming conventions create order.
The same principle applies to data. Whether it’s in the form of a metadata library (aka data dictionary), field names or column headers in a database, rules for data types, file type extensions, or measurement units, data must comply with naming conventions to maintain structure and, with it, archivability.
The importance of archivability in data goes beyond a single organization. Because coding languages and data are often shared, reworked, saved, and modified across time and space, the standards must be international.
This is no easy task. That’s why organizations such as Unidata have established worldwide standards that act as a reference today. While these standards were written for scientific data formats and the software libraries that read them, they are important for database conventions as well. For example, here’s how Unidata describes the properties of the Network Common Data Format (NetCDF):
Self-Describing. A netCDF file includes information about the data it contains.
Portable. A netCDF file can be accessed by computers with different ways of storing integers, characters, and floating-point numbers.
Scalable. Small subsets of large datasets in various formats may be accessed efficiently through netCDF interfaces, even from remote servers.
Appendable. Data may be appended to a properly structured netCDF file without copying the dataset or redefining its structure.
Sharable. One writer and multiple readers may simultaneously access the same netCDF file.
Archivable. Access to all earlier forms of netCDF data will be supported by current and future versions of the software.
The question is: do you or your organization comply with naming conventions and guarantee the longevity of your data?
The purpose of this article is to define data naming conventions, explain the six major categories of data naming conventions and their standards, and show you how to implement them easily.
Definition of Data Naming Conventions
In short, data naming conventions are the rules governing formatting elements such as typographical emphasis, delimitation, abbreviations, casing, decimal places, and numerals (Arabic vs. Roman) that a given data type must abide by. As a list, the elements of formatting are:
- Typographical emphasis (bold, italics, underline)
- Delimitation (separating words with spaces, underscores, or other)
- Abbreviations (shortening long words)
- Casing (upper or lower)
- Decimal places (zero or more)
- Numerals (Arabic such as “1, 2, 3…” or Roman such as “I, II, III…”)
A common example is dates. For example, March 1st, 2020 could be represented as:
- March-1,
- 2020-Mar-1,
- 2020-03-1,
- or any variation thereof.
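To make the ambiguity concrete, here is a minimal Python sketch that renders the same date in several of the forms listed above. The function name `date_variants`, the dictionary keys, and the hard-coded English month names are assumptions made for illustration. The last entry uses ISO 8601, a widely used standard form that is not one of the variants listed above.

```python
from datetime import date

# English month names, hard-coded so the output does not depend on locale.
MONTHS = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")


def date_variants(d: date) -> dict[str, str]:
    """Return several renderings of the same calendar date."""
    month = MONTHS[d.month - 1]
    return {
        "month_day": f"{month}-{d.day}",               # e.g. March-1
        "year_abbr_day": f"{d.year}-{month[:3]}-{d.day}",  # e.g. 2020-Mar-1
        "iso_8601": d.isoformat(),                     # e.g. 2020-03-01
    }


variants = date_variants(date(2020, 3, 1))
for name, text in variants.items():
    print(f"{name}: {text}")
```

All three strings describe the same day, which is why a convention has to pick one and apply it everywhere.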
6 Data Types and their Naming Conventions
Strings (a.k.a. text)
- Typographical emphasis: avoid emphasis for strings except when they act as a field name.
- Delimitation: always separate words using underscores.
- Abbreviations: use official dictionary abbreviations as much as possible. For strings like months, use the first 3 letters.
- Casing: always use lower-case letters.
- Decimal places: does not apply to strings.
- Numerals: does not apply to strings.
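The string rules above can be sketched as a small normalizer. This is only an illustration: `to_field_string` and `abbreviate_month` are hypothetical helper names, and treating every run of non-alphanumeric characters as a word boundary is an assumption.

```python
import re


def to_field_string(text: str) -> str:
    """Lower-case a string and join its words with underscores."""
    # Assumption: any run of letters or digits counts as one word.
    words = re.findall(r"[A-Za-z0-9]+", text)
    return "_".join(word.lower() for word in words)


def abbreviate_month(name: str) -> str:
    """Shorten a month name to its first three letters, lower-cased."""
    return name.strip()[:3].lower()


print(to_field_string("Sunny Passion"))  # sunny_passion
print(to_field_string("Date  Sold"))     # date_sold
print(abbreviate_month("March"))         # mar
```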
Numbers
- Typographical emphasis: avoid emphasis for numbers unless they are a calculation of raw data. Sums, for example, should be bold.
- Delimitation: avoid delimiting numbers unless you absolutely need to. In those cases, simply use a comma with no spaces.
- Abbreviations: does not apply to numbers.
- Casing: does not apply to numbers.
- Decimal places: the rule of thumb for decimal places is that presentation numbers should not have any, while source data should have 2 digits to the right of the decimal.
- Numerals: avoid using Roman numerals at all costs. Today, they are a stylistic choice but provide very little added value. Instead, use normal Arabic numbers.
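As a rough sketch of the number rules, the helper below applies zero decimal places for presentation and two for source data, and adds comma delimiters only on request. The name `format_number` and its keyword arguments are assumptions for this example.

```python
def format_number(value: float, *, presentation: bool, grouped: bool = False) -> str:
    """Format a number with 0 decimals (presentation) or 2 (source data)."""
    places = 0 if presentation else 2
    # A leading comma in the format spec inserts thousands separators.
    spec = f"{',' if grouped else ''}.{places}f"
    return format(value, spec)


print(format_number(1234.5, presentation=False))                # 1234.50
print(format_number(1234.56, presentation=True))                # 1235
print(format_number(6000000, presentation=True, grouped=True))  # 6,000,000
```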
Dates
- Typographical emphasis: dates should not have typographical emphasis, unless they act as a field header.
- Delimitation: dates should never be delimited by spaces. If they act as a field header, you should use an underscore. If they are a data item, you should use a hyphen. For example: Mar-12-2020.
- Abbreviations: only applies to months. If you want to abbreviate a month, use the first three letters.
- Casing: dates follow normal grammatical casing rules. You should capitalize the first letter of a month.
- Decimal places: does not apply to dates.
- Numerals: use Arabic numerals, not Roman numerals.
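A minimal sketch of the date rules, assuming the month-day-year order shown in the “Mar-12-2020” example. The function name `format_date` and the `as_header` flag are hypothetical, and the month abbreviations are hard-coded in English so the result does not depend on locale.

```python
from datetime import date

# Three-letter abbreviations with a capitalized first letter.
MONTH_ABBR = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
              "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def format_date(d: date, *, as_header: bool = False) -> str:
    """Join month, day, and year with a hyphen (data item) or underscore (header)."""
    separator = "_" if as_header else "-"
    parts = [MONTH_ABBR[d.month - 1], str(d.day), str(d.year)]
    return separator.join(parts)


print(format_date(date(2020, 3, 12)))                  # Mar-12-2020
print(format_date(date(2020, 3, 12), as_header=True))  # Mar_12_2020
```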
Time
- Typographical emphasis: times should not have typographical emphasis, unless they act as a field header.
- Delimitation: time does not require delimitation. However, hours, minutes, and seconds may be separated by a colon. For example, we can write 1 hour, 10 minutes, and 3 seconds as “01:10:03”.
- Abbreviations: does not apply to times.
- Casing: does not apply to times.
- Decimal places: does not apply to times.
- Numerals: use Arabic numerals, not Roman numerals.
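The colon-separated form above can be sketched as follows. The helper name `format_time` is hypothetical, and the range check assumes a 24-hour clock, which the rules themselves do not specify.

```python
def format_time(hours: int, minutes: int, seconds: int) -> str:
    """Render a time as zero-padded HH:MM:SS."""
    # Assumption: values are clock times on a 24-hour clock.
    if not (0 <= hours < 24 and 0 <= minutes < 60 and 0 <= seconds < 60):
        raise ValueError("time component out of range")
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


print(format_time(1, 10, 3))  # 01:10:03
```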
Currency
- Typographical emphasis: currencies should not have typographical emphasis. However, the currency symbol, whether it’s “$,” “€,” or another, should come before the number and have no space separating it (accounting types are different).
- Delimitation: each thousands place should be delimited by a comma. For example, 6000000 should be written as 6,000,000. That said, some countries use decimal points for thousands places and commas for decimal places.
- Abbreviations: does not apply to currency.
- Casing: does not apply to currencies.
- Decimal places: currencies should not have any digits to the right of the decimal in presentations. However, they may have more in source data.
- Numerals: currencies should use Arabic numerals, not others.
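Here is one possible sketch of the currency rules, using Python's `decimal` module to avoid binary floating-point surprises. The name `format_currency` is hypothetical. Using two decimal places for source data and rounding half up are assumptions, since the rules only say source data may keep more digits.

```python
from decimal import Decimal, ROUND_HALF_UP


def format_currency(amount, symbol: str = "$", *, presentation: bool = True) -> str:
    """Prefix the symbol with no space and delimit thousands with commas."""
    # 0 decimal places for presentation; 2 for source data (an assumption).
    step = Decimal("1") if presentation else Decimal("0.01")
    rounded = Decimal(str(amount)).quantize(step, rounding=ROUND_HALF_UP)
    return f"{symbol}{rounded:,}"


print(format_currency(6000000))                             # $6,000,000
print(format_currency("1234.5", "€", presentation=False))   # €1,234.50
```

Locales that swap the comma and the decimal point would need a different separator; this sketch covers only the comma-for-thousands form.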
Accounting Numbers
- Typographical emphasis: accounting numbers should not have emphasis. However, the currency symbol, such as “$,” “€,” or another, should be automatically aligned to the left edge of the cell.
- Delimitation: accounting numbers should be delimited by a comma at each thousands place.
- Abbreviations: does not apply to accounting numbers.
- Casing: does not apply to accounting numbers.
- Decimal places: accounting numbers should have two decimal places.
- Numerals: accounting numbers should use Arabic numerals, not Roman ones.
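The difference from plain currency formatting is mostly alignment, which the sketch below imitates in a fixed-width text cell. The name `format_accounting`, the default `width`, and the half-up rounding are all assumptions. A spreadsheet would normally handle this through its built-in accounting format.

```python
from decimal import Decimal, ROUND_HALF_UP


def format_accounting(amount, symbol: str = "$", width: int = 14) -> str:
    """Put the symbol at the left edge and right-align the number in the cell."""
    rounded = Decimal(str(amount)).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    number = f"{rounded:,}"  # comma at each thousands place, two decimals
    return symbol + number.rjust(width - len(symbol))


for value in (10, 24, 1234.5):
    print(f"[{format_accounting(value, width=12)}]")
```

Printing several values in the same width shows the symbols lining up on the left and the decimal points lining up on the right.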
Example
Imagine you run a wholesale watch company called Batch Watch. Your last company failed because you couldn’t find a file for a client, so you’re intent on setting up consistent, logical data naming conventions to avoid any further problems.
Your first challenge is tackling weight measurement units and field headers (attributes) for a database concerning your products. While you may have your own vision about which weight units to use and how to format headers, you are, after all, working in a watch company that you intend to grow internationally.
So you decide to consult with a specialist. You will have a product ID, weight, price, date sold, time sold, and an accounting entry in the database. Using the data naming conventions identified above, your field headers are the following:
product_id | weight | price | date_sold | time_sold | accounting_entry |
---|---|---|---|---|---|
Then you show how this would look with some data entries.
product_id | weight_in_kg | price_in_USD | date_sold_YY-MM-DD | time_sold_24_hour_clock | accounting_entry_debit:credit |
---|---|---|---|---|---|
sunny_passion | 1.0 | 10 | 20-05-01 | 04:10:36 | 10:10 |
winter_watch | 0.8 | 24 | 20-08-29 | 10:46:09 | 24:24 |
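One way to keep entries like these consistent is to check each value against a pattern for its column. The sketch below is only an illustration: the regular expressions in `FIELD_PATTERNS` are assumptions inferred from the two rows above, and `invalid_fields` is a hypothetical helper name.

```python
import re

# One pattern per field header; each is an assumption based on the sample rows.
FIELD_PATTERNS = {
    "product_id": r"[a-z0-9]+(_[a-z0-9]+)*",
    "weight_in_kg": r"\d+\.\d+",
    "price_in_USD": r"\d+",
    "date_sold_YY-MM-DD": r"\d{2}-\d{2}-\d{2}",
    "time_sold_24_hour_clock": r"\d{2}:\d{2}:\d{2}",
    "accounting_entry_debit:credit": r"\d+:\d+",
}

rows = [
    dict(zip(FIELD_PATTERNS, ("sunny_passion", "1.0", "10", "20-05-01", "04:10:36", "10:10"))),
    dict(zip(FIELD_PATTERNS, ("winter_watch", "0.8", "24", "20-08-29", "10:46:09", "24:24"))),
]


def invalid_fields(row: dict[str, str]) -> list[str]:
    """Return the headers whose values do not match the expected pattern."""
    return [
        name
        for name, pattern in FIELD_PATTERNS.items()
        if not re.fullmatch(pattern, row.get(name, ""))
    ]


for row in rows:
    print(row["product_id"], invalid_fields(row))
```

An entry such as “Sunny Passion” in the product_id column would be reported, because it breaks the lower-case and underscore rules for strings.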
As you can see, what you originally envisioned for the data entries changes once you start entering data. This is an important point in naming conventions: they must be adapted to each particular situation.
While there are general naming conventions governing the six principal types of data, each organization must modify them to meet its own needs.
For this reason, data naming conventions will never be foolproof or universal, but anyone who understands the fundamentals will be able to easily switch between organizations.
Conclusion
Data naming conventions govern the formatting for data fields and data items. These conventions are useful as a guide for organizations, but will never encompass all possible entries in the world of business and science. With that said, you should be able to identify the six most important types of data:
- Strings,
- Numbers,
- Dates,
- Times,
- Currency, and
- Accounting.
And you should know whether your data follows each of the following convention types as closely as possible:
- Typographical emphasis (bold, italics, underline)
- Delimitation (separating words with spaces, underscores, or other)
- Abbreviations (shortening long words)
- Casing (upper or lower)
- Decimal places (zero or more)
- Numerals (Arabic such as “1, 2, 3…” or Roman such as “I, II, III…”)