Data fields are fundamental building blocks of data. They are so essential that analysts in various disciplines have appropriated them to meet the needs of their use cases. Though these applications are a testament to the value of a data field, they create confusion as to the precise meaning behind them.
The purpose of this article is to (1) clearly define data fields, (2) list different types, (3) show how they are applied in databases and programming languages, and (4) clear up confusion around “data field” and “field data.”
If you work with data in any way — from data entry to data science — this article is for you.
Don’t forget, you can get the free 67 data skills and concepts checklist to cover all the essentials (including data fields).
Data Field Definition
A data field is a location for a predetermined type of data that — collectively with other data fields — describes the place it is stored. The most common example is a column in a data table that describes the row with which it intersects, and others include class variables in programming languages such as Java.
A simple way to think about data fields is to imagine an object and describe multiple instances of it. For example, let’s take beds. One bed is king sized, made of wood, and uses spring suspension. Another bed in queen sized, made of metal, and uses elastic suspension.
Beds are the object, while size, material, and suspension type describe the beds. When we write these down on paper or enter them in a database, we’ve created a location for the three descriptions, and they are now fields.
Data fields are also known as data attributes, data traits, and data characteristics.
Data Field in a Database
The fundamental form of data is data tables. They consist of rows that we call records (or “tuples” if you’re fancy) and columns. The first column consists of unique IDs of the objects we are going to describe, and the columns contain information about it.
NOTE: there is some dispute as to whether the first column should be considered a field or simply the “unique ID.” Since by definition fields describe the location they are stored, we could consider the unique ID either a titular description of itself, but it’s more intuitive to assume it’s simply the place described by the fields.
Using our example of beds, here’s a sample table:
|Bed (Unique ID)||Size (Field)||Material (Field)||Suspension (Field)|
Now it should be clear what a data field is, so let’s revisit our definition: a location for a predetermined type of data that — collectively with other data fields — describes the place it is stored.
In this example, the location is a column, the type is size/material/suspension, the collective is those three together, and the place described is the row’s bed.
It’s important to note that a field is NOT the individual data entries (“King,” “Metal,” “Spring”). It is the column as a whole, in this case the size field, material field, and suspension field.
In the context of a database, columns and fields are synonyms.
Examples of Data Fields
Simple examples of data fields include weight, color, date, sales, averages, member ID, promotional code, status, file type, keyword, and rainfall. Any piece of information that can be used to describe an object, event, or idea can be a data field.
That may sound silly, but it’s actually what makes data fields so flexible and so valuable — almost anything in the natural and human worlds can be a data field.
Data Field in Excel
In Excel, a data field is a column in any data table structure built with cells. It is easiest to identify when we’re using the Excel’s Data Table function (Insert > Tables > Table), but any column in a row-by-column table can be a data field — even a simple range.
Data Fields and Data Records
The two dimensions of a data table consist, as we know, of rows and columns. Rows are referred to as records because they consist of one instance of the unique ID and each field.
You’ve probably heard the word “record” used all your life without realizing that this is what it refers to.
A record is nothing more than a complete row of data. Police records, sports records, school records, and geographic records are all examples are rows in a database. A school record might look like this:
|Student Name||GPA||Attendance (out of 180)||Average tardiness||# of extra curriculars|
|Jane Doe||3.5||175||5 min||3|
The first (top) row is known as the header because it shows the “head” of the body of information in all of the following rows. This is important because it moves our focus from the whole table towards rows. A record is one instance of what the table tells us — the base of the table.
If we only saw one record, we could infer the rest of the data table. However, if we only saw one field, we would not be able to infer the rest.
Collection of Data Fields
You may have heard the phrase “a collection of data fields.” A row is a collection of data fields in a data table. The cell where a field intersects with a record is the “place” it describes — the unique ID of that record. In this sense, a record is a “group” of fields.
Types of Database Fields
Database fields are not all made the same. In general, they fall into six types: (1) dates (2) numbers, (3) strings, (4) Unicode, (5) Binary, and (6) miscellaneous. In most cases, this is what you will encounter as a data analyst since the differnet subtypes tend to be complex and require knowledge of computer systems. However, for the sake of completeness I list them all below. Here’s a list of different data types with short descriptions:
- Numeric Data Item Types
- Integer – any number that is not a decimal. Examples include -11, 34, 0, 100.
- Tinyint – an integer, but only numbers from 0 to 255
- Bigint – an integer bigger than 1 trillion
- Float – the number of significant digits after the decimal is relative to the total number of digits, contrary to fixed-point numbers where the number of digits is predetermined by the user. Fixed-point number could be 1.28 and 1.94, whereas the equivalent floating numbers could be 1.28 and 1.9457683, since the second has more digits in it’s raw form.
- Real – any fixed point on a number line
- Date and Time Data Item Types
- Date – the date sorted in different forms, including “mm/dd/yyyy” (US), “dd/mm/yyyy” (Europe), “mmmm dd, yyyy”, and “mm-dd-yy” among many more.
- Time – the time of day, broken down as far as milliseconds
- Date time – the date and time value of an event
- Timestamp – stores number of seconds passed since 1970-01-01 00:00:00’ UTC
- Year – stores years ranging from 1901 to 2155 in two-digit or four-digit ranges
- Character and String Data Item Types
- Char – fixed length of characters, with a maximum of 8,000
- Varchar – max of 8,000 characters like char, but each entry can differ in length (variable)
- Text – similar to varchar, but the maximum is 2GB instead of a specific length
- Unicode Character and String Item Types – unicode is a way of structuring data in the form of U+0000, where the 0’s can be any number. For example, “U+0021” translates to the exclamation point “!”.
- nchar – fixed length with maximum length of 8,000 characters
- nvarchar – variable length with maximum of 8,000 characters
- ntext – variable length storage, only now the maximum is 1GB rather than a specific length
- Binary Data Item Types – a combination of 0s and 1s
- binary – fixed length with maximum of 8,000 bytes
- varbinary – variable length storage with maximum bytes, topped at 8,000
- Miscellaneous Data Item Types
- clob – also known as Character Large Object, is a type of sub-character that carries Unicode texts up to 2GB
- blob – carries big binary objects
- xml – a specific data type that stores XML data. XML stands for extensible markups language, and is common in data bases
All Values in a Field are the Same Type
All values in a given field will be the same type. This is a fundamental principle of data quality. If your fields contain data of different types, then you cannot filter on attributes or manipulate measures.
Numeric Fields Are Measures, Not Attributes
Numeric field types are special because they behave differently from the other types. Text and other data types are considered attributes because they cannot be manipulated across or down the data table. We can only count the number of times they appear or filter the data on them.
Numbers, however, can be added, subtracted, multiplied, and divided across and down the table. This means they are dynamic, and we refer to them as measures rather than attributes.
For example, imagine we add another field to our dataset on beds for weight as shown below. We could add the weights of Bed 1 and Bed 2 to arrive at a new value, 185, because it is a measure. However, we cannot manipulate words to arrive at new values. We can only consolidate rows based on them, which makes them attributes.
|Bed (Unique ID)||Size (Field)||Material (Field)||Suspension (Field)||Weight in KG (Measure)|
Data Field Mapping
Sometimes we want to consolidate multiple data tables into one, and these tables concern the same unique IDs but are not labelled or normalized in the same way. Data field mapping is an activity that outlines how the fields from two or more starting tables will be consolidated into a new table.
For example, consider the following two tables from a book store database. In the first, we see that BookID is repeated multiple times to accommodate the months when it was checked out. This means BookID + Month is the unique ID for this table, since we could otherwise simply add the number of checkouts per book.
In the second, we can see that BookID is sometimes repeated for multiple editions, such as AY135 in row 4. This means that the ISBN number is the unique ID for column 2.
In order to combine these two tables, we need to map which fields can and should be included. To figure out how this could work, we need to identify the coarser of the two unique IDs — that is, which one provides the most detail and can therefore be combined with all of the other database’s fields.
In this example, the coarser of the two unique IDs is BookID + Month. We can thus map all of the fields from the second table into the first table. The result would look like the following:
You’ll see that the fields from the second database are duplicated for each UID Checkouts ID. This is a necessary to ensure each row has the correct data from each field, although it is redundant and takes up space.
However, note that this table is based on a choice to include only the books that were checked out. It is possible that some of the ISBNs from table two were never checked out. If they weren’t, then we do not see their information in the combo table (since the checkout unique IDs do not include book records if they weren’t sold).
In this way, field mapping involved choosing the information you wish to see. Had we wanted to see the checkouts for all ISBNs, even those without any checkouts, we would need to first map the UID Checkouts to ISBN, then create a new unique ID for that table. This new unique ID would be used to map all fields from both tables into a new one.
Data Field Definition Document: Data Dictionaries
In this article, we’ve only looked at data fields with intuitive names. In many cases, especially in scientific fields, codes or special jargon may be used as the title for a field. The problem with this is that, well, nobody else can understand it!
Data dictionaries are the solution. A data dictionary describes a table’s columns based on common traits (i.e name, definition, data type) within another table.
Imagine, for example, you have the following table about customers:
The following data dictionary describes certain details about that field, including its definition, the data type, possible values, and whether it is required. “CA” means nothing to the reader, but when we see that is represents “Customer age,” we understand.
|Field name||Definition||Data type||Possible values||Required?|
|CA||Customer age||Integer||15, 18, 20, 23, NULL||No|
Data Fields Programming (and Mockup) Languages
Data fields in programming languages are different from those in data tables because they often exist as a single entry of information that describes an object.
Let’s go back to the definition of data field: a location for a predetermined type of data that — collectively with other data fields — describes the place it is stored. A data field in programming is most often a function or characteristic of an object. I’m not a programmer, but it’s easy to see how fields come into play in these languages. Let’s look at this in HTML and Java.
HTML is not a programming language per se. It’s a mockup language. Still, we use fields to add specific traits to objects in the code. For example, the line “
<input type="text"> defines a single-line text input field1.
In Java, you can specify traits about an object class. One example would be “height = frameDim.height”.
While we often consider code to be data itself, it is merely an alternative form of data. Behind every line of code is a database that stores unique IDs and fields in a standard table format. This is part of the reason why there is confusion around the precise meaning of data fields.
Not to be Confused with Field Data (comes from being “in the field”)
A common misconception, albeit a silly one, is mistaking “data field” for “field data.” Field data refers to information that is collected outside of a lab of office. For example, if I have to go survey people to collect data, we can say that data is collected “in the field.”
Obviously, the field data will include rows and columns, so it includes data fields. But the two terms refer to different ideas.
If you like this article, check out more free content at the AnalystAnswers.com home page!