As a concept, data classes are nearly as old as IT itself. They represent a group of data attributes, data items, and/or logical operators that the user wants to treat as one group, or class. The tricky part is that data classes carry slightly different definitions depending on the programming language or data analysis software. For this reason, it’s best to approach data classes as a conceptual tool, then dig into different applications.
In short, data classes are user-defined filters designed to isolate portions of an underlying dataset based on data attributes. In some cases, users base the filter on logical operations, though this is reserved for advanced programmers. Data classes are common in data software such as Excel & SQL, as well as in programming languages such as Kotlin and Python.
Data Class Example in Excel: Allocating with Filters
Imagine you have a dataset with data concerning 3 types of watches you sell. Each of those watches has 5 attributes: weight, price, color, material, and origin. As a data analyst, you’re using Microsoft Excel as a simple data wrangling tool.
Suppose you want to split the data into two classes: watches made from titanium, and watches that cost more than $100. To do so, you create two new tables, each using a vlookup to pull the entries whose relevant attribute meets the respective criterion. You end up with two tables, each representing one of the two classes. Entries that meet neither criterion fall into a final, third class.
It’s important to see here that our classes are nothing more than filters applied to attributes. Data classes are, in most cases, just subsets of a dataset extracted based on attribute values. We can say, then, that the attributes and values in each subset have been allocated to a class.
This is a simple example. Because Excel is a granular data wrangling tool, it’s easy to understand how classes work. However, other software and programming languages such as SQL, Python, and Kotlin, are high-level. Creating a data class is sometimes as quick as writing two lines of code. That’s why it’s important to really understand what the filter-based data class concept implies.
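To make the filter idea concrete outside of Excel, here is a minimal Python sketch of the watch example. The three sample rows and the field names (weight_g, price_usd, and so on) are invented for illustration, and plain list filters stand in for the vlookup tables.

```python
# Hypothetical sample data: three watches, five attributes each.
watches = [
    {"name": "A", "weight_g": 80.0, "price_usd": 250.0,
     "color": "black", "material": "titanium", "origin": "Switzerland"},
    {"name": "B", "weight_g": 120.0, "price_usd": 90.0,
     "color": "silver", "material": "steel", "origin": "Japan"},
    {"name": "C", "weight_g": 95.0, "price_usd": 150.0,
     "color": "blue", "material": "steel", "origin": "Germany"},
]

# Class 1: watches made from titanium.
titanium = [w for w in watches if w["material"] == "titanium"]

# Class 2: watches that cost more than $100.
over_100 = [w for w in watches if w["price_usd"] > 100]

# Class 3: everything that meets neither criterion.
remainder = [
    w for w in watches
    if w["material"] != "titanium" and w["price_usd"] <= 100
]
```

Note that the first two classes can overlap: in this made-up data, watch A is both titanium and over $100, so it lands in both tables.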
Creating Raw Data with Data Classes
In our above example, we’re filtering from a dataset to retrieve a subset. However, in many programming languages, we “create” a dataset directly in the code instead of pulling from an underlying database. We also refer to these inline, created datasets as data classes. Data classes in these cases are usually small and supplementary, not important source data.
For example, you may want to fill in some missing data points without consulting the data source. Or perhaps you want to manually enter a small amount of data rather than writing the filtering rules and code. Whatever the reason, often when you create raw data directly in a line of code instead of modifying the source, you are using data classes. This is of course subject to the specific terminology of your programming language, so be sure to consult its official documentation before writing data classes formally.
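As a rough sketch of this idea in Python, the snippet below fills a missing price from a small dataset written directly in the code. The rows, the field names, and the manual_prices lookup are all hypothetical.

```python
# Hypothetical rows standing in for data pulled from a source;
# watch B is missing its price.
source_rows = [
    {"name": "A", "price_usd": 250.0},
    {"name": "B", "price_usd": None},
]

# Small, supplementary data created inline in the code.
manual_prices = {"B": 90.0}

filled = []
for row in source_rows:
    patched = dict(row)  # copy, so the source rows stay untouched
    if patched["price_usd"] is None:
        patched["price_usd"] = manual_prices.get(patched["name"])
    filled.append(patched)
```

The source rows are left as they were; only the working copy in filled carries the manually entered value.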
Getting & Setting Data Classes
In some contexts, data classes contain no data at all. Instead, they function as tools used on a real database to get and set data. While this sounds similar to the filter-based data class concept, it’s important to note that it’s the means of getting and setting, not the output table, that we call the data class in this case.
“Getting & setting” data classes are data objects in some programming languages that preselect attributes to draw down from an underlying database. In other words, they are the “rule” or “logic” that the programmer executes in order to pull, on a case-by-case basis, the data they want to treat. In most cases, these data classes are a sublist of attributes from a complete list in the underlying database.
This process may not seem different from filtering attributes within a database, but it becomes very useful when you have many underlying databases. When there are two or more databases available, it’s useful to source the data using a “getting & setting” data class. Not only does this make your databases easier to organize, but it is usually faster than filtering through multiple data tables by hand. And since the process is “automated,” meaning you don’t need to filter manually, it can also reduce the load on your server.
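One way to picture a “getting & setting” data class is the Python sketch below. The AttributeSelector class, its get and set methods, and the two in-memory “databases” are hypothetical names made up for this illustration; a real project would sit on top of an actual database driver.

```python
class AttributeSelector:
    """Holds the rule (which attributes to pull), not the data itself."""

    def __init__(self, *attributes):
        self.attributes = attributes

    def get(self, database):
        """Return only the preselected attributes of every record."""
        return [
            {name: record[name] for name in self.attributes}
            for record in database
        ]

    def set(self, database, row, attribute, value):
        """Write one value back, but only for a preselected attribute."""
        if attribute not in self.attributes:
            raise KeyError(f"{attribute!r} is not managed by this selector")
        database[row][attribute] = value


# Two hypothetical "databases", each a list of records.
store_eu = [{"name": "A", "price_usd": 250.0, "color": "black"}]
store_us = [{"name": "B", "price_usd": 90.0, "color": "silver"}]

# The rule: a sublist of the attributes available in the sources.
pricing = AttributeSelector("name", "price_usd")

# The same rule is reused across both sources.
eu_prices = pricing.get(store_eu)
us_prices = pricing.get(store_us)

# Setting goes through the same rule.
pricing.set(store_us, 0, "price_usd", 95.0)
```

Here pricing holds no watch data of its own. It only records which attributes to draw down, which is why the same object works on either source.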
Data Class vs Class
Especially in Python, a common misunderstanding concerns normal classes versus data classes. For most intents and purposes, a class is a collection of data attributes, their values, and logical operations. Programmers and analysts can create different kinds of classes that contain one or more of these three. Classes whose main purpose is to hold attributes and their values are referred to as data classes.
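In Python specifically, the difference can be sketched with the standard library’s dataclasses module. The class names WatchPlain and WatchData and their fields are invented for this example. The @dataclass decorator generates the constructor, a readable representation, and value-based equality from the declared attributes, whereas the normal class has to spell out whatever it needs by hand.

```python
from dataclasses import dataclass


class WatchPlain:
    """A normal class: attributes plus hand-written logic."""

    def __init__(self, material, price_usd):
        self.material = material
        self.price_usd = price_usd

    def discounted(self, percent):
        return self.price_usd * (1 - percent / 100)


@dataclass
class WatchData:
    """A data class: only the attributes are declared."""

    material: str
    price_usd: float


# Data classes compare by attribute values.
same_data = WatchData("titanium", 250.0) == WatchData("titanium", 250.0)

# Normal classes compare by identity unless you write __eq__ yourself.
same_plain = WatchPlain("titanium", 250.0) == WatchPlain("titanium", 250.0)

print(same_data)   # True
print(same_plain)  # False
print(WatchData("titanium", 250.0))
```

Python does not actually forbid methods on a data class, so the distinction is one of purpose: a data class is declared by listing what it holds.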
Data Class vs Data Object
In simplest terms, “data object” is an umbrella term under which data classes fall. Data objects include data tables, arrays, pointers, records, files, sets, and scalar types; and data classes are in some ways a table and a record, depending on how you use them.
With that said, data objects have a fixed, universal meaning, whereas this article has shown that data classes are more nuanced and variable depending on the software and programming language.
Summary of key points concerning data classes
- Data classes as a concept are nothing more than the narrowing-down of a database by filtering its attributes.
- The action to describe this process is allocating.
- In Excel, this would look like filtering a column based on an attribute criterion, or using a vlookup function to extract only the entries that meet your data class criteria.
- In the case of programming languages, developers often like to add data directly in their code rather than by modifying data at the source. When they do so, they create data classes. However, each language has different terminology for inline data additions, so you should check the official language documentation before writing data classes formally.
- A special set of data classes are called “getters & setters” because the data class is a rule or set of logic that pulls data from a database according to set criteria. They stand in opposition to basic data classes, which are the result of pulling data. In other words, getters and setters are the means whereas normal data classes are the output.
- A further distinction exists between normal classes and data classes, and it’s most common in Python. Normal classes contain data attributes, data values, and rules or logic, whereas data classes focus on data attributes and their values.
- “Data object” is an umbrella term under which data class falls, although data classes are not as standardized as data objects across technologies.
Data Classes in Kotlin
Kotlin is a relatively new programming language. Its largest use is in the world of Android programming; other than Java, Kotlin is the most widely used language to this end. And data classes are an important part of the language.
Data classes in Kotlin mainly serve to hold data; that is, they’re not an actionable data object, but a storage object. I won’t spend much time explaining syntax, since KotlinLang.org is the ideal source to explore how to code the language. However, it’s useful to explore the requirements for data classes in Kotlin, as it will help identify how classes here fit into our universal definition and understanding discussed above.
Kotlin’s website lists the following criteria for data classes to “ensure consistency and meaningful behavior”:
- The primary constructor needs to have at least one parameter;
- All primary constructor parameters need to be marked as val or var;
- Data classes cannot be abstract, open, sealed or inner;
- (before 1.1) Data classes may only implement interfaces.
The primary constructor is the basis for the data class, and it is required to have at least one parameter. Each parameter declares one of the properties the class stores, so a data class with no parameters would have nothing to hold.
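Every primary constructor parameter must be marked “val” or “var.” In Kotlin, “val” declares a read-only property, whose value cannot be reassigned after construction, and “var” declares a mutable one. Every piece of data the class holds is therefore an explicitly declared attribute, which helps us intuitively link the Kotlin data class back to data analysis in general.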
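For readers coming from Python, a loose analogy may help. In Kotlin, val declares a read-only property and var a mutable one. Python’s dataclasses module applies a similar choice to the whole class rather than to each field, through the frozen option. The sketch below is Python, not Kotlin, and the class names are hypothetical.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class FixedWatch:
    """All fields read-only, loosely like Kotlin properties marked val."""

    material: str
    price_usd: float


@dataclass
class EditableWatch:
    """All fields reassignable, loosely like Kotlin properties marked var."""

    material: str
    price_usd: float


editable = EditableWatch("steel", 90.0)
editable.price_usd = 95.0  # allowed

fixed = FixedWatch("titanium", 250.0)
try:
    fixed.price_usd = 200.0  # rejected at runtime
    reassigned = True
except FrozenInstanceError:
    reassigned = False
```

The analogy is imperfect: Kotlin lets you mix val and var within one class and checks them at compile time, while a frozen Python data class raises an error only when the reassignment runs.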
Data classes cannot be abstract, meaning you must be able to create objects from them directly. They cannot be open, meaning other classes cannot inherit from them. They cannot be sealed, meaning they cannot serve as the base of a restricted set of subclasses. They cannot be inner, meaning they cannot hold a reference to an instance of an outer class. In other words, data classes in Kotlin need to be standalone!
This standalone nature is in line with our understanding of data classes from above.
Conclusion
Data classes are common to many programming languages and data analysis software, but they do not have a one-size-fits-all function or definition. Instead, they generally represent subsets of larger databases, the latter of which are filtered on attributes, a process referred to as allocating.
Some data classes serve the specific function of manually making additions to the data source by inline code. Still other data classes are not data at all, but the means to retrieve data from another data object. Data classes differ from normal classes in that they focus on attributes and values rather than rules and logic.
While in some ways they are data objects, data classes are not as standardized as the former. Perhaps most importantly, data classes have become increasingly important in the programming language Kotlin, a huge player in the Android development sphere.