Data functions are the nuts and bolts of the digital world. In development, they allow programmers to interact directly with a database and modify objects in-line. In Excel, they query, create, and modify arrays. In math, they allow statisticians to address datasets using Sigma (Σ) and Pi (∏) notation.
But what exactly is a data function, and why is it present in such a variety of fields?
This article will clearly define data functions and explain them in different use cases. The goal is to help you understand (1) how data functions are used in your field and (2) how important they are to the computational world as a whole.
Definition: Data Function
Defining data functions in a meaningful way is no easy task because they are present in different contexts with varying goals. Examples are critical to understanding them. However, we can generalize their purpose to establish a definition.
In short, a data function is a query, modification, or computation that directly or indirectly impacts values in a data table. Data functions are different from other functions because they operate exclusively with tabular data, rather than un-arranged values. They often appear in math, computer programming, data tools, and spreadsheet software.
To make the definition concrete, imagine the following. If I calculate (5+10)*(6+10) in general, this is not a data function. However, if I calculate ∏(x+10), where x runs from a lower bound of 5 to an upper bound of 6, then the Pi notation specifically addresses an array of two data values for variable x, and the result is the same 15*16 = 240. Since an array is the simplest form of a data table, the formula interacts with tabular data and is a data function.
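As a rough check of the arithmetic above, here is a minimal Python sketch. Python is our choice for illustration and the variable names are made up. It evaluates the product once as loose numbers and once over an array of x values.

```python
import math

# The "loose" calculation: two hand-typed factors.
loose_result = (5 + 10) * (6 + 10)

# The data-function version: Pi notation over the array x = 5, 6.
x_values = [5, 6]
pi_result = math.prod(x + 10 for x in x_values)

print(loose_result, pi_result)  # both are 240
```

The two results match, but only the second version would keep working unchanged if the array grew to hundreds of values.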
The meaning is subtle and may seem silly in this example, but it becomes critical when computations grow in complexity. Let’s turn our attention to specific use cases now.
Data function in Math (Statistics)
Math contains the simplest instance of data functions. As shown in the above example, a given function f(x) is a data function when it addresses a list of data points in a series, rather than loose variables or numbers. The most common notations used for this are Sigma and Pi. Many data functions in statistics can be represented by some form of these structures.
Let me be clear: the entire field of statistics exists for this purpose, to analyze datasets. You may not have realized it, but even common statistical measures such as the average are formally written as data functions.
- Average = 1/n * Σx, where the sum runs over every value in the series (from the first value, i=1, to the last, i=n) and n is the number of values.
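Sketched in Python, the average is the sum of the series divided by the number of values. The series below is made up for illustration; any list of numbers would do.

```python
import statistics

# Hypothetical series of five values.
series = [4, 8, 15, 16, 23]

n = len(series)          # number of values in the series
sigma = sum(series)      # Sigma: add up every value in the series
average = sigma / n      # Average = (1/n) * sum of x

print(average)
print(statistics.mean(series))  # the standard library's mean, for comparison
```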
Note that median and mode are not written with Sigma or Pi notation. They are still statistics, but they describe a dataset by identifying the middle value and the most common value, which means sorting or counting the values rather than combining all of them through addition or multiplication.
Moreover, Sigma and Pi do not have subtraction and division counterparts. Why? Because any case of subtraction or division can be written in terms of addition and multiplication. For example, x – 10 can be written as x + (10 * -1), and x/10 can be written as x * (1/10).
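A quick way to convince yourself of this, sketched in Python with a made-up list of values. Fractions are used instead of floats so that the comparison is exact rather than subject to rounding.

```python
from fractions import Fraction

# Hypothetical values standing in for x.
xs = [Fraction(v) for v in (12, 25, 40)]

# Subtraction rewritten as addition: x - 10 == x + (10 * -1)
sum_subtract = sum(x - 10 for x in xs)
sum_add = sum(x + (10 * -1) for x in xs)

# Division rewritten as multiplication: x / 10 == x * (1/10)
div_results = [x / 10 for x in xs]
mul_results = [x * Fraction(1, 10) for x in xs]

print(sum_subtract == sum_add)     # True
print(div_results == mul_results)  # True
```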
Written succinctly, data functions in math fall under the purview of statistics and use Sigma and Pi notation.
Data function in Excel
Data functions in Excel appear in two ways. On the one hand, any formula that involves data structured in a tabular form can be considered a data function. Since the majority of Excel formulas do just that, most can be considered data functions.
On the other hand, the data analysis add-on and the table data function are often what readers look for.
Data Analysis Function in Excel
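The data analysis function is actually an add-on. On a Mac, you can add it by navigating to Tools > Excel Add-ins > Analysis ToolPak. On Windows, the same checkbox is found under File > Options > Add-ins (manage Excel Add-ins, then Go). Check the box, restart Excel if the Data Analysis button does not appear, and you should have it. This function will allow you to run common statistical analyses on datasets.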
How to Use Table Data Function in Excel
If you’re looking for the table data function, you’re a little outside of our definition for data functions. What Excel calls a data table (under Data > What-If Analysis) is nothing more than a “What If” scenario tool that lets us view the results of a formula by entering multiple inputs. Though not technically a data function, the table data function is useful to understand for comparative purposes.
For example, imagine a formula that multiplies the values of two cells. We can create a scenario that shows what the outcome of this formula would be given a list of various inputs. The following image shows what this could look like. Cell C2 shows the product of cells A2 and B2. By linking cell F2 to cell C2, we can set up the table data function to show what the results would be if input 1 were not 10 but 1, 2, 3, 4, and 5.
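The same what-if table can be sketched outside Excel. The Python below is only an illustration: the value of the second input (4) is an assumption, since the text gives only the first input (10), and the function and variable names are made up.

```python
def product(input_1, input_2):
    """Stand-in for the formula in cell C2 (A2 * B2)."""
    return input_1 * input_2

# Assumed value for the second input; the article does not state it.
INPUT_2 = 4

# Alternative values for input 1, replacing the original 10.
what_if_inputs = [1, 2, 3, 4, 5]

# One row per scenario: (input 1, resulting product).
what_if_table = [(value, product(value, INPUT_2)) for value in what_if_inputs]

for value, result in what_if_table:
    print(value, result)
```

Each row answers one "what if input 1 were this value?" question without touching the original formula.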
Data function in Computers
The use case with the highest level of detail is the machine on which you’re reading this sentence: a computer (and if you’re on mobile, yes, a phone is a computer).
The four functions of a computer are input, process, storage, and output. Data storage takes place on a hard drive, where information exists in sequences of 1s and 0s. You may have heard this referred to as binary code.
However, for binary to be readable to humans, it is shown in tabular form. When the computer translates storage to output (the screen), it is using a data function to do so. Why? Because the computer queries tabular data from the hard drive and shows it on the display.
Tibco & Spotfire
You will often hear the term data function in the context of TIBCO Spotfire, an enterprise data analytics tool. I won’t go into detail here since it is not an open source platform and only benefits a limited number of (paying) users.
However, data functions in Tibco and Spotfire are scripts written by users to enhance the software’s calculating capacity — far from the true definition of a data function.
Data function in R
Data functions in R are more granular than the examples we’ve seen above. The data() function (lowercase, since R is case-sensitive) is one of the most common functions because it’s a basic way to load datasets that ship with R and its packages into the R workspace.
In short, data() accepts one or more dataset names, given as bare names or character strings. In this way, it’s just a query function. In addition, data() accepts several optional arguments. Here’s the full syntax:
data(..., list = character(), package = NULL, lib.loc = NULL, verbose = getOption("verbose"), envir = .GlobalEnv, overwrite = TRUE)
The “...” corresponds to the names of the datasets to load. If you only need one dataset from a package that is already attached, it can be as simple as that. However, additional arguments allow you to specify details:
- list =, allows you to pass the dataset names as a character vector instead of typing them as bare names
- package =, allows you to specify which R packages to search for the datasets
- lib.loc =, allows you to specify the directories of the R libraries to search, or to leave it as NULL, which tells R to search all known libraries
- verbose =, when TRUE, prints additional diagnostic info while the data is loaded
- envir =, allows you to specify the environment into which the data should be loaded
- overwrite =, controls whether existing objects with the same name in that environment are replaced by the new ones
As you can see, data() in R is a data function because it deals with querying data from an underlying table, not just unstructured numbers.
We’re highlighting the data() function as an example, but it’s far from being the only R function that works with underlying data tables. In reality, we can’t cover them all, but a large portion of functions across many programming languages can be categorized as data functions.
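To make the querying idea concrete, here is a toy Python sketch of a name-based lookup. It is not how R implements data(). The table names, the AVAILABLE_TABLES registry, and the load_data helper are all hypothetical; only the ideas behind envir and overwrite are borrowed from the syntax above.

```python
# Hypothetical registry of pre-defined tables, keyed by name.
AVAILABLE_TABLES = {
    "sales": [("north", 120), ("south", 95)],
    "staff": [("ana", "analyst"), ("bo", "engineer")],
}

def load_data(*names, envir, overwrite=True):
    """Copy the named tables into `envir` (a plain dict); return the names loaded."""
    loaded = []
    for name in names:
        if name not in AVAILABLE_TABLES:
            raise KeyError(f"no table named {name!r}")
        if name in envir and not overwrite:
            continue  # keep the existing value untouched
        envir[name] = list(AVAILABLE_TABLES[name])
        loaded.append(name)
    return loaded

workspace = {"sales": "old value"}
load_data("sales", "staff", envir=workspace, overwrite=False)
print(workspace["sales"])  # the old value is kept because overwrite=False
print(workspace["staff"])
```

The function never computes anything from loose numbers. It only looks up a table by name and places it in a workspace, which is what makes it a query.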
Data function in C++ and jQuery
While data() in R deals with querying, or loading, data, the data() function in C++ deals with accessing an array (aka a small dataset).
In short, data() in C++ is a member function of std::string (and of containers such as std::vector and std::array) that returns a pointer to the underlying array holding the elements. For a string, that array contains its characters. To understand how this works requires knowledge of C++, but you can think of it like linking a standalone cell in Excel to the cells of a table that hold the text.
For std::string, data() returns a pointer to constant characters, so the array can be read through that pointer but not modified. The syntax looks like this:
const char* data() const;
A function with the same name also exists in jQuery, a JavaScript library, where it is attached to page elements. It exists in 4 different forms:
- .data( key, value ) – a data entry statement that allows the programmer to store a value on the element under a name (key = a string, value = any data type except undefined)
- .data( obj ) – a data entry statement that allows the programmer to store several key-value pairs at once by passing them as a single object
- .data( key ) – a data return function that shows the value stored under the key in a previous statement
- .data() – an empty data return function that accepts no arguments and simply returns all the values previously associated with the element
It’s important to note that these data functions start with a period because they are added on to individual elements in-line. For example, imagine you have an element $( "body" ) to which you add .data( "AnalystAnswers.com", 52 ); creating the resulting $( "body" ).data( "AnalystAnswers.com", 52 ); You have now stored the numeric value 52 on the body element under the key AnalystAnswers.com.
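The four call forms listed above can be mimicked in a small Python class. This is a toy model for illustration only, not the actual library code. The Element class and the "visits" key are hypothetical.

```python
_MISSING = object()  # sentinel so that None can still be stored as a value

class Element:
    """Hypothetical stand-in for a page element that can carry data."""

    def __init__(self, name):
        self.name = name
        self._store = {}

    def data(self, key=_MISSING, value=_MISSING):
        if key is _MISSING:
            return dict(self._store)     # .data(): return everything stored
        if isinstance(key, dict):
            self._store.update(key)      # .data(obj): store several pairs at once
            return self
        if value is _MISSING:
            return self._store.get(key)  # .data(key): read one value
        self._store[key] = value         # .data(key, value): store one pair
        return self

body = Element("body")
body.data("AnalystAnswers.com", 52)
body.data({"visits": 3})
print(body.data("AnalystAnswers.com"))  # 52
print(body.data())
```

Under the hood, each element carries its own small key-value table, which is why both storing and reading count as working with tabular data.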
All four items are data functions because they deal with creating or modifying tabular data, directly or indirectly.
If you found this article helpful, you can find more free content at the AnalystAnswers.com homepage!