
Pros & Cons of Data Visualization: the Good, Bad, & Ugly

To understand the pros and cons of data visualization, you should first look at the historical context. The discipline has grown in popularity as a communication tool since the mid-2000s, when private companies began collecting information at massive scale. That popularity is well earned.

Data visualization helps communicate complex relationships and ideas in a simple, digestible format. At the same time, data visualization has a tendency to oversimplify, which can wreak havoc for those using it to make decisions. Its added value (communicating complex ideas) is at the same time its limitation (incomplete communication).

This core challenge is the nexus of the pros and cons of data visualization. Let’s look at two things in this article: the pros and cons of data visualization, and the pros and cons of 4 popular data visualization tools.

What is data visualization?

Data visualization (a.k.a. data viz) is the transformation of raw data tables into visual illustrations that tell a story. Selecting what information to share, and how to share it, are the two fundamental choices in creating a viz.

Data visualizations can take many forms. More often than not, visualizations are graphs, charts, plots, and other forms of numerical explanations. But make no mistake: data visualization does not end there. Maps, pictures, and bubble charts are also types of data visualization. Any time you see a map with countries highlighted for emphasis, you’re looking at a data visualization.

In addition, interactive tools are often considered the highest form of data visualization. In most cases, this simply means adding filters to standard visualizations. For example, imagine you have a bar chart that shows the fertility rates of the three wealthiest countries in North America. An interactive version might include a drop-down menu so the user can switch to another continent. If s/he selects Europe, the chart would show the fertility rates in Germany, France, and Italy.
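Behind the scenes, a drop-down like this mostly acts as a filter on the rows the chart draws. The sketch below is a minimal illustration under stated assumptions: the table `COUNTRIES` and the function `countries_for` are hypothetical names, not part of any charting library; it assumes, for illustration only, that the three North American countries are the United States, Canada, and Mexico; and it leaves out the fertility-rate values and the drawing step entirely.

```python
# Hypothetical table of (country, continent) pairs. Fertility-rate values are
# left out on purpose; a real chart would look each one up in its data source.
# Assumption: the three North American countries are the US, Canada, and Mexico.
COUNTRIES = [
    ("United States", "North America"),
    ("Canada", "North America"),
    ("Mexico", "North America"),
    ("Germany", "Europe"),
    ("France", "Europe"),
    ("Italy", "Europe"),
]


def countries_for(continent: str) -> list[str]:
    """Return the countries the chart should show for the selected continent."""
    return [country for country, cont in COUNTRIES if cont == continent]


# Simulate the user switching the drop-down from North America to Europe.
for selection in ("North America", "Europe"):
    print(selection, "->", countries_for(selection))
```

Each time the user picks a new option, the chart is simply redrawn from the filtered rows; the underlying data never changes.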

We can’t forget that the story component is key. A data visualization without a message behind it is not a data visualization at all. It’s just data. This story component is precisely the topic of this article, because it can be either a blessing or a curse depending on the reader.


Pros and cons of data visualization: advantages and examples

The advantages of data visualization are many, and the common theme is that data viz makes numbers accessible to everyone. For example, a CEO with minimal finance experience may not understand financial statements, but he will easily understand a bar chart with bars dipping below zero! Numbers are simply easier to grasp as images.

Another example is the use of a sun next to the temperature to indicate warm weather, or a dark cloud to indicate rain. You might see a line chart showing GDP or population over time, or a pie chart showing the share of men vs. women who get malaria in sub-Saharan Africa.

Moreover, imagine you want to convince your constituents that your policies have helped improve the community. You want to show that since you took office, crime rates, death rates, and violence have all decreased. In this case, an effective way to do so would be a line graph showing each of these variables declining over time.

As a list, here are some pros of data visualization that we’ll explore below:

  1. Easy to communicate
  2. Earn attention
  3. Adds credibility
  4. Easy to remember
  5. Enhancing the message

Easy to communicate

In all of these examples, there is an obvious communicative value that comes from using data visualizations. It’s simply easier to understand rates and relationships when we express them visually. At the same time, we can use colors and simple labels to make the image even easier to understand. Take, for example, this chart of the three N. American countries:

N. American Populations, No Legend

It carries some value. We can see that the populations of two countries grow more steeply than the third. But we don’t know which ones. Our intuition or prior knowledge probably pushes us to believe that the highest line is the United States, the second highest Canada, and the third highest Mexico. Without labels, we can’t be certain. Check out the same graph but with a legend and colored lines:

N. American Populations, With Legend

You see the immediate added value that colors and labels bring to the graph. Not only do they make it more aesthetically appealing, but they also show that our intuition was wrong: Mexico’s population is growing much more steadily than Canada’s.

Earn attention

Data visualization also earns the attention of audiences at every level of education. A good portion of the population doesn’t feel comfortable with numbers and numerical analysis. In fact, if you try to communicate relationships using any metric more complex than “rate” or “percent,” you could lose a huge portion of your readers.

It’s not because they’re unintelligent. They just aren’t interested in that kind of information, so presenting it that way doesn’t earn their attention. But if you can display the relationship you want in a simple data visualization, you earn their attention and concentration, even if only for a few moments.

This brings up an important point. One of the key responsibilities, and challenges, of anyone creating a data visualization is to earn the audience’s attention, not take it for granted.

Adds credibility

Data visualization adds credibility to any message. In the field of communication sciences, famous thinkers such as Marshall McLuhan and Kenneth Burke have long spearheaded the idea that the medium of communication changes the way the audience interprets the message.

Text, television, and podcasts are three different media because they engage the senses in different ways. In a nutshell, McLuhan described media as either hot (requiring very little participation from the audience) or cool (requiring much more participation). Text is hot, for example, whereas television is cool: the first doesn’t require the reader to “fill in the gaps,” whereas, in McLuhan’s view, television does!

Data visualization leverages this dynamic. Data vizes are very cool media because they require a lot of interpretation and participation from the audience. While bare numbers feel authoritative, data visualizations are inclusive.

They draw the viewer into the chart and build the author’s credibility through active participation. Like a good teacher, they walk the reader through the thought process and convince him/her gently.

Easy to remember

A list of the advantages of data visualization would never be complete without mentioning memory. Perhaps the most important advantage of data visualizations is how easy they are to remember.

When’s the last time you remembered the exact words from a text? When’s the last time you could recall an exact figure about an important topic (death rates, for example)? For me, the answer is “I don’t remember!” However, I can still easily remember the chart we looked at above of North American countries’ populations.

Whatever the reason, data visualizations are much easier to remember than text alone. But the tricky part is that you have to pick the right data visualization. Pick the wrong one and you may confuse your audience, in which case they won’t remember anything. Let’s look at that more closely.

Helps enhance a message (when it’s the right kind)

The right data visualization works wonders to enhance the message you want to get across. More than just an easy-to-remember graph that communicates simply, the right graph communicates the data in the way that you want. Two people looking at the same raw data may see two different stories, and each can build a graph to communicate their own message.

For example, let’s take a look at the same data presented from two different perspectives:

This chart is from the White House under President Obama, and the message is clear: Obama’s America earns more high school diplomas. This graph is not wrong per se, but it is a bit misleading. Here’s another view:

Adjusted y-axis diploma rates chart

Clearly, when we adjust the y-axis to span from 0% to 100%, we see that the increase over time is both marginal and inconclusive. We don’t know what rates were before 2008, and we don’t see what happened in the final years of his presidency, which ended in January 2017.
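The distortion from a truncated y-axis is easy to put a number on: a bar is drawn up from the bottom of the axis, so its visible height is its value minus the axis minimum. The sketch below compares how much taller one bar looks than another with and without truncation. The function name `drawn_height_ratio` is hypothetical, and the rates of 75% and 82% are made-up round numbers, not the figures from the White House chart.

```python
def drawn_height_ratio(v1: float, v2: float, axis_min: float = 0.0) -> float:
    """How many times taller bar v2 looks than bar v1 when the y-axis starts at axis_min."""
    if axis_min >= min(v1, v2):
        raise ValueError("axis_min must be below both values")
    # Each bar's visible height is its value minus the axis floor.
    return (v2 - axis_min) / (v1 - axis_min)


# Hypothetical graduation rates, in percent.
before, after = 75.0, 82.0
print(drawn_height_ratio(before, after))        # full 0-100% axis: about 1.09
print(drawn_height_ratio(before, after, 70.0))  # axis starting at 70%: 2.4
```

With these illustrative numbers, a 7-point rise (about 9% in relative terms) is drawn as a bar 2.4 times as tall once the axis starts at 70%.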

The important thing to remember from this example is that both charts are correct — they do not lie. It’s simply how they are represented that changes how we interpret them. This is what I mean when I say that an advantage of data visualization is enhancing a message — the message you want to communicate.

At the same time that this ability is an advantage, it’s also a disadvantage for communication as a whole. It’s the first disadvantage (con) of data visualization. Let’s look into it.

Pros and cons of data visualization: disadvantages and examples

Any discipline has its shortcomings, and data visualization is no exception. The biggest disadvantage of data visualization is that it can be misleading. Darrell Huff’s famous book, How to Lie with Statistics, does a good job of outlining the many ways in which data visualization, and statistics in general, can be used to mislead.

This core disadvantage creates many problems, or cons, for data visualization. Let’s look into them below, using Darrell Huff’s book as a reference point.

Problems with data visualization

As a list, the problems with data visualization, or as the title says, the cons of data visualization, are:

  1. Correlation is not causation
  2. Similarities now don’t mean similarities forever
  3. Y- and X-Axes can reverse meaning
  4. Abusing the Law of Large Numbers (LLN)
  5. Seasonality kills
  6. “Mean,” or average, is not the best go-to statistic
  7. Standardizing the benchmark is critical

Correlation is not causation

Correlations can make us see false relationships. A correlation is the tendency of two variables to move together, for example rising and falling over the same periods. Correlations are a great way to see how different variables behave in relation to one another. But they do not imply that one causes the other.

This is what we mean by correlation is not causation. In fact there are a few different ways to explain a correlation. When X and Y are correlated, either:

  1. X causes Y
  2. Y causes X
  3. X and Y are unrelated
  4. Another variable, A, affects both X and Y

Often, X and Y are simply unrelated. Placing the variables side by side makes us want to believe there is a relationship. An honest data visualization avoids showing these false relationships, and only places similar-looking trends side by side when there is a reasonable explanation, or when the author can explain that there is no causal link.

Take a look at this chart, for example, from TylerVigen.com. It shows the correlation between two clearly unrelated trends: cheese consumption and deaths by bedsheet entanglement:

We should always be cautious when a data visualization shows two variables correlated like this.
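Part of the reason unrelated trends correlate so easily is that any two series that both rise steadily will score a high correlation coefficient. The sketch below computes Pearson’s r by hand on two made-up, steadily rising series; `pearson_r` is an illustrative helper name, and the numbers are hypothetical stand-ins for any two unrelated quantities.

```python
from math import sqrt


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length, at least 2 points")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)


# Two hypothetical, unrelated series that both happen to rise over five years.
series_a = [10, 11, 12, 13, 14]
series_b = [3, 5, 6, 8, 9]
print(round(pearson_r(series_a, series_b), 3))  # 0.993
```

A coefficient this close to 1 says only that the two lines move together; it cannot tell you which of the four explanations listed above applies.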

Similarities now do not mean similarities forever

Beyond false correlations, a relationship observed within our data set does not mean the relationship always holds. In other words, if we establish a causal relationship (not merely a correlation) over a one-month period, that does not mean the relationship continues outside of that month.

For example, imagine we show a direct causal relationship between the percent change in a small town’s income and the town’s college football home games. Take a look at this chart:

Football game impact on change in income

And now take a look at the full year:

Football game impact on change in income, full year

The full year shows that the causal relationship we outlined (i.e., that home football games increase the change in income) is only one possible explanation. Let me be clear: this chart does not prove that there is no causal relationship. It simply shows that a relationship we perceive in one period does not necessarily hold at all times.

A possible explanation for why the change in income increases even in the football offseason is that basketball games take up the slack. If that is the case, our causal relationship is intact. BUT the similarity we see during football season does not mean there is always a relationship.

Axes make all the difference

The range and number of axes can change the way we interpret data. We already saw how the range of an axis can change the way we interpret its data with high school diploma rates under President Obama. But what about the number of axes used? We can’t discuss the pros and cons of data visualization without mentioning axes!

Let’s take another look at the graph “Football game impact on change in income.” If you look carefully, you will see that it has two y-axes with different units. The left axis is in integers, whereas the right axis is in percent.

Showing dual axes on the football game impact graph

The reason there are two axes is that the input variables (number of football games, and % change in income) are in two different units. Because we’re only looking for similar trends and NOT similar absolute values, this is acceptable.

But we must be careful not to confuse the axes or misinterpret their meanings. If we tried to read the percent change in income off the left y-axis, we would misjudge its magnitude by a factor of about 10 (4.5 * 10 = 45).

In other words, axes can be misleading in two ways. First, the range of an axis can make changes look more extreme than they really are. Second, dual axes can trick us into assuming two variables are measured in the same unit.
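The dual-axis trap can be made concrete. Each axis maps its own value range onto the same vertical space, so a point plotted against the right axis sits at a height that the left axis labels with a different number. The sketch below assumes hypothetical ranges of 0 to 4.5 (percent) on the right axis and 0 to 45 on the left, chosen only to match the factor of 10 mentioned above; `read_on_other_axis` is an illustrative name, not a plotting-library function.

```python
def read_on_other_axis(value: float,
                       from_range: tuple[float, float],
                       to_range: tuple[float, float]) -> float:
    """Return the number a viewer would read if they took a point plotted
    against one axis and read its height off the other axis."""
    lo_from, hi_from = from_range
    lo_to, hi_to = to_range
    # Fraction of the way up the plot area where the point is drawn.
    position = (value - lo_from) / (hi_from - lo_from)
    return lo_to + position * (hi_to - lo_to)


RIGHT_AXIS = (0.0, 4.5)   # hypothetical: % change in income
LEFT_AXIS = (0.0, 45.0)   # hypothetical: integer scale

print(read_on_other_axis(4.5, RIGHT_AXIS, LEFT_AXIS))   # 45.0
print(read_on_other_axis(2.25, RIGHT_AXIS, LEFT_AXIS))  # 22.5
```

The plotted point is the same; only the label the eye lands on changes, which is why dual-axis charts should make it obvious which series belongs to which axis.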

Abusing the Law of Large Numbers (LLN)

The law of large numbers is the idea that larger samples give more trustworthy results. Formally, it states that as a sample grows, its average converges to the true expected value, so large data sets tend to represent reality more accurately.

What does this mean for data visualizations? It means that any data viz based on a sample that’s too small is only as reliable as that small sample.

A disadvantage of data visualization with regard to small samples is that the viewer doesn’t see the sample size. For example, imagine you flip a coin 5 times, and it comes up heads 4 of those 5 times. You could conclude that coins show heads 80% of the time (4 ÷ 5).

But this would not be representative. You would need to flip the coin many more times, say 100 at a minimum. The law of large numbers tells us that the larger our sample, the closer its results tend to be to the true probability. After 100 flips, you might see heads come up 51 times, or 51%.

But a data visualization usually doesn’t show the sample behind it. It only shows results. I could portray the 80% result with a bar chart like this one, and most people would not look twice:

heads vs tails
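A short simulation shows the law of large numbers at work. The sketch below flips a simulated fair coin using Python’s `random` module and reports the share of heads for growing sample sizes, alongside the standard error sqrt(p(1 - p) / n), roughly how far a sample proportion typically strays from the true probability p. The helper names are illustrative and the seed is arbitrary: the exact shares change with it, but the shrinking spread does not.

```python
import random
from math import sqrt


def share_of_heads(n: int, rng: random.Random) -> float:
    """Flip a simulated fair coin n times and return the fraction of heads."""
    heads = sum(rng.random() < 0.5 for _ in range(n))
    return heads / n


def standard_error(n: int, p: float = 0.5) -> float:
    """Standard deviation of a sample proportion around the true probability p."""
    return sqrt(p * (1 - p) / n)


rng = random.Random(42)  # arbitrary seed so the run is repeatable
for n in (5, 100, 10_000):
    print(f"n={n:>6}: share of heads={share_of_heads(n, rng):.3f}, "
          f"standard error={standard_error(n):.3f}")
```

With 5 flips the standard error is about 0.22, and a fair coin lands heads 4 or more times out of 5 in 6 of the 32 equally likely outcomes (nearly 19% of the time), so the 80% result says very little. With 10,000 flips the standard error shrinks to 0.005.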

Seasonality kills

Seasonality can be just as dangerous as the short-lived relationships we discussed above. In a sentence, seasonality is the tendency for a pattern to repeat itself at regular intervals, whether daily, weekly, monthly, or yearly.

Our football example above is a good illustration of seasonality. The difference between seasonality and the other time-related limitations of data visualization is that seasonality is NOT limited to one period of time. It recurs over time.

For example, you may find a strong correlation between cheddar cheese consumption and missed free throws in high school basketball games. Perhaps the two correlate during winter, but the correlation drops off in other seasons. Perhaps that seasonal correlation lasts for 3 years, and then disappears entirely.

In this case, we observe both seasonality AND non-permanent correlations.
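One common way to check a series for seasonality is its autocorrelation: correlate the series with a copy of itself shifted by a candidate period. For monthly data, for example, a strong positive value at a lag of 12 suggests a yearly cycle. The sketch below uses the standard sample autocorrelation estimator on a hypothetical quarterly series that repeats the same pattern each year; `autocorrelation` is an illustrative helper, not a library call.

```python
def autocorrelation(xs: list[float], lag: int) -> float:
    """Sample autocorrelation of xs at the given lag."""
    if not 0 < lag < len(xs):
        raise ValueError("lag must be between 1 and len(xs) - 1")
    mean = sum(xs) / len(xs)
    dev = [x - mean for x in xs]
    den = sum(d * d for d in dev)
    if den == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    # Pair each point with the point `lag` steps later.
    num = sum(dev[t] * dev[t + lag] for t in range(len(xs) - lag))
    return num / den


# Hypothetical quarterly figures that repeat the same pattern for three years.
quarterly = [1, 2, 3, 4] * 3
print(round(autocorrelation(quarterly, 4), 3))  # 0.667: same quarter, next year
print(round(autocorrelation(quarterly, 2), 3))  # -0.5: two quarters apart
```

Even a perfectly repeating series scores below 1 here, because this estimator divides by the variation of the whole series while only 8 of the 12 points have a partner four steps later. The peak at the seasonal lag is what matters.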

IHME data visualization

A really good example of an organization that uses multiple data visualizations, including interactive ones, is the Institute for Health Metrics and Evaluation (IHME). Take a look at these charts on child mortality rates.

“Mean,” or average, is not the best go-to statistic

The “mean,” or average, of a list of data is not a good all-purpose statistic, because outliers can pull it in one direction or the other. Many data sets roughly follow a normal distribution, and in a normal distribution the mean, median, and mode are all the same.

However, most real data sets deviate from the standard bell curve at least slightly. Any data points that sit far away from the others (a.k.a. whose deviation from the mean is extreme) will skew the mean much more than they will skew the median.

Without going too deep into the statistics, this is why you should always look for both the mean and the median in a data visualization. If there is a big difference between them, the visualization may not be representative.
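The effect of a single outlier is easy to see with Python’s built-in `statistics` module. The values below are hypothetical salaries in thousands of dollars, with one extreme value added to the list.

```python
from statistics import mean, median

salaries = [30, 32, 35, 38, 40]   # hypothetical, in $ thousands
with_outlier = salaries + [1000]  # one extreme earner joins the group

print(mean(salaries), median(salaries))          # 35 35
print(mean(with_outlier), median(with_outlier))  # mean jumps to about 195.8, median only to 36.5
```

One value out of six moves the mean by more than 160 while the median barely shifts, which is exactly the gap worth checking for in a chart that reports only an average.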

Standardizing the benchmark is critical

A list of the pros and cons of data visualization isn’t complete without reference to standardizing benchmarks. One of the most common ways data visualizations are used to mislead is through unequal benchmarks.

By unequal benchmarks I mean the use of different reference points to demonstrate progress. Think about it as if you were running a 100 meter race, while a competitor only needs to run the last 50 meters. You both start at the same time, and you are judged equally. It’s simply not fair. The benchmarks are unequal.

Sometimes data visualizations do this with percentages. You might see a graph, for example, that shows 150% year-on-year growth in revenues for the first three years for company 1, whereas company 2 only has 10% growth during that same period.

What they don’t explain is that company 1 started with almost no revenue. Going from $1 to $2.50 in revenue is already 150% growth, and from exactly $0, percentage growth is not even defined! Meanwhile, company 2 is making 10% more on $1 million in revenues, which is $100,000 more.

Which one would you rather invest in? The one with 150% growth, or the one with $100,000 growth? It’s all about the correct reference, or benchmark.
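Reporting the absolute change next to the percentage change is one simple guard against unequal benchmarks. The sketch below uses hypothetical revenues (one company growing from $1 to $2.50, another from $1 million to $1.1 million) and refuses to compute a percentage from a zero base, where it is undefined. The function name `growth` is illustrative.

```python
def growth(old: float, new: float) -> tuple[float, float]:
    """Return (absolute change, percent change) between two revenue figures."""
    if old == 0:
        # Percent growth from a zero base is undefined, not "over 100%".
        raise ZeroDivisionError("percent change from a zero base is undefined")
    return new - old, 100 * (new - old) / old


print(growth(1.00, 2.50))            # (1.5, 150.0): big percentage, $1.50 gained
print(growth(1_000_000, 1_100_000))  # (100000, 10.0): small percentage, $100,000 gained
```

Seen side by side, the two numbers tell opposite stories, and a chart that shows only one of them is choosing a benchmark for you.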

Conclusion: pros and cons of data visualization

In conclusion, the pros and cons of data visualization hinge on the use of misleading data techniques. While data viz is an incredible tool for communication, it can also be used to spread misinformation. Here is the list of pros and cons one more time:

Pros

  1. Easy to communicate
  2. Earn attention
  3. Adds credibility
  4. Easy to remember
  5. Enhancing the message

Cons

  1. Correlation is not causation
  2. Similarities now don’t mean similarities forever
  3. Y- and X-Axes can reverse meaning
  4. Abusing the Law of Large Numbers (LLN)
  5. Seasonality kills
  6. “Mean,” or average, is not the best go-to statistic
  7. Standardizing the benchmark is critical