
Qualitative Content Analysis: a Simple Guide with Examples

Content analysis is a type of qualitative research (as opposed to quantitative research) that focuses on analyzing content in various mediums, the most common of which is written words in documents.

It’s a very common technique used in academia, especially for students working on theses and dissertations, but here we’re going to talk about how companies can use qualitative content analysis to improve their processes and increase revenue.

Whether you’re new to content analysis or a seasoned professor, this article provides all you need to know about how data analysts use content analysis to improve their business. It will also help you understand the relationship between content analysis and natural language processing — what some even call natural language content analysis.

Don’t forget, you can get the free Intro to Data Analysis eBook, which will ensure you build the right practical skills for success in your analytical endeavors.

What is qualitative content analysis, and what is it used for?

Any content analysis definition must consist of at least these three things: qualitative language, themes, and quantification.

In short, content analysis is the process of examining preselected words in video, audio, or written mediums and their context to identify themes, then quantifying them for statistical analysis in order to draw conclusions. More simply, it’s counting how often you see two words close to each other.

For example, let’s say I place in front of you an audio clip, an old video with a static image, and a document with lots of text but no titles or descriptions. At the start, you would have no idea what any of it was about.

Let’s say you transcribe the video and audio recordings onto paper. Then you use word-counting software to find the ten most used words, excluding prepositions (of, over, to, by), articles (the, a), conjunctions (and, but, or), and other common words like “very.”

Your results are that the top 4 words are “candy,” “snow,” “cold,” and “sled.” These 4 words appear at least 25 times each, and the next highest word appears only 4 times. You also find that the words “snow” and “sled” appear adjacent to each other 95% of the time that “snow” appears.

Well, now you have performed a very elementary qualitative content analysis.

This means that you’re probably dealing with a text in which snow sleds are important. Snow sleds, thus, become a theme in these documents, which goes to the heart of qualitative content analysis.
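The counting in this example can be sketched in a few lines of Python. This is only a minimal illustration, not the software mentioned above: the stop-word list, the function names, and the sample sentence are all assumptions made up for the sketch, and "adjacent" is simplified to mean "snow" immediately followed by "sled."

```python
import re
from collections import Counter

# Hypothetical stop-word list (an assumption); extend it for real use.
STOP_WORDS = {"of", "over", "to", "by", "the", "a", "an", "and", "but", "or", "very"}

def tokenize(text):
    """Lowercase the text and split it into words (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())

def top_words(text, n=10):
    """Return the n most frequent words, ignoring stop words."""
    words = [w for w in tokenize(text) if w not in STOP_WORDS]
    return Counter(words).most_common(n)

def adjacency_share(text, first, second):
    """Share of occurrences of `first` that are immediately followed by `second`."""
    words = tokenize(text)
    total = words.count(first)
    if total == 0:
        return 0.0
    followed = sum(1 for a, b in zip(words, words[1:]) if a == first and b == second)
    return followed / total

# Made-up sample text, for illustration only.
sample = "The snow sled slid over the cold snow. A snow sled is very fast, and the snow sled is red."
print(top_words(sample, 2))                     # the two most frequent non-stop words
print(adjacency_share(sample, "snow", "sled"))  # fraction of "snow" followed by "sled"
```

The numbers alone don't make a theme. It's still up to you to decide that "snow sled" matters in the text.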

The goal of qualitative content analysis is to organize text into a series of themes. This is opposed to quantitative content analysis, which aims to organize the text into categories.

Types of qualitative content analysis

If you’ve heard about content analysis, it was most likely in an academic setting. The term itself is common among PhD and master’s students writing their dissertations and theses. In that context, the most common type of content analysis is document analysis.

Content analysis can be applied to many types of content, including:

  • Short- and long-form survey questions
  • Focus group transcripts
  • Interview transcripts
  • Legislation
  • Journals
  • Magazines
  • Public records
  • Newspapers
  • Textbooks
  • Cookbooks
  • Comments sections
  • Messaging platforms

This list gives you an idea of the possibilities and industries in which qualitative content analysis can be applied.

For example, marketing departments or public relations groups in major corporations might collect survey, focus group, and interview data, then hand off the information to a data analyst who performs the content analysis.

A political analysis institution or think tank might look at legislation over time to identify potential emerging themes based on their slow introduction into policy margins. Perhaps it’s possible to identify certain beliefs in the Senate and House of Representatives before they enter the public discourse.

Non-governmental organizations (NGOs) might perform an analysis on public records to see how to better serve their constituents. If they have access to public records, it would be possible to identify citizen characteristics that align with their goal.

Analysis logic: inductive vs deductive

There are two types of logic we can apply to qualitative content analysis: inductive and deductive. Inductive content analysis is more of an exploratory approach. We don’t know what patterns or ideas we’ll discover, so we go in with an open mind.

On the other hand, deductive content analysis involves starting with an idea and identifying how it appears in the text. For example, we may approach legislation on wildlife by looking for rules on hunting. Perhaps we think hunting with a knife is too dangerous, and we want to identify trends in the text.

Neither one is better per se, and they each carry value in different contexts. For example, inductive content analysis is advantageous in situations where we want to identify author intent. Going in with a hypothesis can bias the way we look at the data, so the inductive method is better.

Deductive content analysis is better when we want to target a term. For example, if we want to see how important knife hunting is in the legislation, we’re doing deductive content analysis.

Measurements: idea coding vs word frequency

Two main methodologies exist for analyzing the text itself: coding and word frequency. Idea coding is the manual process of reading through a text and “coding” ideas in a column on the right. We call this coding because we take ideas and themes expressed in many words and turn them into one common phrase. This allows researchers to better understand how those ideas evolve. We will look at how to do this in Word below.

In short, coding in the context of qualitative content analysis follows 2 steps:

  1. Reading through the text one time
  2. Adding 2-5 word summaries each time a significant theme or idea appears
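Once the codes exist, tallying them is straightforward. Here is a rough sketch of how the coded column might be stored and counted in Python. The `CodedSegment` structure, the helper names, and the excerpts are hypothetical and invented for illustration; the coding itself stays a manual, human step.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class CodedSegment:
    paragraph: int  # position of the passage in the text
    excerpt: str    # the passage the researcher read
    code: str       # the researcher's 2-5 word summary

def is_valid_code(segment):
    """Check that a code respects the 2-5 word guideline."""
    return 2 <= len(segment.code.split()) <= 5

def tally_codes(segments):
    """Count how often each code was assigned, ignoring capitalization."""
    return Counter(s.code.lower() for s in segments)

# Made-up excerpts and codes, for illustration only.
segments = [
    CodedSegment(1, "We took the sled out after the first snowfall.", "winter sledding"),
    CodedSegment(2, "The children shared candy on the hill.", "sharing candy"),
    CodedSegment(3, "Everyone raced their sleds until dark.", "Winter sledding"),
]

print(all(is_valid_code(s) for s in segments))
print(tally_codes(segments).most_common())
```

Reusing the same phrase for the same idea is what makes the tally meaningful, which is why the sketch lowercases codes before counting.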

Word frequency is simply counting the number of times a word appears in a text, as well as its proximity to other words. In our “snow sled” example above, we counted the number of times a word appeared, as well as how often it appeared next to other words. There are online tools for this that we’ll look at below.

In short, word frequency in the context of content analysis follows 2 steps:

  1. Decide whether you want to find a word, or just look at the most common words
  2. Use Word’s Replace function for the first, or an online tool such as Text Analyzer for the second (we’ll look at these in more detail below).

Many data scientists consider coding the only qualitative content analysis, since word frequency comes down to counting the number of times a word appears, making it quantitative.

While there is merit to this claim, I personally do not consider word frequency a part of quantitative content analysis. The fact that we count the frequency of a word does not mean we can draw direct conclusions from it. In fact, without a researcher to provide context on the number of times a word appears, word frequency is useless. True quantitative research carries conclusive value on its own.

Measurements AND analysis logic

There are four ways to approach qualitative content analysis given our two measurement types and inductive/deductive logical approaches. You could do inductive coding, inductive word frequency, deductive coding, or deductive word frequency.

The two best are inductive coding and deductive word frequency. If you want to discover what a document is about, counting words will tell you only a little about its contents, so inductive word frequency is less insightful.

Likewise, if you’re looking for the presence of a specific idea, you do not want to code the whole document just to find it, so deductive coding is not insightful. Here’s a simple matrix to illustrate:

| | Inductive (discovery) | Deductive (locating) |
| --- | --- | --- |
| Coding (summarizing ideas) | GOOD. (Example: discovering author intent in a passage.) | BAD. (Example: coding an entire document to locate one idea.) |
| Word frequency (counting word occurrences) | OK. (Example: trying to understand author intent by pulling the top 10% of words.) | GOOD. (Example: locating and comparing a specific term in a text.) |
Matrix of measurement types and logical approaches in content analysis

Qualitative content analysis example

We looked at a small example above, but let’s play out all of the above information in a real world example. I will post the link to the text source at the bottom of the article, but don’t look at it yet. Let’s jump in with a discovery mentality, meaning let’s use an inductive approach and code our way through each paragraph.

[Figure: qualitative content analysis example (downloadable)]

*See note 1 at the end of the article for a link to the source text.

How to do qualitative content analysis

We could use word frequency analysis to find out which are the most common x% of words in the text (inductive word frequency), but this takes some time because we need to build a formula that excludes words that are common but that don’t have any value (a, the, but, and, etc.).

As a shortcut, you can use online tools such as Text Analyzer and WordCounter, which will give you breakdowns by phrase length (6 words, 5 words, 4 words, etc.), without excluding common terms. Here are a few insightful examples of 7-word strings from our text:

[Figure: 7-word strings, inductive word frequency, content analysis]

Perhaps more insightfully, here is a list of 5-word combinations, which are much more common:

[Figure: 5-word strings, inductive word frequency, content analysis]

The downside to these tools is that 2- and 1-word strings are not useful unless you exclude common words, which the tools don’t do. This is a limitation, but it’s unlikely that the work required to get there is worth the value it brings.
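If you do want to try it yourself, here is a small Python sketch of phrase counting by length, with an option to drop common words first. It is not how Text Analyzer or WordCounter work internally (I don't know their implementation). The `COMMON_WORDS` list, the function name, and the sample sentence are assumptions for illustration.

```python
import re
from collections import Counter

# Hypothetical list of common words to ignore (an assumption).
COMMON_WORDS = {"a", "an", "the", "but", "and", "or", "of", "to", "in", "is"}

def ngram_counts(text, n, exclude_common=False):
    """Count every run of n consecutive words in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    if exclude_common:
        # Caveat: removing words first can join words that were not adjacent originally.
        words = [w for w in words if w not in COMMON_WORDS]
    grams = zip(*(words[i:] for i in range(n)))
    return Counter(" ".join(gram) for gram in grams)

# Made-up sample text, for illustration only.
sample = "The cat sat on the mat and the cat sat on the rug."

print(ngram_counts(sample, 5).most_common(1))                       # most common 5-word string
print(ngram_counts(sample, 1).most_common(1))                       # dominated by "the"
print(ngram_counts(sample, 1, exclude_common=True).most_common(3))  # common words removed
```

Without the filter, the single most frequent word is "the," which is exactly why 1- and 2-word strings tell you little until common words are removed.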

OK. Now that we’ve seen how to go about coding our text into quantifiable data, let’s look at the deductive approach and try to figure out if the text contains a single word we’re looking for. (This is my favorite.)

Deductive word frequency

We know the text now because we’ve already looked through it. It’s about the process of becoming literate, namely, the elements that impact our ability to learn to read. But we only looked at the first four sections of the article, so there’s more to explore.

Let’s say we want to know how a household situation might impact a student’s ability to read. Instead of coding the entire article, we can simply look for this term and its synonyms. The process for deductive word frequency is the following:

  1. Identify your term
  2. Think of all the possible synonyms
  3. Use Word’s Find function to see how many times they appear
  4. If you suspect that this word often comes in connection with others, try searching for both of them

In my example, the process would be:

  1. Household
  2. Parents, parent, home, house, household situation, household influence, parental, parental situation, at home, home situation
  3. Go to “Edit>Find>Replace…” This will enable you to count the instances in which your word or combinations appear. We use the Replace window instead of the simple Find bar because it allows us to visualize the information.
  4. Accounted for in possible synonyms

The results: 0! None of these words appeared in the text, so we can conclude that this text has nothing to do with a child’s home life and its impact on his/her ability to learn to read. Here’s a picture:

[Figure: deductive word frequency, content analysis]
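The same search can be sketched in Python for anyone who prefers a script to the Replace window. This is an illustrative assumption, not the method used for the screenshot above: the function name is invented, the term list is a shortened version of my synonyms, and the sample sentence is made up (it is not the source text).

```python
import re

def count_terms(text, terms):
    """Count case-insensitive, whole-word matches for each term or phrase."""
    counts = {}
    for term in terms:
        pattern = r"\b" + re.escape(term) + r"\b"
        counts[term] = len(re.findall(pattern, text, flags=re.IGNORECASE))
    return counts

# Shortened synonym list from the example; the sample sentence is made up.
terms = ["household", "parents", "parent", "home", "house", "parental", "at home"]
sample = "Reading skill depends on vocabulary. Parents who read at home help. Home matters."

for term, count in count_terms(sample, terms).items():
    print(f"{term}: {count}")
```

Two things to keep in mind: whole-word matching means "parent" does not match "Parents," and overlapping terms such as "home" and "at home" both count the same passage, so don't simply add the totals together.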

Don’t Be Afraid of Content Analysis

Content analysis can be intimidating because it uses data analysis to quantify words. This article provides a starting point for your analysis, but to ensure you get 90% reliability in word coding, sign up to receive our eBook Beginner Content Analysis. I went from philosophy student to a data-heavy finance career, and I created it to cater to research and dissertation use cases.

Content analysis vs natural language processing

While similar, content analysis, even the deductive word frequency approach, and natural language processing (NLP) are not the same. The relationship is hierarchical. Natural language processing is a field of linguistics and data science that’s concerned with understanding the meaning behind language.

On the other hand, content analysis is a narrower practice that focuses on the methodologies we discussed above: discovery-style coding (loosely comparable to labeling or annotating text in NLP) and word frequency (similar to the “bag of words” technique).

For example, we would use natural language processing to quantify huge amounts of linguistic information, turn it into row-and-column data, and run tests on it. NLP is incredibly complex in the details, which is why it’s nearly impossible to provide a synopsis or example technique here (we’ll provide them in coursework on AnalystAnswers.com). However, content analysis only focuses on a few manual techniques.
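To give just a small taste of the row-and-column idea (not a full NLP technique), here is a minimal bag-of-words sketch in Python. The function name and the three tiny "documents" are made up for illustration, and real NLP work would involve much more cleaning and far larger data.

```python
from collections import Counter

def document_term_matrix(documents):
    """Return (vocabulary, rows): one row of word counts per document."""
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({word for words in tokenized for word in words})
    rows = []
    for words in tokenized:
        counts = Counter(words)
        rows.append([counts[word] for word in vocabulary])
    return vocabulary, rows

# Made-up documents, for illustration only.
docs = ["snow sled snow", "cold candy", "sled candy candy"]
vocab, rows = document_term_matrix(docs)

print(vocab)  # column headers: one per distinct word
for row in rows:
    print(row)  # one row per document
```

Each document becomes a row and each distinct word becomes a column, which is the kind of table statistical tests can then run on.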

Content analysis in marketing

Content analysis in marketing is the use of content analysis to improve marketing reach and conversions. It has grown in importance over the past ten years. As digital platforms become more central to how we understand and interact with others, we use them more.

We write out ideas and small texts. We post our thoughts on Facebook and Twitter, and we write blog posts like this one. But we also post videos on YouTube and express ourselves in podcasts.

All of these mediums contain valuable information about who we are and what we might want to buy. A good marketer aims to leverage this information in four ways:

  1. Collect the data
  2. Analyze the data
  3. Modify his/her marketing messaging to better serve the consumer
  4. Pretend, with bots or employees, to be a consumer and craft messages that influence potential buyers

The challenge for marketers doing this is getting the rights to access this data. Indeed, data privacy laws have come into force in the European Union (General Data Protection Regulation, or GDPR) as well as in Brazil (Lei Geral de Proteção de Dados, or LGPD, the General Data Protection Law).

Content analysis vs narrative analysis

Content analysis is concerned with themes and ideas, whereas narrative analysis is concerned with the stories people express about themselves or others. Narrative analysis uses the same tools as content analysis, namely coding and word frequency, but its focus is on narrative relationships rather than themes. This is easier to understand with an example. Let’s look at how we might code the following paragraph from the two perspectives:

I do not like green eggs and ham. I do not like them, Sam-I-Am. I do not like them here or there. I do not like them anywhere!

Content analysis: The ideas expressed include green eggs and ham. The narrator does not like them.

Narrative analysis: The narrator speaks in the first person. He has a relationship with Sam-I-Am. He orients himself with regard to time and space. He does not like green eggs and ham, and may be willing to act on that feeling.

Content analysis vs document analysis

Content analysis and document analysis are very similar, which explains why many people use them interchangeably. The core difference is that content analysis examines all mediums in which words appear, whereas document analysis only examines written documents.

For example, if I want to carry out content analysis on a master’s thesis in education, I would consult documents, videos, and audio files. I may transcribe the video and audio files into a document, but I wouldn’t exclude them from the beginning.

On the other hand, if I want to carry out document analysis on a master’s thesis, I would only use documents, excluding the other mediums from the start. The methodology is the same, but the scope is different. This dichotomy also explains why most academic researchers performing qualitative content analysis refer to the process as “document analysis.” They rarely look at other mediums.

Content Gap Analysis

Content gap analysis is a term common in the field of content marketing, but it applies to the analytical fields as well. In a sentence, content gap analysis is the process of examining a document or text and identifying the missing pieces, or “gap,” that it needs to be completed.

As you can imagine, a content marketer uses gap analysis to determine how to improve blog content. An analyst uses it for other reasons. For example, he/she may have a standard for documents that merit analysis. If a document does not meet the criteria, it must be rejected until it’s improved.

The key message here is that content gap analysis is not content analysis. It’s a way of measuring the distance an underperforming document is from an acceptable document. It is sometimes, but not always, used in a qualitative content analysis context.

  1. Link to Source Text

About the Author

Noah

Noah is the founder & Editor-in-Chief at AnalystAnswers. He is a transatlantic professional and entrepreneur with 5+ years of corporate finance and data analytics experience, as well as 3+ years in consumer financial products and business software. He started AnalystAnswers to provide aspiring professionals with accessible explanations of otherwise dense finance and data concepts. Noah believes everyone can benefit from an analytical mindset in a growing digital world. When he's not busy at work, Noah likes to explore new European cities, exercise, and spend time with friends and family.
