man performing data analysts

How to Become a Data Analyst in 7 Steps (With or Without a Degree)

Fifteen years ago, most companies didn’t hire data analysts, and only a small number of universities offered degrees programs for it. But in the past decade, the data analyst position has grown in popularity, with advances in programming languages and educational programs available each year.

On the other hand, many high caliber data analysts don’t have degrees at all, which makes data analysis more like a technology job than a traditional one. This begs the question: if not through a specific degree, how do you become a data analyst?

In a sentence, to become a data analyst you need to demonstrate your talents with a portfolio of work, with the occasional test or certification. While a degree will add credibility to your profile, data analysis is about getting your hands dirty in data to prove you can extract insights from data tables.

How to Become a Data Analyst in 7 Steps

In other words, if you want to become a data analyst, you’ll need to show on you can take raw data and create meaning from it. And how do analysts this? They learn a few basic skills and then practice, practice, practice.

You need to be able to show your ability to do so by building up a portfolio and/or getting the right certifications (there are lots of scam certifications on the market that you’ll want to avoid, so I’ll show you the good ones later).

Now, I’m going to provide you with the 7 steps you can use to learn the basic skills you’ll need AND build an introductory portfolio at the same time. You’ll perform 2 analyses throughout the 7 steps, which you’ll use to apply to 2 jobs positions throughout the process.

But before we move into the 7 steps, I want to specify that you should not feel overwhelmed! The great thing about data analysis is that you need to learn some basic skills to get started, but most of what it takes to become a good data analyst comes with experience.

In other words, there’s a steep learning curve, so drill down hard here to get started, then you’ll land a job where you can start making big improvements. With that said, here are the steps to becoming a data analyst.

  1. Get familiar with data analysis jargon
  2. Start with Excel, with an eye towards Tableau
  3. Start working with pivot tables (perform portfolio analysis #1 – regression, simple and multi-variable)
  4. Participate in data visualization competitions
  5. Run analysis on non-profit data sets (perform portfolio analysis #2 – time series)
  6. Learn basic SQL
  7. Learn basic R (or Python)

Step 1: Get familiar with data analysis

Explanation. Spend some time reading through articles online about data projects that others have done. It will help you get a feel for the goals and challenges associated with different in data analysis. Learn about normalization, cluster analysis, and tools like Tableau and Excel. We’ve got an article for beginner’s called What does a data analyst do? that you can check out to get started, and I encourage you to read from different resources as you learn new terminology.

Every analyst has a different approach to problem solving. That’s one of the beautiful aspects of data analysis — it’s not about using one-size-fits-all solutions to recurring challenges. There’s an enormous amount of creativity involved in the analytical process. This dynamic is also what makes practice so important — you need to see what methodologies work best for you.

Example

To get started you can check out our Data Analyst Resources page. As of today (August 2020), I’m actively working to build some courses to help you understand data analysis terminology. Either way, read through the articles on that page. Even if you don’t understand everything from the start, it will help you get familiar with terminology you’ll need down the road. For example, you should learn these key 10 data buzzwords:

  • Observation – the object being observed in data collection. For example, in collecting data about student hight and weight, each student is an observation, while height and weight are characteristics of the observation
  • Normalization – structuring data tables such that the left column holds an observation containing characteristics of all following columns
  • Clustering – regrouping observations based on common characteristics
  • Regression – minimizing the distance between observations and a best fit curve on a graph
  • SQL, Tableau, and R – data query, visualization, and statistical languages, respectively
  • NULL values – empty characteristic cells of an observation
  • Data dictionary – data that explains data, also called metadata
  • Dataviz – short for data visualization, whose purpose is to display insights in a user-friendly way
  • Data requirements – requirements gathered before data collection that analysts use to collect the right data
  • Data types – the kinds of data that programs like Excel and Tableau understand, including date, number, text, integer, and many more

Step 2: Start with Excel, with an eye on Tableau

Explanation. Before the invention of digital databases, data analysis wasn’t possible. Pen and paper simply aren’t quick or flexible enough to work as databases. Although there were technical programs before spreadsheets, the invention of Microsoft Excel made data analysis possible at a large scale. It’s been dominating the market for over 20 years, and for good reason.

Microsoft Excel (and other spreadsheet technologies such as Google Sheets, Apple Number, and LibreOffice) allows you to add, delete, push, pull, replace, calculate, and manipulate data in two directions with a host of different formulas and formats. It’s the ultimate playground for data analysis.

The reason I suggest Microsoft Excel over Google Sheets and Apple Numbers is twofold: first, as you improve, you will want to start learning shortcuts on the keyboard to speed up your analysis. These shortcuts are either inefficient or unavailable on the other two programs.

Second, most companies have Excel engrained into their technical environments. You don’t want to be the person who shows up more comfortable in Google Sheets or Apple Numbers, who always has formatting programs when working in teams.

All of these programs struggle to maintain good formatting across generations of their own product, much less to a competitor product. Perhaps if you just wanted a simple datatable of 10 rows by 10 columns, it could work. But once you deal with real databases, you’re going to run into problems. Try to stick with Microsoft Excel.

It can be a bit pricy, so if you want to go with a free alternative such as LibreOffice or Google Sheets to start with, don’t feel like you can’t. In fact, LibreOffice is a really good alternative. But in the long run, you should be prepared to make the switch to Microsoft Excel in a professional setting.

Excel’s shortcomings

With all of that said about how good Excel is, you might think it has no faults! This is certainly not the case. While Excel is the best of the spreadsheet-style software, it’s not strong enough to effectively manage huge amounts of rows and columns (it usually stops working after several 10s of thousands of each). Another limitation is the data visualization possibilities. Excel has fewer chart options than Tableau.

There are higher-performing softwares such as SQL and Tableau, which, when combined, will take your data analysis skills to the next level. Do NOT try to learn these two straight away. You’ll need to get comfortable with Excel, then start using Tableau, then finally begin learning SQL. Otherwise, you may get frustrated.

The most approachable of the two is Tableau. Tableau is a data visualization tool based on the same logic as Excel, just more powerful. We’ll look at some examples below.

Example

Imagine you work for a space exploration company, and your boss has handed you a data table with 5 planets (your observations), which each of 3 characteristics — temperature range, water capacity, and quantity of oxygen. Management wants you to prepare a small presentation of how apt these planets are for human life.

First we need to ask ourselves the question: what do we need to understand in order to fulfill the task? In this case it’s easy. We need to know which countries have temperatures that humans can live in, that have enough water for us to drink, and that have enough oxygen for us to breath. Let me show you how convenient spreadsheets really are for this kind of analysis. Here’s our sample dataset:

I simply used LibreOffice calc spreadsheet to encode the information in a table format so we have a clear view on the information.For this type of exercise, any spreadsheet program is sufficient. Let’s say our boss also tells us that humans can’t live in temperatures lower than 0°F or above 125°F, they need 3000L/mˆ2 of water to maintain critical population mass, and most importantly need 1000g/cmˆ3 of oxygen to breath.

A quick look at our database tells us Turnsat is the best planet for us.We could also use some data visualization tools to help us better conceptualize the data. Let’s take a look at a bar chart:

Bar Chart to Visualize Planet Data

This graph is simple and useful, but it’s functions are limited. For example, if we wanted to build dynamic charts with multiple database sources and connections between characteristics across those databases, Excel would be useless. As a data analyst, you very often receive information from disparate sources and must combine them to extract insights.

To do so, you’ll need to learn about Tableau. To give you an idea of what Tableau can do, look at these visualizations. I built both of them in less than 10 minutes:

Tableau Dashboard on Sugar Consumption
Tableau Sheet showing French Speaking Countries

You can download a free version of Tableau here to start playing with it. I recommend you continue with a spreadsheet software until you’re comfortable enough with manipulating rows and columns using pivot tables (next section) before moving on to Tableau, but don’t wait too long. I will be publishing some more information on Tableau (and how you can use it to convince your colleagues and bosses how smart you are) under the Data Analyst Resources page.

This planets example is just a fun exercise to show how easy spreadsheet programs are to use in managing data. The whole database is laid out in front of you. Spreadsheets are useful for data tables of a reasonable size, but once they grow, you don’t want to have to sift through all of the data to find what you’re looking for — you want to simply query the important information. That’s where SQL comes in handy, and we’ll look at it later on.

Step 3: Start working with pivot tables

Sometimes you want to see a table regrouped so the left column is a characteristic, and not the initial observation. For example, perhaps you want to see planets based on their maximum temperatures instead of on their names. This becomes particularly important when you’ve got a huge database. Well, there’s an easy way: pivot tables. I’m going to use a classic dataset called “cars.”

If we look at the dataset, we notice that it’s normalized based on car make and model. Imagine, however, you’re interested in seeing how many US and Japan cars get 15 MPG. You could simply filter out the lines that don’t fit your criteria, then count up the results, but there’s a better way — one that provides more flexibility and more information — is pivot tables. Here’s a video to show why:

Portfolio Analysis #1: Regressions – Single and Multivariable

It’s time to build the first item for your portfolio. This is important because it will help get your practicing AND prepare you to start applying for jobs right away. Here’s what we’re going to do:

  1. Perform a simple and multivariable regression on our dataset cars
  2. Upload it to a blog or social media profile

Regression Analysis – Single and Multivariable

A regression is another way of saying “a line or curve that best represents the trend in our data.” But don’t be scared. This is not like the math you did in high school where each step was on paper. Whatever software you’re using — whether its Excel or LibreOffice — will be able to do this for you.

Regression lines tell us a few things. First, they show what direction the trend goes. Second, they give an idea of best fit. Third, they can be used to project future values, which is a HUGE topic in data analysis. A single variable regression consists of only one variable + the y-intercept. This makes it a line of best fit.

On the other hand, a multivariable regression can have as many variables as needed. This makes it a polynomial, and means it can be a curve or squiggly line — whatever fits the data.

Single variable regression on cars

The most important thing to remember about regressions is that they serve as a way to compare two or more variables. When we plot them on an x and y axis, they are the x and y. I mention this because some people confuse them with time series graphs, in which the x variable is always time.

But let’s not get into the jargon too much. Here’s what we want to know: what is the relationship between horsepower and miles per gallon (MPG)?

To perform a simple regression of these two variables, let’s exercise our Excel skills and extract the two columns we want to plot against each other. To do so, click on the first cell in the cars row, hold down shift + control, then press the down arrow key. This will highlight the column. The you’ll want to hold down control while you click the first cell in the MPG cell, and repeat the process to highlight MPG. Do the same for Horsepower.

In the end, you should have all three columns highlighted. You’ll then want to copy/paste them in a column to the right of the main data table and delete any columns that show up in between. This will be your comparison table. Here’s a short clip to show you how:

Once you have the columns in place, you need to plot them using a scatter plot. To do so, first highlight your MPG and horsepower columns, NOT the cars column. Then navigate to Insert>Charts>Scatter Plots>Scatter. It’s a button with dots on an x and y chart. The chart title defaults to horsepower in my version of Excel, but yours may be different. Once you’re done with that, it should look like this:

Now that you’ve got your scatter plot, it’s time to add the regression. This is the beautiful part of the process. All you need to do is navigate to the tab “Chart design”>Add chart element>Trendline>Linear. This will display a line on your graph.

To show the formula for your line, right click on the line>Format trendline…>Trendline options (the bar chart)>Display Equation on chart. You can also add the R-squared value, which I will explain below.

What does this all mean?

When you have completed a regression, you should identify 3 bits of information:

  • Is the correlation negative or positive? If your line or curve moves from the top left to the bottom right, then the relationship is inverse. If it moves from the bottom left to the top right, the relationship is direct. For us, this means that there is an inverse relationship between MPG and horsepower. That is, the more powerful the car, the less gas efficient it is.
  • Is the Rˆ2 more than 50%? Rˆ2 or R-squared is a measure of how well the line fits your data points. You can think of it like this: for each data point, Excel tried to put the line as close as possible. The distance between the line and each point is thus an ‘error’. Since some points can be negative, we square all of these distances, then all them together. The actual formula is slightly more complicated, but this explanation is a good starting point. Just remember, Rˆ2 ranges from 0% to 100%, where 100% means your line fully explains the data points. The higher the better.
  • What is the formula? Once you have a formula, you can use it to make predictions. All you need is the data for the x variable. If you plug it into your formula, then you’ll have a pretty good idea of what to expect. This is basic forecasting.

Multivariable regression

We’ve looked at single variable regression, which is a line. Multivariable regressions simply have a reinforcing effect on the line, making it a curve. The best part is you already know how to do a multivariable regression. All of the steps are the same as above, except one. When you go to add your trendline, go to more trendline options, then click polynomial: “Chart design”>Add chart element>Trendline>More Trendline Options…>Polynomial. You can play with it from there.

Start building your portfolio

I suggest you perform the same analyses on a separate dataset so you learn how to do it from scratch. Then you can take that file and upload it to your LinkedIn profile or other online presences. Your portfolio does not need to be a special website designed specifically for this reason. You just need to have a place online where you upload documents in order to show future employers that you know what you’re doing (like LInkedIn).

What’s cool about Tableau Public is that they host all of your projects on their website. We’ll talk more about this later, but it’s just one more reason to prefer Tableau over Excel for data visualization!

If you do want to build a blog, I suggest using WordPress.com. You will always have the wordpress.com surdomain in the free versions, but that’s okay. WordPress is one of the biggest players in this field, so you can simply follow their instructions and build your blog portfolio.

Step 4: Participate in Data Visualization Competitions

As you dig deeper into data analysis, you begin to see that there is a big community of people. One of the really cool ways you can practice on a regular basis and improve your data visualization skills is participating in activities and competitions.

One of my favorites is called MakeoverMonday. It’s a weekly data project in which you pull a very simple data set from a source, perform some basic analysis, submit it either online and on Twitter, and get feedback from the community. What’s more, you can add all of your visualizations to your portfolio!

I highly recommend learning by trail and error, and this is a great way to get started. You can visit the page at MakeoverMonday.co.uk. You will also see that many people use Tableau to perform their visualizations. Don’t feel like you have to jump straight in with Tableau — you can submit Excel visualizations as well. At the same time, don’t feel like you can’t use Tableau at all!

Step 5: Run analysis on non-profit data sets

Another way to start practicing for free and build up your portfolio is running analyses on non-profit data sets. These are typically huge datasets with large amounts of information, which will help you get a feel for what the top of the top are working on. Examples of the organizations operating at this caliber include:

In fact, when I first started exploring data analysis, a supportive colleague pushed me to run an analysis on one of UNICEF’s huge databases. The key was that I didn’t have a going-in strategy. The approach was always about intuition, which meant I simply needed to take what I knew and try to create meaning from it.

It’s a really good way to start practicing. You will definitely get stuck, but that’s okay. The goal here is to learn and get comfortable. That’s why you work with databases outside of your league. When you revisit simpler situations, they will be easier.

Portfolio Analysis #2: Time Series Forecasting

It’s time to work on your second analysis: time series forecasting. As with regressions, you will use past and current data to determine future values. But this time, you will use 1 variable’s change over time, rather than a comparison between two variables.

There are several different types of time series methodologies, but we’re going to stick to the basic moving average. A moving average is an average of previous period values. In other words, it uses comparable past values to predict the future. For example, if a company wants to know what its revenue will be in April, but it only have January through March, it would simply average those months to get the prediction. Pretty simple, right?

It is. And not too simple for big banks either. Most large corporations and banks use simple moving averages like this in their official documentation to shareholders and managers. It is widely accepted as a functional predictive methodology.

However, it’s not always as simple as looking at the most recent data. When the data exhibits seasonality, we have to readjust the months averaged. For example, imagine that the summer month of August experiences a spike each year. If we averaged May, June, and July, our prediction would be incorrect — far from reality. This is called seasonality.

In situations where there is seasonality, you have two choices. Either you smooth out the seasonality, or you keep it. I’m a big proponent of keeping seasonality. If it is part of reality, then it should be part of the model. However, many managers will request smoother graphs. It helps shows the big picture. Let’s look at an example.

Normally, we would need to average only months that correspond to the seasonality of our raw data. For example, we would only average the month of August over the past three years, ignoring more recent months. The trouble with this methodology is that your prediction loosing some credibility because older months less accurately represent reality than more recent ones.

On the other hand, ignoring seasonality would be wholly incorrect. Let’s look at the middle ground in our example: averaging the whole seasonal cycle.

Example

Let’s open up the sunspot time series data you downloaded above. In order to account for seasonality without ignoring more recent months, we’re going to use a moving average of a full seasonal cycle. As you will see in the video below, the cycle for sunspots lasts approximately 10 years. We will verify this using a graph, then complete a cycle to the end of the known years.

Once we reach the end, we will start submitting our moving average data into the model. Once we pass a full cycle into moving average data, we will have built a machine learning algorithm! Check out this video to see it in action:

I suggest you download a different time series data set and perform a similar moving average analysis. DON’T FORGET TO ADD THIS TO THE PORTFOLIO TYPE YOU CHOSE, or to your social media!

Step #6: Learn Basic SQL

If you want to be a data analyst, chances are you really enjoy the creative, analytical aspect of data. One of the coolest things about data analysis is being able to approach a huge source of jumbled information and turn it into insights.

Up to this point, we’ve been talking almost exclusively about using Excel. However, if you really want to get serious about analyzing enormous data basis, Excel just doesn’t have to horsepower to push you forward. You need to learn an intuitive, English-based programming language called SQL. I think you will really like it because it will allow you to pull important information out of the clutter.

SQL is an accessible language. Here are some of the most common commands according to W3School. You see how intuitive they are?

  • SELECT – extracts data from a database
  • UPDATE – updates data in a database
  • DELETE – deletes data from a database
  • INSERT INTO – inserts new data into a database
  • CREATE DATABASE – creates a new database
  • ALTER DATABASE – modifies a database
  • CREATE TABLE – creates a new table
  • ALTER TABLE – modifies a table
  • DROP TABLE – deletes a table
  • CREATE INDEX – creates an index (search key)
  • DROP INDEX – deletes an index

I won’t get too much into the details of SQL syntax here. You can find other resources on the Data Analyst Resources page to learn more about this once you reach this step. However, I do want to show you how to get started with SQL. You’ll need to do the following.

Download and install SQL

  1. Download MySQL Community Server from the Oracle Community Page
    • Make sure you download the right server for your operating system (on Mac, click on the Apple in the top left of your screen, then “About this mac.”)
    • On the following page, you see an option to “Create and account” or “Sign in.” YOU DON’T HAVE TO. If you look just below, you’ll see a button saying “No thanks, just start my download.
    • Follow the setup wizard and you’ll have the Server
  2. Download the MySQL Workbench from the Oracle Community Page
    1. Follow the same steps as above.

Once you have downloaded these two items, you’re home free. But don’t worry about getting started with it yet. It’s enough to have completed the download.

Step #7: Learn the Programming Language R (or Python)

By now you should understand that the most important technologies to a data analyst are Excel, Tableau, and SQL. As discussed above, Excel is a great all-purpose tool, Tableau is for advanced data visualization, and SQL is for queries and managing huge databases.

But we’ve left one out: the statistical programming language R. R is a fantastic tool capable of performing huge calculations in a matter of seconds. Computations that would take Excel hours can be done in less than a second with R. As with SQL, I don’t want to get into the syntax on how to use R, but you should definitely create an RStudio.Cloud account online.

DON’T WORRY — it’s much easier to begin with R than SQL. The reason is that R is a higher-level programming language, which means you need fewer words to get bigger results. This also means it’s less flexible, but when it comes to statistics, flexibility is not a primary concern.

Go ahead and head to the RStudio.Cloud website and create an account. Once you’ve done that, try uploading our Excel sheet by following these steps:

  1. In the left sidebar, click on Your Workspace
  2. In the central pane, click on New Project
  3. You should see three workspaces: one large space on the left and to smaller ones on the right.
  4. In the top right space, look for the button “Import Dataset”
  5. Click it, then click “From Excel”
  6. You will be prompted to upload two packages. Approve them, then wait a moment
  7. You will be redirected to an excel upload window
  8. Click “Browse…” and upload the Excel document from there

You have now uploaded your first database to R and are ready to start performing statistical analysis! Check out more resources on the website to help you learn the skills you need to become a data analyst.

Note: the reason I write “or Python” is because these two languages can perform similar functions. However, Python is what we call a lower-level language, which means it requires more work to get fewer results. This makes it more flexible than R, but much harder to learn. When you’re starting, it’s best to go with R. You will impress your employer and bring value more quickly to the job!

Apply to 2 jobs

It can be very difficult to summon the courage to apply to a job, especially if you have no experience and no degree. But don’t forget! You have performed two types of analysis in this article. With a little practice, you will be ready.

Indeed, the very fact that you made it this far in this article shows that you have a real desire to learn. That’s all it takes to land an initial position. From there, you’ll apply your determination and learn how to become a great data analyst.

Seriously, set a goal to apply to 2 data analyst positions this week. Even if it’s a simple LinkedIn application, you should get over the hump of self doubt. Plus, it will give you a view into job descriptions and what employers are looking for.

Data analyst qualifications

Data analysts come from all kinds of backgrounds and have many different qualifications. With that said, they all have some kind of testament to their analytic ability. Many come from science, technology, engineering, and math backgrounds (or STEM), while many others from computer science or finance. Then again, many study humanities (the author is guilty as charged…).

A better way than all of these is simply to build up your portfolio. You already have two kinds of analysis in your pocket, and you can learn more about how data analysts collect, inspect, clean, and visualize business data. The best qualification is proof of ability. A portfolio will beat out a degree every time.

Data analyst certifications

Data analyst certifications include courses, degrees, and online classes. You can certainly go through the many hours to secure one of the programs and pay upwards of $1000 to do so, and there is definitely value in some of these programs. If getting a big name on your resume or CV is critical in your career, then that’s what you should do.

At AnalystAnswers.com, we believe that anyone who so wishes should understand data. It shouldn’t be something reserved for someone with the right degree and a career path. Everyone should be able to learn these skills and earn a certificate to prove it.

As of today (August 2020), I’m working to provide you with short, consumable courses that provide fundamental skills you can add to your portfolio. In addition to the portfolio, you will earn a certificate of completion for that specific skill. No more overpaying for general coursework — it’s time to get down and dirty with data.

How to become a data analyst without a degree

I think we’ve shown throughout this article that you do not need a special degree to become a data analyst. In fact, most data analysts that I know today never studied data analysis. They all came from a range of fields, including philosophy, statistics, history, biology, and more.

You can become a data analyst without a degree by building up your portfolio as we have discussed above. While having the bare minimum of a bachelor’s degree will probably help you get your foot in the door, a good portfolio will make all the difference.

How to become a data analyst with no experience

Having no experience is not the same as having no degree, but the same principles apply. The only nuance here is that if you have zero connection to any analytical experience — whether its research, a degree, or business — then you may need to sell your profile and portfolio a little more than others.

The best way to do so? Put your portfolio up front when applying to jobs. Before they can even read your degree, show them what you can do!

About the Author

Noah

Noah is the founder & Editor-in-Chief at AnalystAnswers. He is a transatlantic professional and entrepreneur with 5+ years of corporate finance and data analytics experience, as well as 3+ years in consumer financial products and business software. He started AnalystAnswers to provide aspiring professionals with accessible explanations of otherwise dense finance and data concepts. Noah believes everyone can benefit from an analytical mindset in growing digital world. When he's not busy at work, Noah likes to explore new European cities, exercise, and spend time with friends and family.

LinkedIn

Scroll to Top