Data Science: Wrangling

Turn raw data into useful data

In this online course taught by Harvard Professor Rafael Irizarry, learn to process and convert raw data into formats needed for analysis.

Self-Paced
Length: 8 weeks, 1-2 hours a week
Certificate Price: $149
Program Dates
Start Data Science: Wrangling Today

What You'll Learn

As part of our Professional Certificate Program in Data Science, this course covers several standard steps of the data wrangling process, such as importing data into R, tidying data, string processing, HTML parsing, working with dates and times, and text mining. Rarely are all of these steps necessary in a single analysis, but a data scientist will likely face each of them at some point.

Data is rarely easy to access in a data science project. It is more likely to sit in a file or a database, or to need extracting from documents such as web pages, tweets, or PDFs. In these cases, the first step is to import the data into R and tidy it using the tidyverse packages. The process of converting data from its raw form to a tidy form is called data wrangling.

This process is a critical step for any data scientist. Knowing how to wrangle and clean data will enable you to uncover insights that would otherwise remain hidden.

The course will be delivered via edX and connect learners around the world. By the end of the course, participants will understand the following concepts:

  • Importing data into R from different file formats
  • Web scraping
  • How to tidy data using the tidyverse to better facilitate analysis
  • String processing with regular expressions (regex)
  • Wrangling data using dplyr
  • How to work with dates and times, and text mining

Your Instructors

Rafael Irizarry

Professor of Biostatistics at Harvard University

Ways to take this course

When you enroll in this course, you will have the option of pursuing a Verified Certificate or Auditing the Course.

A Verified Certificate costs $149 and provides unlimited access to full course materials, activities, tests, and forums. At the end of the course, learners who earn a passing grade can receive a certificate. 

Alternatively, learners can Audit the course for free and have access to select course material, activities, tests, and forums. Please note that this track does not offer a certificate for learners who earn a passing grade.

Introduction to Linear Models and Matrix Algebra

Perform matrix operations

Learn to use R programming to apply linear models to analyze data in life sciences.

Data Science: Inference and Modeling

Key concepts through a motivating case study

Learn inference and modeling: two of the most widely used statistical tools in data analysis.

Data Science: Capstone

To become an expert you need practice and experience.

Show what you’ve learned from the Professional Certificate Program in Data Science.