Introduction to Data Science with Python

Learn Python for data analysis

Join Harvard University Instructor Pavlos Protopapas in this online course to learn how to use Python to harness and analyze data.

8 weeks
3-4 hours per week

What You'll Learn

Every single minute, computers across the world collect millions of gigabytes of data. What can you do to make sense of this mountain of data? How do data scientists use this data for the applications that power our modern world?

Data science is an ever-evolving field, using algorithms and scientific methods to parse complex data sets. Data scientists use a range of programming languages, such as Python and R, to harness and analyze data. This course focuses on using Python in data science. By the end of the course, you’ll have a fundamental understanding of machine learning models and basic concepts around Machine Learning (ML) and Artificial Intelligence (AI). 

Using Python, learners will study regression models (linear, multiple, and polynomial) and classification models (kNN, logistic regression), using popular libraries such as scikit-learn (sklearn), pandas, Matplotlib, and NumPy. The course covers key machine learning concepts, including picking the right model complexity, preventing overfitting, regularization, assessing uncertainty, weighing trade-offs, and model evaluation. Participation in this course will build your confidence in using Python, preparing you for more advanced study in Machine Learning (ML) and Artificial Intelligence (AI) and for advancement in your career.

Learners need a baseline of programming knowledge (preferably in Python) and statistics to be successful in this course. The Python prerequisite can be met with CS50’s Introduction to Programming with Python, and the statistics prerequisite can be met with Fat Chance or Stat110, both offered through HarvardX.

The course will be delivered via edX and connect learners around the world. By the end of the course, participants will:

  • Gain hands-on experience and practice using Python to solve real data science challenges
  • Practice Python coding for modeling, statistics, and storytelling
  • Utilize popular libraries such as pandas, NumPy, Matplotlib, and scikit-learn
  • Run basic machine learning models using Python, evaluate how those models are performing, and apply those models to real-world problems
  • Build a foundation for the use of Python in machine learning and artificial intelligence, preparing you for future Python study
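
As a rough illustration of what "running a basic machine learning model and evaluating how it performs" can look like, here is a minimal sketch using scikit-learn. It is not taken from the course materials: the choice of the bundled iris dataset, the kNN classifier with 5 neighbors, and the 75/25 train/test split are all assumptions made for this example.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load a small, bundled example dataset (an assumption for this sketch;
# the course may use different data).
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the rows so the model is scored on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Fit a k-nearest-neighbors classifier on the training split only.
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)

# Evaluate on the held-out split.
test_accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Held-out accuracy: {test_accuracy:.2f}")
```

Scoring on a held-out split rather than on the training data gives a more honest picture of how the model might behave on new observations.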

Your Instructor

Pavlos Protopapas is the Scientific Program Director of the Institute for Applied Computational Science (IACS) at the Harvard John A. Paulson School of Engineering and Applied Sciences. He has had a long and distinguished career as a scientist and data science educator and currently teaches the CS109 course series for basic and advanced data science at Harvard University, as well as the capstone course (industry-sponsored data science projects) for the IACS master’s program at Harvard. Pavlos has a Ph.D. in theoretical physics from the University of Pennsylvania and has focused recently on the use of machine learning and AI in astronomy and computer science. He was Deputy Director of the National Expandable Clusters Program (NSCP) at the University of Pennsylvania and was instrumental in creating the Initiative in Innovative Computing (IIC) at Harvard. Pavlos has taught multiple courses on machine learning and computational science at Harvard, at summer schools, and in programs internationally.

Course Overview

  • Linear Regression
  • Multiple and Polynomial Regression
  • Model Selection and Cross-Validation
  • Bias, Variance, and Hyperparameters
  • Classification and Logistic Regression
  • Multi-logistic Regression and Missingness
  • Bootstrap, Confidence Intervals, and Hypothesis Testing
  • Capstone Project
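
To make the polynomial regression and cross-validation topics above a little more concrete, the following is a minimal sketch of choosing a polynomial degree by k-fold cross-validation, using NumPy only. It is an illustration under stated assumptions, not the course's own code: the synthetic quadratic data, the helper names `kfold_mse` and `select_degree`, and the candidate degrees 1 through 6 are all hypothetical.

```python
import numpy as np


def kfold_mse(x, y, degree, k=5, seed=0):
    """Mean validation MSE of a polynomial fit of the given degree over k folds."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(x))      # shuffle once, then split into folds
    folds = np.array_split(indices, k)
    errors = []
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        # Fit on the training folds only, then score on the held-out fold.
        coeffs = np.polyfit(x[train_idx], y[train_idx], degree)
        preds = np.polyval(coeffs, x[val_idx])
        errors.append(np.mean((y[val_idx] - preds) ** 2))
    return float(np.mean(errors))


def select_degree(x, y, degrees, k=5):
    """Return the degree with the lowest cross-validated MSE, plus all scores."""
    scores = {d: kfold_mse(x, y, d, k=k) for d in degrees}
    best = min(scores, key=scores.get)
    return best, scores


# Synthetic data (an assumption for this sketch): a quadratic trend plus noise.
rng = np.random.default_rng(42)
x = np.linspace(-3, 3, 60)
y = 1.0 + 2.0 * x - 0.5 * x**2 + rng.normal(scale=0.5, size=x.size)

best_degree, cv_scores = select_degree(x, y, degrees=range(1, 7))
for d, score in cv_scores.items():
    print(f"degree {d}: CV MSE = {score:.3f}")
print("selected degree:", best_degree)
```

The idea is that a model that is too simple underfits (high validation error), while a model that is too flexible starts fitting noise; comparing validation error across candidate degrees is one common way to pick a complexity in between.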

Ways to take this course

When you enroll in this course, you will have the option of pursuing a Verified Certificate or Auditing the Course.

A Verified Certificate costs $299 and provides unlimited access to full course materials, activities, tests, and forums. At the end of the course, learners who earn a passing grade can receive a certificate. 

Alternatively, learners can Audit the course for free and have access to select course material, activities, tests, and forums. Please note that this track does not offer a certificate for learners who earn a passing grade.

Related Courses

Data Science Professional Certificate

Real-world data science skills to jumpstart your career

The HarvardX Data Science program prepares you with the necessary knowledge base and useful skills to tackle real-world data analysis challenges.

Digital Humanities in Practice: From Research Questions to Results

Use data science to enhance your research

Combine literary research with data science to find answers in unexpected ways. Learn basic coding tools to draw insights from thousands of documents at once.

Data Science for Business

Move beyond the spreadsheet

Designed for managers, this course provides a hands-on approach for demystifying the data science ecosystem and making you a more conscientious consumer of information.