Principles, Statistical and Computational Tools for Reproducible Data Science

A survey of best data science practices.

Join Harvard faculty in this online course to learn skills and tools that support data science and reproducible research.

Self-Paced
Length: 8 weeks, 3-8 hours per week
Certificate Price: $149
Program Dates: Start Principles, Statistical and Computational Tools today.

What You'll Learn

Today the principles and techniques of reproducible research are more important than ever, across diverse disciplines from astrophysics to political science. No one wants to do research that can’t be reproduced, so this course is for anyone doing data-intensive research. While many of us come from a biomedical background, this course is designed for a broad audience of data scientists.

To meet the needs of the scientific community, this course will examine the fundamentals of methods and tools for reproducible research. Led by experienced faculty from the Harvard T.H. Chan School of Public Health, you will work through six modules, including several case studies that illustrate the significant impact of reproducible research methods on scientific discovery.

This course will appeal to students and professionals in biostatistics, computational biology, bioinformatics, and data science. The course content will blend video lectures, case studies, peer-to-peer engagement, and the use of computational tools and platforms (such as R/RStudio and Git/GitHub), culminating in a final presentation of a reproducible research project.

We’ll cover Fundamentals of Reproducible Science; Case Studies; Data Provenance; Statistical Methods for Reproducible Science; Computational Tools for Reproducible Science; and Reproducible Reporting Science. These concepts are intended to translate to fields throughout the data sciences: physical and life sciences, applied mathematics and statistics, and computing.

Consider this course a survey of best practices: we’d like to make you aware of pitfalls in reproducible data science, some past failures and successes, and tools and design patterns that might help make it all easier. Ultimately, though, it’ll be up to you to take the skills you learn from this course and create your own environment in which you can easily carry out reproducible research, and to encourage and integrate with similar environments for your collaborators and colleagues. We look forward to seeing you in this course and the research you do in the future!

The course will be delivered via edX and connect learners around the world. By the end of the course, participants will understand:

  • A series of concepts, thought patterns, analysis paradigms, and computational and statistical tools that together support data science and reproducible research
  • Fundamentals of reproducible science, using case studies that illustrate various practices
  • Key elements for ensuring data provenance and reproducible experimental design
  • Statistical methods for reproducible data analysis
  • Computational tools for reproducible data analysis and version control (Git/GitHub, Emacs/RStudio/Spyder), reproducible data (data repositories/Dataverse), reproducible dynamic report generation (R Markdown/R Notebook/Jupyter/Pandoc), and workflows
  • How to develop new methods and tools for reproducible research and reporting
  • How to write your own reproducible paper

Your Instructors

Curtis Huttenhower

Associate Professor of Computational Biology and Bioinformatics at Harvard University

John Quackenbush

Professor of Computational Biology and Bioinformatics at Harvard University

Lorenzo Trippa

Associate Professor of Biostatistics at Harvard University

Ways to take this course

When you enroll in this course, you will have the option of pursuing a Verified Certificate or auditing the course.

A Verified Certificate costs $149 and provides unlimited access to full course materials, activities, tests, and forums. At the end of the course, learners who earn a passing grade can receive a certificate. 

Alternatively, learners can audit the course for free and have access to select course material, activities, tests, and forums. Please note that the audit track does not offer a certificate, even for learners who earn a passing grade.

Related Courses

Introduction to Data Science with Python

Learn Python for data analysis

Join Harvard University Professor Pavlos Protopapas in this online course to learn how to use Python to harness and analyze data.

Data Science Professional Certificate

Real-world data science skills to jumpstart your career

The HarvardX Data Science program prepares you with the necessary knowledge base and useful skills to tackle real-world data analysis challenges.

Introduction to Data Wise: A Collaborative Process to Improve Learning & Teaching

Build a collaborative culture

Join Harvard faculty in this online course to learn what is involved in using data wisely to build a culture of collaborative inquiry.