R for reproducible scientific analysis

Introduction to R for non-programmers using gapminder data.

The goal of this lesson is to teach novice programmers to write modular code and to follow best practices when using R for data analysis. R is commonly used in many scientific disciplines for statistical analysis and for its array of third-party packages. We find that many scientists who come to Software Carpentry workshops use R and want to learn more. These materials aim to give attendees a strong foundation in the fundamentals of R and to teach best practices for scientific computing: breaking down analyses into modular units, task automation, and encapsulation.

Note that this workshop will focus on teaching the fundamentals of the programming language R, and will not teach statistical analysis.

A variety of third-party packages are used throughout this workshop. These are not necessarily the best, nor are they comprehensive, but they are packages we find useful, and they have been chosen primarily for their usability.


Before starting, you should understand that computers store data and instructions (programs, scripts, etc.) in files, and that files are organised in directories (folders). You should also know how to access files that are not in the working directory by specifying their path.
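
As a rough illustration of working with paths in R, the sketch below builds a small, made-up project folder inside R's temporary directory and reads a file from it in two ways: by giving its full path, and by giving a path relative to the working directory. The folder name `example-project`, the file `example.csv`, and its contents are assumptions invented for this example and are not part of the lesson data.

```r
# Hypothetical layout (made up for this sketch):
#   <tempdir>/example-project/data/example.csv
project_dir <- file.path(tempdir(), "example-project")
data_dir <- file.path(project_dir, "data")
dir.create(data_dir, recursive = TRUE, showWarnings = FALSE)

# Write a tiny invented table so there is a file to read back
example <- data.frame(id = 1:3, value = c(10, 20, 30))
csv_path <- file.path(data_dir, "example.csv")
write.csv(example, csv_path, row.names = FALSE)

# The working directory is where R starts looking for relative paths
getwd()

# The file is not in the working directory, so give its full path
loaded <- read.csv(csv_path)

# A relative path also works once the working directory is the project folder
old_wd <- setwd(project_dir)  # setwd() returns the previous working directory
loaded_relative <- read.csv(file.path("data", "example.csv"))
setwd(old_wd)                 # restore the original working directory
```

`file.path()` joins path components with a separator that works on every platform, which tends to be safer than pasting strings together by hand.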

Getting ready

This lesson assumes you have R and RStudio installed on your computer.

R can be downloaded here.

RStudio is a development environment for R. It can be downloaded here. You will need the Desktop version for your computer.


  1. Introduction to R and RStudio
  2. Project management with RStudio
  3. Seeking help
  4. Data structures
  5. Exploring Data Frames
  6. Subsetting data
  7. Control flow
  8. Creating publication quality graphics
  9. Vectorisation
  10. Functions explained
  11. Writing data
  12. Split-apply-combine
  13. Dataframe manipulation with dplyr
  14. Dataframe manipulation with tidyr
  15. Producing reports with knitr
  16. Performance optimization and parallelization
  17. Best practices for writing good code

Other Resources