Many of you coming into our master’s programmes will have been taught ‘point-and-click’ statistical packages (the most obvious example being IBM SPSS Statistics). As you will have seen from the overview of our modules, most of them use a statistical software package called R, which we use through a popular application called RStudio. You need to install both on your computer, but once that is done you will work in RStudio exclusively. R will be a very different experience to IBM SPSS Statistics because it is not point-and-click software but a scripting language: you have to type commands to get things to happen. If you have ever used IBM SPSS Statistics syntax, then it’s a similar idea. However, a lot of students feel scared by this idea because it is quite alien to what they are used to. Fear not though: you are in capable hands. Our core module An Adventure in Statistics is taught by the author of one of the most widely used textbooks about R (that’s me … 😊). However, it’s worth explaining why we use R and RStudio on our modules.
R is the most widely used data analysis software: In various reports, such as the ones by Rexer Analytics, R is the dominant data analytics software. In the 2015 survey, for example, R was the most used software with 86% of respondents using it. IBM SPSS Statistics came second with 29%. It’s particularly interesting (in the context of your employability after university) that R is the software of choice for data scientists. R is increasingly taught in academia too, so although IBM SPSS Statistics is still dominant in academia, the switch to R is happening at a rate of knots. With our excellence in methods teaching we want our students to be ahead of the curve, not behind it.
Reproducible science: You may have heard of the reproducibility crisis in Psychology, where apparently well-established effects have failed to replicate. One consequence of this crisis is a drive towards more reproducible science. Part of that process involves analysis that is transparent, documented and reproducible. Although this can be done with syntax in IBM SPSS Statistics, it can’t be done with point-and-click interfaces. R fits seamlessly into a reproducible workflow. For example, scientists increasingly post their data and analysis code on repositories such as the Open Science Framework and Code Ocean, and this will become the norm. If you want a career in science, you are going to have to adopt these practices, and R fits with this new ethos of reproducibility.
Cutting edge: Bayesian methods are gathering popularity in the social sciences, and, well, we should be routinely applying robust methods to almost all of our data (because psychology data are messy). Because R is open source and contributed to by statisticians working at the cutting edge of statistical methods, it keeps pace with these trends. IBM SPSS Statistics has extremely limited Bayesian tools, and implements virtually no robust methods.
One-stop shop: Learning R (in most cases) negates the need to use other specialist software (AMOS, Mplus, Comprehensive Meta-Analysis, etc.). I’ve never found anything I can’t do in R, but there’s lots I can’t do in other packages.
RStudio is cool: if you really start getting into RStudio it can do some fantastic stuff. For example, you can integrate code to analyse your data into a document that formats the results and generates a report for you. You can create presentation slides, web applications and websites. RStudio uses a form of Markdown, which is a widely used lightweight markup language. Therefore, we are teaching our students a skill that extends beyond statistics and psychology. Using RMarkdown you can write an APA style document within RStudio that analyses your data on the fly and inserts the results into APA style tables. This document was written within RStudio and this entire website was written from within RStudio.
Reducing the gender gap in coding: according to a 2015 report by the OECD, fewer than 5% of girls and nearly 20% of boys contemplate a career in engineering and computing. This gender gap is not driven by a lack of ability but by girls’ lack of confidence in their mathematical and technical skills. Some data suggest that the gender gap is widening, with a decline in women majoring in computer science over the past 30 years. Such is the gender gap in coding skills that several initiatives have been developed to encourage girls to code, such as Girls Who Code. A recent report by Accenture and Girls Who Code suggests that women are more likely to pursue computing and STEM careers if they are exposed to coding in contexts that interest them and to female role models who can code. As such, there is a clear argument that on a programme with a predominantly female intake we have an opportunity to help reduce the gender skills gap by embedding coding skills within a curriculum that interests girls (i.e., psychology).
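If the idea of typing commands (and of a transparent, reproducible analysis) still feels a bit abstract, here is a minimal sketch of what a short R script might look like. It is only an illustration and is not taken from any of our modules: the data frame `exam`, its variables `group` and `score`, and all of the numbers are made up (simulated), and I have assumed a simple comparison of two groups.

```r
# Hypothetical example: every value here is simulated purely for illustration.
set.seed(123)  # fix the random seed so the simulated data are the same on each run

# Build a small data frame: 20 made-up scores in each of two groups
exam <- data.frame(
  group = rep(c("control", "treatment"), each = 20),
  score = c(rnorm(20, mean = 50, sd = 10), rnorm(20, mean = 55, sd = 10))
)

# Mean score in each group
group_means <- aggregate(score ~ group, data = exam, FUN = mean)
print(group_means)

# Welch's independent t-test comparing the two groups
result <- t.test(score ~ group, data = exam)
print(result)
```

Because every step is written down, anyone who runs this script should be able to see exactly what was done to the data and in what order, which is the whole point of a reproducible workflow.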
I’ll post some of my own resources, including a link to the package of interactive tutorials that I use to teach An Adventure in Statistics, but you can also look at these: