In this tutorial, I will show you how to perform a Pearson correlation test in R.
What is a Pearson correlation test?
A Pearson correlation test is a parametric statistical test that assesses the strength and direction of the linear relationship between two continuous variables.
For this tutorial, I will use the trees dataset that is already available within R.
The trees dataset contains measurements from 31 cherry trees. Specifically, the data frame contains three variables:
- Girth – the tree diameter (in inches)
- Height – the tree height (in feet)
- Volume – the volume of timber (in cubic feet)
In this example, I am interested in the correlation between tree girth and height.
Based on the above example data, here are my two hypotheses:
- Null hypothesis – The true correlation between tree girth and height is zero
- Alternative hypothesis – The true correlation between tree girth and height is not zero
I will also set my alpha level to 0.05.
How to perform a Pearson correlation test in R
It is easy to perform a Pearson correlation test in R. No additional packages are required; the cor.test function is part of the stats package, which ships with R and is loaded by default.
Step 1: Import your data into R
To perform a Pearson correlation in R, you first need data containing the two variables of interest.
In this example, I will be using the trees dataset in R.
To load the trees dataset, simply run the following code.
# Load the trees dataset
data(trees)
You should now see the trees dataset in the environment.
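As a quick check that the data loaded as expected, the sketch below inspects the data frame with a few base R functions. It only assumes the built-in trees dataset described above.

```r
# Load the built-in trees dataset and take a first look at it
data(trees)

# Number of rows (trees) and columns (variables)
dim(trees)

# Column names and types
str(trees)

# First few rows
head(trees)
```

If everything worked, dim(trees) reports 31 rows and 3 columns, matching the description of the dataset above.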
Step 2: Perform the Pearson correlation test
To perform the Pearson correlation test, use the cor.test function.
By default, the cor.test function performs a two-sided Pearson correlation test.
The cor.test function requires two inputs: x and y. These are the two variables that you want to correlate in the Pearson correlation.
The code to run the Pearson correlation in R is displayed below. Simply replace x and y with the names of the two variables.
# Run the Pearson correlation test
# Replace x and y with the two variables
cor.test(x, y, method = "pearson")
In my example, I am interested in the correlation between the girth and height variables in the trees dataset. So, my code will look like the following.
# Pearson correlation test using the trees dataset
cor.test(trees$Girth, trees$Height, method = "pearson")
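As an alternative to passing two vectors, cor.test also accepts a one-sided formula together with a data argument. The sketch below shows both forms side by side on the trees data; the object names res_formula and res_vectors are only illustrative.

```r
# Formula interface: a one-sided formula naming both variables,
# with the data frame passed via `data`
res_formula <- cor.test(~ Girth + Height, data = trees, method = "pearson")

# Vector interface, as used in this tutorial
res_vectors <- cor.test(trees$Girth, trees$Height, method = "pearson")

# Both calls should give the same correlation estimate and p-value
all.equal(unname(res_formula$estimate), unname(res_vectors$estimate))
all.equal(res_formula$p.value, res_vectors$p.value)
```

The formula form can be convenient when the variables live in a data frame, since it avoids repeating the data frame name.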
Additional settings of interest
There are some additional arguments that you can change in the cor.test function. Some of the main ones you may be interested in are defined below.
- conf.level – change the confidence level (default is 0.95)
  - Numeric value between 0 and 1
For example, if you want to run a Pearson correlation test with a confidence level of 0.90, then enter the following.
# Pearson correlation test with 0.90 confidence level
cor.test(x, y, method = "pearson", conf.level = 0.90)
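To see what changing conf.level does in practice, here is a small sketch that applies it to the trees example and compares the resulting intervals. The object names res_95 and res_90 are only illustrative.

```r
# Default (95%) and 90% confidence intervals for the trees example
res_95 <- cor.test(trees$Girth, trees$Height, method = "pearson")
res_90 <- cor.test(trees$Girth, trees$Height, method = "pearson",
                   conf.level = 0.90)

# The interval is stored in the conf.int element of the result
res_95$conf.int
res_90$conf.int

# Width of each interval; the 90% interval is the narrower of the two
width_95 <- diff(as.numeric(res_95$conf.int))
width_90 <- diff(as.numeric(res_90$conf.int))
c(width_95 = width_95, width_90 = width_90)
```

Only the confidence interval changes; the correlation estimate and p-value are the same in both results.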
- alternative – change the alternative hypothesis (default is "two.sided")
  - "two.sided" – the true correlation is not zero
  - "greater" – the true correlation is greater than zero (i.e., a positive correlation)
  - "less" – the true correlation is less than zero (i.e., a negative correlation)
For example, if you wanted to run a one-sided Pearson correlation test with the alternative hypothesis describing a positive association, then enter the following.
# One-sided (positive association) Pearson correlation test
cor.test(x, y, method = "pearson", alternative = "greater")
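For illustration, the sketch below runs both the two-sided and the one-sided ("greater") test on the trees example so the p-values can be compared. The object names two_sided and one_sided are only illustrative.

```r
# Two-sided (default) and one-sided tests on the trees example
two_sided <- cor.test(trees$Girth, trees$Height, method = "pearson")
one_sided <- cor.test(trees$Girth, trees$Height, method = "pearson",
                      alternative = "greater")

# Compare the p-values
two_sided$p.value
one_sided$p.value

# The one-sided confidence interval is bounded above by 1
one_sided$conf.int
```

Because the observed correlation here is positive, the one-sided p-value for "greater" is half the two-sided p-value. Choose the alternative before looking at the data, not after.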
Interpretation of results
The output of my example is displayed below.
Pearson's product-moment correlation

data:  trees$Girth and trees$Height
t = 3.2722, df = 29, p-value = 0.002758
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.2021327 0.7378538
sample estimates:
      cor
0.5192801

Several values are returned in the results of the Pearson correlation test. These are summarized below.
- data – the two variables in the test
- t – the t-statistic
- df – the degrees of freedom
- p-value – the p-value for the Pearson correlation test
- alternative hypothesis – a description of the alternative hypothesis
- 95 percent confidence interval – the 95% confidence interval for the correlation coefficient
- sample estimates – the Pearson correlation coefficient
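Each of these values can also be pulled out of the result programmatically, since cor.test returns a list-like object. A minimal sketch, assuming the result is stored in a variable called res (a name chosen here for illustration):

```r
# Store the test result instead of only printing it
res <- cor.test(trees$Girth, trees$Height, method = "pearson")

res$statistic   # t-statistic
res$parameter   # degrees of freedom
res$p.value     # p-value
res$conf.int    # confidence interval for the correlation
res$estimate    # Pearson correlation coefficient

# List every element available in the result
names(res)
```

This is handy when you want to report or reuse individual numbers rather than copy them from the printed output.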
So, by looking at my example output, the Pearson correlation coefficient is 0.52.
The Pearson correlation coefficient is a value that ranges from -1 to 1. The key reference values are:
- -1 – a perfect negative linear association between the two variables
- 0 – no linear association between the two variables
- 1 – a perfect positive linear association between the two variables
Since the coefficient value is positive, there is a positive correlation between the variables girth and height. In other words, as the girth increases, the height tends to increase as well.
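To make the numbers in the output less of a black box, the sketch below recomputes the coefficient, the t-statistic, and the two-sided p-value by hand using the standard formulas, then compares the coefficient with R's built-in cor function. The variable names (x, y, r_manual, and so on) are only illustrative.

```r
x <- trees$Girth
y <- trees$Height
n <- length(x)

# Pearson's r: sum of cross-products of deviations, scaled by the
# square root of the product of the sums of squared deviations
r_manual <- sum((x - mean(x)) * (y - mean(y))) /
  sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))

# t-statistic for the null hypothesis of zero correlation (n - 2 df)
t_manual <- r_manual * sqrt(n - 2) / sqrt(1 - r_manual^2)

# Two-sided p-value from the t distribution
p_manual <- 2 * pt(-abs(t_manual), df = n - 2)

c(r = r_manual, t = t_manual, p = p_manual)

# Built-in equivalent for the coefficient
cor(x, y)
```

The hand-computed values should agree with the cor.test output shown earlier, up to rounding.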
You can also see that the p-value is 0.002758.
Since this p-value is below my alpha level (0.05), I reject the null hypothesis in favor of the alternative hypothesis. In other words, there is a statistically significant positive correlation between the girth and height of the cherry trees.
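The same decision can be expressed in code by comparing the stored p-value with the alpha level. A minimal sketch, where alpha, res, and reject_null are names chosen here for illustration:

```r
# Alpha level chosen before running the test
alpha <- 0.05

res <- cor.test(trees$Girth, trees$Height, method = "pearson")

# TRUE means the null hypothesis is rejected at the chosen alpha level
reject_null <- res$p.value < alpha
reject_null
```

For the trees example, reject_null is TRUE, in line with the interpretation above.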
I have shown you how to perform a Pearson correlation test in R. This can easily be achieved with the cor.test function; no other packages are required.
R version used: 3.6.3
RStudio version used: 1.2.5033