How To Perform A Pearson Correlation Test In R

In this tutorial, I will show you how to perform a Pearson correlation test in R.

What is a Pearson correlation test?

A Pearson correlation test is a parametric statistical test that measures the strength and direction of the linear association between two continuous variables, and tests whether that association differs from zero.
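For reference, here is the standard textbook definition of the sample Pearson correlation coefficient. The symbols are my own notation rather than anything taken from R: x_i and y_i are the paired observations, x-bar and y-bar are their means, and n is the number of pairs.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```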

Example data

For this tutorial, I will use the trees dataset that is already available within R.

The trees dataset contains measurements from 31 cherry trees. Specifically, the data frame contains three variables:

  • Girth – the tree diameter (in inches)
  • Height – the tree height (in feet)
  • Volume – the volume of timber (in cubic feet)

In this example, I am interested in the correlation between tree girth and height.

Example hypothesis

Based on the above example data, here are my two hypotheses:

  • Null hypothesis – The true correlation between tree girth and height is equal to zero
  • Alternative hypothesis – The true correlation between tree girth and height is not equal to zero

I will also set my alpha level to 0.05.

How to perform a Pearson correlation test in R

It is easy to perform a Pearson correlation test in R. There are no additional package requirements; the cor.test function belongs to the stats package, which ships with R and is loaded by default.


Step 1: Import your data into R

To perform a Pearson correlation in R, you first need a dataset containing the two variables of interest.

In this example, I will be using the trees dataset in R.

To load the trees dataset, simply run the following code.

#Load the trees dataset
data(trees)

You should now see the trees dataset in the environment.
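Before running the test, it can help to take a quick look at the data. The following is a minimal sketch using base R functions only; the check for missing values is my own addition and not a required step.

```r
# Load the built-in dataset
data(trees)

# Show the column names, types, and first few values
str(trees)

# Print the first six rows
head(trees)

# Count missing values in the two variables of interest
colSums(is.na(trees[, c("Girth", "Height")]))
```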

Step 2: Perform the Pearson correlation test

To perform the Pearson correlation test, use the cor.test function.

By default, the cor.test function performs a two-sided Pearson correlation test.

The cor.test function requires two inputs: x and y. These are the two variables that you want to correlate in the Pearson correlation.

The code to run the Pearson correlation in R is displayed below. Simply replace x and y with the names of the two variables.

#Run the Pearson correlation test
##Replace x and y with the two variables
cor.test(x, y,
         method = "pearson")

In my example, I am interested in the correlation between the Girth and Height variables in the trees dataset. So, my code will look like the following.

#Pearson correlation test using the trees dataset
cor.test(trees$Girth, trees$Height,
         method = "pearson")

Additional settings of interest

There are some additional arguments that you can change in the cor.test function. Some of the main ones you may be interested in are defined below.

  • conf.level – change the confidence level (default is 0.95)
    • Numeric value between 0 and 1

For example, if you want to run a Pearson correlation test with a confidence level of 0.90, then enter the following.

#Pearson correlation test with 0.90 confidence level
cor.test(x, y,
         method = "pearson",
         conf.level = 0.90)

  • alternative – change the alternative hypothesis (default is “two.sided”)
    • “two.sided” – non-zero
    • “greater” – greater than zero (i.e., positive correlation)
    • “less” – less than zero (i.e., negative correlation)

For example, if you wanted to run a one-sided Pearson correlation test with the alternative hypothesis describing a positive association, then enter the following.

#One-sided (positive association) Pearson correlation test
cor.test(x, y,
         method = "pearson",
         alternative = "greater")
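To make these settings concrete, here is a short sketch that applies both of them to the trees dataset. The object names res_90 and res_greater are my own, and I have left out the printed output on purpose so you can run it yourself.

```r
# Two-sided test with a 90% confidence interval
res_90 <- cor.test(trees$Girth, trees$Height,
                   method = "pearson",
                   conf.level = 0.90)
res_90$conf.int

# One-sided test: the alternative is a positive correlation
res_greater <- cor.test(trees$Girth, trees$Height,
                        method = "pearson",
                        alternative = "greater")
res_greater$p.value
```

Note that changing conf.level only changes the width of the confidence interval; the correlation coefficient and the p-value stay the same.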

Interpretation of results

The output of my example is displayed below.

	Pearson's product-moment correlation
data:  trees$Girth and trees$Height
t = 3.2722, df = 29, p-value = 0.002758
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.2021327 0.7378538
sample estimates:
      cor 
0.5192801 

The output of the Pearson correlation test contains several values. These are summarized below.

  • data – the two variables in the test
  • t – the t-statistic
  • df – the degrees of freedom
  • p-value – the p-value for the Pearson correlation test
  • alternative hypothesis – a description of the alternative hypothesis
  • 95 percent confidence interval – the 95% confidence interval for the correlation coefficient
  • sample estimates – the Pearson correlation coefficient
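If you want to use these values in later code rather than just read them off the screen, you can store the result and pull out each piece by name. This is a minimal sketch; res is simply the name I chose for the stored result.

```r
# Store the test result instead of only printing it
res <- cor.test(trees$Girth, trees$Height,
                method = "pearson")

res$estimate   # the Pearson correlation coefficient
res$statistic  # the t-statistic
res$parameter  # the degrees of freedom
res$p.value    # the p-value
res$conf.int   # the confidence interval
```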

So, by looking at my example output, the Pearson correlation coefficient is 0.52.

The Pearson correlation coefficient is a value that ranges from -1 to 1. The key reference values are:

  • -1 – a perfectly negative linear association between the two variables
  • 0 – no linear association between the two variables
  • 1 – a perfectly positive linear association between the two variables

Since the coefficient value is positive, this means that there is a positive correlation between the variables girth and height. In other words, as the girth increases, the height tends to increase as well.

You can also see that the p-value is 0.002758.

Since this p-value is below my alpha level (0.05), I will reject the null hypothesis in favor of the alternative hypothesis. In other words, there is a statistically significant (positive) correlation between the girth and height of the cherry trees.
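If you are curious where the t-statistic and p-value come from, the sketch below reproduces them by hand using the standard formula t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom. The variable names (t_manual, p_manual, alpha) are my own; this is only a check on the cor.test output, not a replacement for it.

```r
# Correlation coefficient and sample size
r <- cor(trees$Girth, trees$Height)
n <- nrow(trees)

# t-statistic for testing whether the true correlation is zero
t_manual <- r * sqrt(n - 2) / sqrt(1 - r^2)

# Two-sided p-value from the t distribution with n - 2 degrees of freedom
p_manual <- 2 * pt(abs(t_manual), df = n - 2, lower.tail = FALSE)

# Compare the p-value against the chosen alpha level
alpha <- 0.05
p_manual < alpha
```

The values of t_manual and p_manual should agree with the t and p-value lines in the output shown earlier.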

Wrapping up

I have shown you how to perform a Pearson correlation test in R. This can easily be achieved with the cor.test function; no other packages are required.

R version used: 3.6.3
RStudio version used: 1.2.5033
