In this tutorial, I will show you how to perform a Pearson correlation test in R.

## What is a Pearson correlation test?

A Pearson correlation test is a parametric statistical test of whether the linear correlation between two continuous variables is different from zero.
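For reference, here is the standard definition of the sample Pearson correlation coefficient $r$ for $n$ paired observations $(x_i, y_i)$, together with the t-statistic used to test whether the true correlation is zero. Under the null hypothesis, and assuming the data are roughly bivariate normal, $t$ follows a t-distribution with $n - 2$ degrees of freedom.

$$
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2} \, \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
$$

$$
t = \frac{r \sqrt{n - 2}}{\sqrt{1 - r^2}}, \qquad \text{df} = n - 2
$$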

## Example data

For this tutorial, I will use the trees dataset that is already available within R.

The trees dataset contains measurements from 31 cherry trees. Specifically, the data frame contains three variables:

- **Girth** – the tree diameter (in inches)
- **Height** – the tree height (in feet)
- **Volume** – the volume of timber (in cubic feet)

In this example, I am interested in the correlation between tree girth and height.

## Example hypothesis

Based on the above example data, here are my two hypotheses:

- **Null hypothesis** – There is no linear correlation between tree girth and height (the true correlation ρ = 0)
- **Alternative hypothesis** – There is a non-zero linear correlation between tree girth and height (ρ ≠ 0)

I will also set my alpha level to 0.05.

## How to perform a Pearson correlation test in R

It is really easy to perform a Pearson correlation test in R. No additional packages are required: the `cor.test()` function is part of the `stats` package, which is loaded by default in every R session.


### Step 1: Import your data into R

The first step in performing a Pearson correlation in R is to load some data that contains the two variables of interest.

In this example, I will be using the trees dataset in R.

To load the trees dataset, simply run the following code.

```r
# Load the trees dataset
data(trees)
```

You should now see the trees dataset in your environment.
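Before running the test, it can help to confirm that the data loaded as expected. This is a minimal sketch using base R's `head()`, `str()` and `nrow()`; the printed output is not reproduced here.

```r
# Load the built-in trees dataset (from the datasets package)
data(trees)

# Show the first six rows
head(trees)

# Show the variable names and types (all three columns are numeric)
str(trees)

# Number of trees (rows) in the dataset
nrow(trees)
```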

### Step 2: Perform the Pearson correlation test

To perform the Pearson correlation test, use the cor.test function.

By default, the cor.test function performs a two-sided Pearson correlation test.

The cor.test function requires two inputs: x and y. These are the two variables (numeric vectors of the same length) that you want to correlate in the Pearson correlation.

The code to run the Pearson correlation in R is displayed below. Simply replace x and y with the names of the two variables.

```r
# Run the Pearson correlation test
# Replace x and y with the two variables
cor.test(x, y, method = "pearson")
```

In my example, I am interested in the correlation between the girth and height variables in the trees dataset. So, my code will look like the following.

```r
# Pearson correlation test using the trees dataset
cor.test(trees$Girth, trees$Height, method = "pearson")
```
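`cor.test()` also accepts a formula of the form `~ u + v` together with a `data` argument, which saves typing `trees$` twice. The sketch below runs the test both ways and checks that they agree; the object names `res_formula` and `res_xy` are just my own choices.

```r
data(trees)

# Formula interface: ~ Girth + Height names the two variables,
# and data = trees says where to find them
res_formula <- cor.test(~ Girth + Height, data = trees, method = "pearson")

# The same test using the x, y interface
res_xy <- cor.test(trees$Girth, trees$Height, method = "pearson")

# Both calls should give the same correlation estimate and p-value
all.equal(unname(res_formula$estimate), unname(res_xy$estimate))
all.equal(res_formula$p.value, res_xy$p.value)
```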

### Additional settings of interest

There are some additional arguments that you can change in the cor.test function. Some of the main ones you may be interested in are defined below.

**conf.level** – change the confidence level of the returned confidence interval (default is 0.95). This must be a numeric value between 0 and 1.

For example, if you want to run a Pearson correlation test with a confidence level of 0.90, then enter the following.

```r
# Pearson correlation test with 0.90 confidence level
cor.test(x, y, method = "pearson", conf.level = 0.90)
```
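Applied to the trees data, a sketch comparing the default 95% interval with a 90% interval might look like the following. Changing `conf.level` only affects the confidence interval; the t-statistic and p-value stay the same.

```r
data(trees)

# Default (95%) and 90% confidence intervals for the girth-height correlation
res95 <- cor.test(trees$Girth, trees$Height, method = "pearson")
res90 <- cor.test(trees$Girth, trees$Height, method = "pearson",
                  conf.level = 0.90)

res95$conf.int
res90$conf.int

# A lower confidence level gives a narrower interval
diff(res90$conf.int) < diff(res95$conf.int)
```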

**alternative** – change the alternative hypothesis (default is `"two.sided"`). The options are:

- `"two.sided"` – the true correlation is non-zero
- `"greater"` – the true correlation is greater than zero (i.e., positive correlation)
- `"less"` – the true correlation is less than zero (i.e., negative correlation)

For example, if you wanted to run a one-sided Pearson correlation test with the alternative hypothesis describing a positive association, then enter the following.

```r
# One-sided (positive association) Pearson correlation test
cor.test(x, y, method = "pearson", alternative = "greater")
```
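To see what the one-sided option changes, the sketch below runs both versions on the trees data. Because the t-statistic here is positive, the `"greater"` p-value is half of the two-sided p-value, and the one-sided confidence interval has an upper bound of 1.

```r
data(trees)

two_sided <- cor.test(trees$Girth, trees$Height, method = "pearson")
greater   <- cor.test(trees$Girth, trees$Height, method = "pearson",
                      alternative = "greater")

# Compare the two p-values (one-sided is half the two-sided one here)
two_sided$p.value
greater$p.value

# One-sided interval: only a lower bound is estimated, upper bound is 1
greater$conf.int
```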

## Interpretation of results

The output of my example is displayed below.

```
Pearson's product-moment correlation

data:  trees$Girth and trees$Height
t = 3.2722, df = 29, p-value = 0.002758
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.2021327 0.7378538
sample estimates:
      cor 
0.5192801 
```

There are a few parameters returned in the results of the Pearson correlation test. These are summarized below.

- **data** – the two variables in the test
- **t** – the t-statistic
- **df** – the degrees of freedom
- **p-value** – the p-value for the Pearson correlation test
- **alternative hypothesis** – a description of the alternative hypothesis
- **95 percent confidence interval** – the 95% confidence interval for the true correlation
- **sample estimates** – the sample Pearson correlation coefficient
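Each of these values can also be pulled out of the result programmatically. The sketch below stores the result, extracts the components, and then recomputes them by hand as a check: the t-statistic from $t = r\sqrt{n-2}/\sqrt{1-r^2}$, the two-sided p-value from the t-distribution, and the confidence interval using Fisher's z-transformation (which, to my understanding, is how `cor.test()` builds the Pearson interval). The variable names are my own.

```r
data(trees)

# Run the test and keep the result (an object of class "htest")
res <- cor.test(trees$Girth, trees$Height, method = "pearson")

# Pull individual components out of the result
r_res  <- unname(res$estimate)      # sample correlation coefficient
t_res  <- unname(res$statistic)     # t-statistic
df_res <- unname(res$parameter)     # degrees of freedom
p_res  <- res$p.value               # p-value
ci_res <- as.numeric(res$conf.int)  # 95% confidence interval

# Recompute the same quantities by hand
n      <- nrow(trees)
r_hand <- cor(trees$Girth, trees$Height)
t_hand <- r_hand * sqrt(n - 2) / sqrt(1 - r_hand^2)
p_hand <- 2 * pt(-abs(t_hand), df = n - 2)

# 95% confidence interval via Fisher's z-transformation
z       <- atanh(r_hand)
ci_hand <- tanh(z + c(-1, 1) * qnorm(0.975) / sqrt(n - 3))

rbind(cor.test = c(r = r_res,  t = t_res,  df = df_res, p = p_res),
      by_hand  = c(r = r_hand, t = t_hand, df = n - 2,  p = p_hand))
rbind(cor.test = ci_res, by_hand = ci_hand)
```

With n = 31 and r ≈ 0.519, the formula gives t ≈ 3.27 on 29 degrees of freedom, which matches the output shown earlier.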

So, by looking at my example output, the Pearson correlation coefficient is 0.52.

The Pearson correlation coefficient is a value that ranges from -1 to 1. The major cut-offs are:

- **-1** – a perfectly negative linear association between the two variables
- **0** – no linear association between the two variables
- **1** – a perfectly positive linear association between the two variables
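A quick way to see these cut-offs is to correlate some made-up vectors with base R's `cor()`. The vectors below are invented purely for illustration. The last example shows why the value 0 means no *linear* association: `u^2` depends entirely on `u`, yet the correlation is zero because the relationship is U-shaped rather than a straight line.

```r
# Perfectly positive linear relationship: correlation of 1
x <- 1:10
cor(x, 2 * x + 3)

# Perfectly negative linear relationship: correlation of -1
cor(x, -0.5 * x)

# Symmetric U-shape: strong relationship, but zero linear correlation
u <- -5:5
cor(u, u^2)
```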

Since the coefficient value is positive, there is a positive correlation between girth and height. In other words, trees with a larger girth tend to be taller.

You can also see that the p-value is 0.002758.

Since this p-value is below my alpha level (0.05), I will reject the null hypothesis in favour of the alternative hypothesis. In other words, there is a significant (positive) correlation between the girth and height of the cherry trees.
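The same decision rule can be written directly in R. This is a minimal sketch that compares the p-value from `cor.test()` with an alpha level of 0.05 (the variable name `alpha` is my own choice).

```r
data(trees)

alpha <- 0.05
res <- cor.test(trees$Girth, trees$Height, method = "pearson")

# Decision rule: reject the null hypothesis if the p-value is below alpha
if (res$p.value < alpha) {
  message("Reject the null hypothesis: r = ", round(unname(res$estimate), 2),
          ", p = ", signif(res$p.value, 3))
} else {
  message("Fail to reject the null hypothesis")
}
```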

## Wrapping up

I have shown you how to perform a Pearson correlation test in R. This can easily be achieved with the cor.test function; no other packages are required.

R version used: 3.6.3

RStudio version used: 1.2.5033