Partial correlations are great in that you can correlate two continuous variables whilst controlling for various confounders. However, the partial correlation option in SPSS performs a Pearson’s partial correlation by default, which assumes normality of the two variables of interest.
But what if you want to perform a Spearman’s partial correlation on non-normally distributed data?
If you go to Analyze > Correlate > Partial … you will see that there is no option to select a Spearman correlation. There is, however, a way around this using a little coding.
In this guide, I will explain how to perform a non-parametric, partial correlation in SPSS.
The required dataset
To be able to conduct a Spearman partial correlation in SPSS, you need a dataset, of course. For our example, we have the age and weight of 20 volunteers, as well as gender. What we want to test is if there is a correlation between age and weight, after controlling for gender.
Creating the script
For this to work, you need to enter a small piece of script into the SPSS Syntax Editor. Open up the Syntax Editor by going to File > New > Syntax.
Next, copy and paste the following code:
NONPAR CORR
  /MISSING = LISTWISE
  /MATRIX OUT(*).
RECODE rowtype_ ('RHO'='CORR') .
PARTIAL CORR
  /significance = twotail
  /MISSING = LISTWISE
  /MATRIX IN(*).
You now need to add the appropriate variables next to the NONPAR CORR and PARTIAL CORR sections.
So, next to NONPAR CORR, enter all of the variables that will be involved in the partial correlation. In our example, this would be Age, Weight and Gender.
For the PARTIAL CORR line, you need to enter the two variables of interest in the correlation, followed by BY, then the variables you want to control for. Make sure all of the variables you enter match the names in your file exactly, otherwise the script will fail.
Here is what our example will look like:
NONPAR CORR Age Weight Gender
  /MISSING = LISTWISE
  /MATRIX OUT(*).
RECODE rowtype_ ('RHO'='CORR') .
PARTIAL CORR Age Weight BY Gender
  /significance = twotail
  /MISSING = LISTWISE
  /MATRIX IN(*).
And here is what it looks like in the Syntax Editor:
Running the script
The script itself is separated into 3 parts: NONPAR CORR, RECODE and PARTIAL CORR.
The first performs a Spearman bivariate correlation for all variables and adds the Spearman rank correlation coefficients into a new file.
RECODE converts the row type from a Spearman (RHO) to a Pearson (CORR).
Finally, PARTIAL CORR performs the partial correlation on the desired variables by using the newly created Spearman correlation coefficients from the NONPAR CORR script.
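To make the three steps a little more concrete, here is a minimal sketch of the same calculation outside SPSS, written in Python purely for illustration. It assumes a single control variable and uses the standard first-order partial correlation formula applied to Spearman coefficients. The function names and the data values are made up for this example and are not taken from the SPSS dataset or output in this guide.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the ranks (the NONPAR CORR step)."""
    return pearson(average_ranks(x), average_ranks(y))

def partial_spearman(x, y, z):
    """First-order partial correlation built from Spearman coefficients
    (the PARTIAL CORR step, for one control variable z)."""
    r_xy, r_xz, r_yz = spearman(x, y), spearman(x, z), spearman(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical values, for illustration only
age = [23, 35, 41, 29, 52, 47, 38, 60]
weight = [61, 72, 80, 65, 90, 77, 70, 85]
gender = [0, 1, 1, 0, 1, 0, 0, 1]

print(round(partial_spearman(age, weight, gender), 3))
```

With more than one control variable the single formula above is no longer enough, which is one reason to let SPSS do the work via the matrix file.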
Here is how to run the script:
1. To run the script, go to the Syntax Editor and, with the NONPAR CORR section selected, hit the green play button.
This will give you an output for the Spearman’s rho between the variables. If you go to the SPSS Output file you will see:
You will also notice that a new SPSS data file has been created and is now open. This is usually named ‘Untitled’, or something similar. Within this file, you will see the Spearman’s rho values and n numbers for each correlation.
2. You next need to go back to the Syntax Editor window and run the RECODE part of the script. Make sure you select the new dataset as the active worksheet for this, as you want to perform the RECODE on the new sheet. You can toggle between datasets by clicking on the drop-down menu next to Active. In this case, we select the Unnamed sheet:
Click the green play button again to run the RECODE script on this.
3. Finally, still in the Syntax window, select the PARTIAL CORR code and run this on the same Unnamed dataset. This will perform the final partial correlation.
The output
By looking in the output file, you should now see a Partial Corr box which contains the partial correlation coefficients and P values for the test:
You will see in this example that the non-parametric partial correlation for age with weight, after controlling for gender, has a coefficient of 0.383 and a significance (P) value of 0.105.
Therefore, at the conventional 0.05 threshold, there is not a significant correlation between age and weight after accounting for gender.
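As a rough sanity check on these numbers, the sketch below recomputes the test statistic behind the P value. It is written in Python for illustration and assumes that all 20 volunteers have complete data and that the usual t test for a partial correlation applies (degrees of freedom equal to cases minus 2 minus the number of control variables). The function name is made up for this example.

```python
import math

def partial_corr_t(r, n, k):
    """t statistic and degrees of freedom for a partial correlation r,
    computed from n complete cases with k control variables."""
    df = n - 2 - k
    t = r * math.sqrt(df / (1 - r ** 2))
    return t, df

# Values from the example: r = 0.383, 20 volunteers, 1 control (gender)
t, df = partial_corr_t(r=0.383, n=20, k=1)
print(f"t = {t:.2f} on {df} degrees of freedom")  # t = 1.71 on 17 degrees of freedom
```

A t of about 1.71 falls short of the two-tailed 0.05 critical value for 17 degrees of freedom (about 2.11), which is in line with the non-significant P value of 0.105 reported above.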
Hi Steven,
Thank you so much for this guide. One quick question: when I run it, my degrees of freedom are off, which impacts my p values. I think my degrees of freedom are being based on the variables in the correlation matrix, not the actual number of cases. For example, although I have a sample size of 100, my df when I run the syntax ends up being 26. Although I am controlling for 6 variables and am not sure exactly what the df should be, 26 doesn’t seem right to me. From your screenshots, it doesn’t seem like you had this problem. Do you have any thoughts as to why this would happen? Thanks so much again!
Emma
Hi Emma,
So sorry for the delay.
So do you have 100 cases, and do all of these have matching data for the variables being controlled for, i.e. there are no missing data points?
Thanks,
Steven
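A quick arithmetic check may help here. Assuming SPSS uses the usual degrees of freedom for a partial correlation (cases minus the number of control variables minus 2), a df of 26 with 6 control variables points to about 34 cases actually being used. That is what listwise deletion would produce if only 34 of the 100 cases had complete data on all eight variables. This is a sketch of the reasoning under that assumption, not a diagnosis of the actual file:

```latex
\[
df = N - k - 2
\quad\Longrightarrow\quad
N = df + k + 2 = 26 + 6 + 2 = 34
\]
```

Here N is the number of cases with no missing values on any variable in the analysis and k is the number of control variables.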
Dear Steven, I just wonder how to cite this method?
Cheers
Juan
Hi Juan,
I based this guide on the one produced by IBM. In that, they quote a reference:
Conover, W.J. (1999). Practical Nonparametric Statistics (3rd ed.). New York: Wiley, pp. 327-328.
This may be a good place to start?
I hope that helps,
Best wishes,
Steven
Hi Steven
I’m using SAS instead.
Obtained from: https://en.wikipedia.org/wiki/Partial_regression_plot
1) Computing the residuals of regressing the response variable against the independent variables but omitting Xi
2) Computing the residuals from regressing Xi against the remaining independent variables
3) Plotting the residuals from (1) against the residuals from (2).
Example of SAS code: (I wish to acknowledge the contribution of Mr. Lin (Robbin@TMU), for his assistance in the longitudinal stats class)
—
SAS code:
proc import datafile="C:\Users\User\Desktop\working.xls" out=ddd replace dbms=excel;
run;
proc print;run;quit;
*1. Computing the residuals of regressing the response variable against the independent variables but omitting Xi;
proc reg data=ddd;
model var1=ctrl1 ctrl2;
output out=out1 residual=r1;
run;
proc print data=out1;run;quit;
*2.Computing the residuals from regressing Xi against the remaining independent variables;
proc reg data=ddd;
model var2=ctrl1 ctrl2;
output out=out2 residual=r2;
run;
proc print data=out2;run;quit;
*3.Plotting the residuals from (1) against the residuals from (2).;
data out1 ;set out1; _n+1;run;
data out2;set out2;_n+1;run;
data out3;merge out1 out2; by _n;run;
proc sgplot data=out3;
scatter x=r1 y=r2;
run;
—
Cheers
Larry
Hi Steven
This is the SPSS syntax for the non-parametric partial correlation, based on the syntax example from the SPSS forum (https://developer.ibm.com/answers/questions/223269/plotting-a-partial-corr-using-pairwise-exclusion/).
The SPSS syntax as follows:
—
* Encoding: UTF-8.
NONPAR CORR var1 var2 ctrlvar1 ctrlvar2
/MISSING = LISTWISE
/MATRIX OUT(*).
RECODE rowtype_ ('RHO'='CORR') .
PARTIAL CORR var1 var2 BY ctrlvar1 ctrlvar2
/significance = twotail
/MISSING = LISTWISE
/MATRIX IN(*).
REGRESSION
/MISSING LISTWISE
/STATISTICS COEFF OUTS R ANOVA
/CRITERIA=PIN(.05) POUT(.10)
/NOORIGIN
/DEPENDENT var1
/METHOD=ENTER ctrlvar1 ctrlvar2
/SAVE ZRESID.
REGRESSION
/MISSING LISTWISE
/STATISTICS COEFF OUTS R ANOVA
/CRITERIA=PIN(.05) POUT(.10)
/NOORIGIN
/DEPENDENT var2
/METHOD=ENTER ctrlvar1 ctrlvar2
/SAVE ZRESID.
GRAPH
/SCATTERPLOT(BIVAR)=ZRE_1 WITH ZRE_2
/MISSING=LISTWISE.
—
Please feel free to comment on this syntax. Much obliged.
Best wishes
Larry Lai
Hi Larry,
Thanks for sharing. Did this work for you? The syntax looks like it is doing a regression, similar to how I described, and plotting the residuals this way.
Best wishes,
Steven
Hi Steven
Thanks for sharing. By the way, how do you plot a non-parametric partial correlation in SPSS?
Thanks for considering my request.
Cheers
Larry
Hi Larry,
Thanks for your comment!
Plotting the results is something I have found quite difficult myself. But I am yet to find a conclusive answer.
I have seen others which plot the results via a regression:
What you can do in SPSS is plot these through a linear regression. Go to Analyze -> Regression -> Linear Regression. Put one of the variables of interest in the Dependent window and the other in the block below, along with any covariates you wish to control for. Then click the Plots button and tick the option for ‘Produce all partial plots’. Then run the test. One of the graphs produced will be the graph you are after. Hope that helps!
Whether this is the correct way, however, I am not so sure – sorry.
If you do find out, please come back and share!
Best wishes,
Steven
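One possible way to get a plot that matches the Spearman partial correlation is to rank every variable first, regress the ranks of each variable of interest on the ranks of the control, and plot the two sets of residuals against each other. The Pearson correlation of those residuals equals the partial correlation computed from the Spearman coefficients. The Python sketch below illustrates this for a single control variable. The data and function names are hypothetical, and it is offered as an illustration rather than an established SPSS procedure.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def pearson(x, y):
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def residuals(y, x):
    """Residuals from a least-squares line of y on a single predictor x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return [(b - my) - slope * (a - mx) for a, b in zip(x, y)]

# Hypothetical values, for illustration only
age = [23, 35, 41, 29, 52, 47, 38, 60]
weight = [61, 72, 80, 65, 90, 77, 70, 85]
gender = [0, 1, 1, 0, 1, 0, 0, 1]

rank_age, rank_weight, rank_gender = map(average_ranks, (age, weight, gender))
res_age = residuals(rank_age, rank_gender)        # x-axis of the plot
res_weight = residuals(rank_weight, rank_gender)  # y-axis of the plot

# The correlation of the plotted points is the Spearman partial correlation
print(round(pearson(res_age, res_weight), 3))
```

The pairs in `res_age` and `res_weight` are the points to put on a scatter plot. With several control variables the same idea needs a multiple regression on the ranks instead of the single-predictor line used here.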
Hi Steven,
Thank you for this useful guide!
I worked with this syntax but I get this warning:
“The MATRIX subcommand on the PARTIAL CORR command specifies an input file which does not contain a correlation matrix for the current splitfile group. Within cell matrices are not acceptable. A correlation matrix has a row type of “CORR”.”
Can you help me out?
regards,
Ferehsteh
Unfortunately, one cannot meaningfully apply the partial correlation formulas from parametric (usually Pearson’s) correlation to Spearman’s rank correlation. You can apply the formulas as you have above, but they were not developed for Spearman’s, so the resulting Spearman partial correlations are not meaningful in the way Pearson’s are and cannot be interpreted. This might not stop people doing it, but their resulting conclusions are fatally flawed.
However, you can use Kendall’s Tau correlation for nonparametric correlation, and apply the same parametric partial correlation formula to get meaningful answers. Be aware though that Kendall’s Tau has a different meaning to Pearson’s r in explaining the correlation relationship. Unfortunately, there’s no easy way to apply significance testing to partial correlations based upon Kendall’s Tau since the underlying sampling distribution is not defined (as it is for Pearson’s).
So if you want partial correlations for nonparametric data, use Kendall’s Tau rather than Spearman’s rho.
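For anyone wanting to try the Kendall’s Tau suggestion, here is a minimal Python sketch of what it describes: compute the three pairwise tau values and feed them into the same first-order partial correlation formula. It uses tau-b (which adjusts for ties) and a single control variable. The data and function names are hypothetical, and, as noted above, no P value is produced.

```python
import math

def kendall_tau_b(x, y):
    """Kendall's tau-b, counting concordant and discordant pairs with a tie correction."""
    concordant = discordant = tied_x_only = tied_y_only = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied on both variables: excluded from both denominators
            elif dx == 0:
                tied_x_only += 1
            elif dy == 0:
                tied_y_only += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    untied_x = concordant + discordant + tied_y_only  # pairs not tied on x
    untied_y = concordant + discordant + tied_x_only  # pairs not tied on y
    return (concordant - discordant) / math.sqrt(untied_x * untied_y)

def partial_kendall(x, y, z):
    """Partial tau for x and y controlling for z, using the first-order formula."""
    t_xy, t_xz, t_yz = kendall_tau_b(x, y), kendall_tau_b(x, z), kendall_tau_b(y, z)
    return (t_xy - t_xz * t_yz) / math.sqrt((1 - t_xz ** 2) * (1 - t_yz ** 2))

# Hypothetical values, for illustration only
age = [23, 35, 41, 29, 52, 47, 38, 60]
weight = [61, 72, 80, 65, 90, 77, 70, 85]
gender = [0, 1, 1, 0, 1, 0, 0, 1]

print(round(partial_kendall(age, weight, gender), 3))
```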
Hello! Thanks for this wonderful guide. Do you have any suggestions for how to plot the results of the nonparametric partial correlation on a graph? I cannot figure this out or find anything online.
Super helpful! Can you please tell me how to flag significant correlations on the output?
Thank you so much for a very helpful post and also helpful comments and replies above.
Is it possible to enter more than two variables at the same time (before BY)? Or do I have to repeat it for every dependent variable I want to test?
Hello Ingrid,
Many thanks for your comments and kind words. I presume you can enter more than 2 variables before the ‘BY’. The results should then display a grid table so you can look at all your correlations within the same output.
Let me know if there is an issue with this however.
Thanks!
Steven
Thank you for the reply. It worked and gave a grid table as you said. If you want to test the correlation between one (independent) variable and all the other (dependent) variables, you can place that one first or last and write WITH in between on the PARTIAL CORR line.
Example: PARTIAL CORR Age WITH Weight Pain BY Gender
Then the output will show only Age in the horizontal row and Weight and Pain in the vertical columns.
Excellent, glad it worked for you. And thank you very much for the additional tip 🙂 greatly appreciated
Can I know why the partial coefficient value is higher than the Spearman’s rho value? Shouldn’t it be lower?
Hello,
The correlations can increase or decrease depending upon the relationship your covariates have on the variables you are interested in. There is a discussion on this on ResearchGate which may be useful to see:
https://www.researchgate.net/post/Any_advice_on_partial_correlation_interpretation
Hope that helps!
Thanks
Steven
Hi Steven,
I am running your script, but I am getting the error below. The problem is that the new “unnamed” or “unknown” data sheet does not exist, or I can’t find it. What should I do?
Thanks,
Terhi
DATASET ACTIVATE DataSet1.
RECODE ROWTYPE_ ('RHO'='CORR').
Error # 4631 in column 8. Text: ROWTYPE_
On the RECODE command, the list of variables to be recoded includes the name
of a nonexistent variable.
Execution of this command stops.
Hello Terhi,
So sorry for the late reply. Did you manage to sort this? It seems like the new results are not being opened in a new datasheet. Have you ensured the ‘/MATRIX OUT(*).’ part of the code is included before you run the RECODE part of the code?
Thanks
Steven
Hi Steve,
Great post. Do you know how to compute 95% confidence intervals for Spearman’s partial correlations using the syntax? My reviewers are requesting confidence intervals for all point-estimates in accordance with APA.
Thanks,
Rose
Hi Rose,
Thanks for the comment. Unfortunately I do not know how to report 95% CI for this. Upon reading around this it seems quite a few people are asking the same thing. I have found a link to this website however, http://vassarstats.net/rho.html, which computes 95% CI from the r and n values. May be of use for you?
Thanks,
Steven
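For reference, calculators that work from r and n typically use the Fisher z transformation. The Python sketch below shows that calculation, with the standard error adjusted for the number of control variables. Note the assumptions: the standard error formula is the one used for Pearson (partial) correlations, so applying it to a Spearman partial correlation is only an approximation, and the function name is made up for this example.

```python
import math
from statistics import NormalDist

def fisher_ci(r, n, k=0, level=0.95):
    """Approximate confidence interval for a correlation r from n cases,
    with k control variables, using the Fisher z transformation."""
    z = math.atanh(r)                      # transform r to the z scale
    se = 1 / math.sqrt(n - 3 - k)          # Pearson-based standard error (assumption)
    crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return math.tanh(z - crit * se), math.tanh(z + crit * se)

# Values from the guide's example: r = 0.383, n = 20, one control variable
low, high = fisher_ci(0.383, n=20, k=1)
print(f"95% CI: {low:.2f} to {high:.2f}")  # 95% CI: -0.09 to 0.71
```

The interval includes zero, which agrees with the non-significant result in the guide, but given the approximation involved it should be reported with that caveat.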
Hi Rose:
This reply may have come a little late for you. As I posted below, there is no such thing as partial correlations for Spearman’s rho. Therefore, compute Kendall’s Tau, where you can calculate meaningful partial correlations. However, even for Kendall’s Tau, there is no defined sampling distribution, and so CIs cannot be calculated. You can move forward in a few ways. First, carefully review your data to be sure that Pearson’s r cannot be used. Pearson’s r is pretty robust, and unless your data are very skewed from normal, you might be able to proceed (don’t get distracted by the type of data you are collecting; you can apply Pearson’s r even to categorical data). If the first approach does not work, try a data transformation to make your data sufficiently normal to apply Pearson’s r (Spearman’s itself is a kind of rank data transformation). Finally, if you end up using Kendall’s Tau, you might be able to apply bootstrap methods to develop a sampling distribution to create CIs around the partial correlations. This is a last resort for most people, and I’ve rarely seen this done.
Ian
Thanks for the advice Ian, really appreciate it!
Best wishes,
Steven
Hi, Steven
Thank you so much for providing this. I read the IBM instructions for syntax and was totally bamboozled; your tutorial and example was very easy to follow and was immensely helpful!
FYI, I am using SPSS V24 on a Mac. When I ran the second part of the syntax [RECODE rowtype_ ('RHO'='CORR') .] I received a warning message. I removed the space between “rowtype_” and “('RHO'='CORR')” and re-ran without any further problems [i.e. RECODE rowtype_('RHO'='CORR') .].
regards
Marie
Hi Marie,
Thanks very much for the feedback, very much appreciated. Also, thanks for providing details for the Mac users. Unfortunately I am just on Windows at the minute so I cannot provide too much information on that system, but maybe in the future I can expand :).
Best wishes,
Steven
Hi,
I’m on a Mac also and I found the warning message disappeared if I ensured I had clicked at the top of the syntax (so that the procedure was run from the right place and not halfway down the command).
Hi Rose,
Thanks very much for this, I was having exactly the same problems as you (on a Mac) and found that clicking the top of the syntax sorted this. Your info sharing and advice has made a happy student!
Hi Steven,
This is really helpful, but can you control for more than one variable? E.g., 3 variables (1 continuous and 2 categorical).
Hi Omar,
Thanks for the feedback. Yes, you can control for more than one variable. However, the more variables you are controlling for the less reliable the test may become because you may over-fit your analysis. If you have a large enough sample size then it should be okay. One rule, called the One in Ten rule (https://en.wikipedia.org/wiki/One_in_ten_rule), is suggested for regression analysis and could be kept in mind when doing a partial correlation. Briefly, for every control (or predictor) variable you use there must be at least 10 samples in the analysis.
Hope that helps!
Great! So, in this case I would need to do something like Age Weight BY Gender By SES BY Ethnicity, right?
When more than one control variable is entered then only one ‘BY’ is required. So:
Age Weight BY Gender SES Ethnicity
This will control for ‘Gender’, ‘SES’, and ‘Ethnicity’.
Hope that helps 🙂
Also, are you sure partial correlations can be run by categorical variable? I thought it was only to control for a continuous variable.
As far as I am aware, you can control for dichotomous variables (e.g. gender). However, I am no stats expert!
Hey Steven,
I am trying to run your syntax, but my output says:
“The input matrix file does not contain a ROWTYPE_ variable or the variable has been misspecified.”
Could you help me out?
Thank you!
Hi Jess,
Sorry for the late response. The error you are getting, when do you get this? Is this for the first (nonpar corr), second (recode) or third (partial corr) part of the script?
Thanks!
Hi Steven,
I am also experiencing this error. I get it at the second [RECODE rowtype_ (‘RHO’=’CORR’)] part of the script.
The exact error message is as Jess stated, “The input matrix file does not contain a ROWTYPE_ variable or the variable has been misspecified.”
Thanks in advance
Hi Lauren,
I think this error is because you may be running the RECODE part of the script using the original datasheet. Have you changed the ‘Active’ sheet to the newly created ‘unnamed’ one before running the RECODE part? (See point 2 in the guide above).
I am in the process of creating a screencast video that will hopefully help.
Let me know if this works 🙂
Thanks
This was exactly what I needed, thank you so much! I agree with Rachael that it is really clearly described.
Thank you very much Suzanne for the comment. I am glad it helped you out too 🙂
This has been so helpful. Thank you. Really clear and easy to follow.
Thanks Rachael, I really appreciate your comment. I am glad it helped you out 🙂