You can use a bivariate Pearson Correlation to test whether there is a statistically significant linear relationship between height and weight, and to determine the strength and direction of the association. Before we look at the Pearson correlations, we should look at the scatterplots of our variables to get an idea of what to expect. In particular, we need to determine if it's reasonable to assume that our variables have linear relationships.
When finished, click OK. To add a linear fit like the one depicted, double-click on the plot in the Output Viewer to open the Chart Editor. Notice that adding the linear regression trend line will also add the R-squared value in the margin of the plot.
If we take the square root of this number, it should match the value of the Pearson correlation we obtain (in absolute value, since a square root is never negative). From the scatterplot, we can see that as height increases, weight also tends to increase. There does appear to be some linear relationship.
Select the variables Height and Weight and move them to the Variables box. In the Correlation Coefficients area, select Pearson. In the Test of Significance area, select your desired significance test, two-tailed or one-tailed.
We will select a two-tailed significance test in this example. Check the box next to Flag significant correlations. Click OK to run the bivariate Pearson Correlation. Output for the analysis will display in the Output Viewer.
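The earlier remark that the square root of the trend line's R-squared should match the Pearson correlation can be checked outside SPSS. The following is a minimal sketch in plain Python, assuming a small made-up height and weight sample; the data and the helper names `pearson_r` and `r_squared_of_linear_fit` are illustrative and are not part of the tutorial's dataset or of SPSS.

```python
import math

# Hypothetical height (inches) and weight (pounds) values, for illustration only.
heights = [60.0, 62.0, 65.0, 67.0, 70.0, 72.0, 74.0]
weights = [115.0, 130.0, 128.0, 150.0, 165.0, 160.0, 190.0]


def pearson_r(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    x_bar = math.fsum(x) / len(x)
    y_bar = math.fsum(y) / len(y)
    sxy = math.fsum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = math.fsum((a - x_bar) ** 2 for a in x)
    syy = math.fsum((b - y_bar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def r_squared_of_linear_fit(x, y):
    """R-squared of the least-squares line y = intercept + slope * x."""
    x_bar = math.fsum(x) / len(x)
    y_bar = math.fsum(y) / len(y)
    sxy = math.fsum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = math.fsum((a - x_bar) ** 2 for a in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual and total sums of squares around the fitted line and the mean.
    ss_res = math.fsum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = math.fsum((b - y_bar) ** 2 for b in y)
    return 1.0 - ss_res / ss_tot


r = pearson_r(heights, weights)
r2 = r_squared_of_linear_fit(heights, weights)
print(f"r = {r:.4f}, R-squared = {r2:.4f}, sqrt(R-squared) = {math.sqrt(r2):.4f}")
```

For a simple linear fit with one predictor, the square root of R-squared equals the absolute value of r, so the sign of the relationship still has to be read from r itself or from the scatterplot.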
The important cells we want to look at are either B or C. Cells B and C are identical, because they include information about the same pair of variables.
Cells B and C contain the correlation coefficient for the correlation between height and weight, its p-value, and the number of complete pairwise observations that the calculation was based on.
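The correlations in the main diagonal cells A and D are all equal to 1. This is because a variable is always perfectly correlated with itself. The sample sizes in cells A and D can differ, however. This is because of missing data: there are more missing observations for variable Weight than there are for variable Height. If you have opted to flag significant correlations, SPSS will mark correlations that are significant at the 0.05 level with one asterisk (*) and those significant at the 0.01 level with two asterisks (**).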
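The layout of the Correlations table described above (cells A to D) can be mimicked in a few lines of Python. This is only a sketch under stated assumptions: the data are invented, `None` stands in for a missing value, and the helper `cell` is a hypothetical name that applies pairwise deletion, i.e. it keeps a case only when both variables in that cell are present.

```python
import math

# Hypothetical data; None marks a missing observation.
height = [60.0, 62.0, None, 67.0, 70.0, 72.0, 74.0, 68.0]
weight = [115.0, None, 128.0, 150.0, None, 160.0, 190.0, 155.0]


def pearson_r(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    x_bar = math.fsum(x) / len(x)
    y_bar = math.fsum(y) / len(y)
    sxy = math.fsum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = math.fsum((a - x_bar) ** 2 for a in x)
    syy = math.fsum((b - y_bar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def cell(x, y):
    """Return (r, n) using only the cases where both values are present."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    xs = [a for a, _ in pairs]
    ys = [b for _, b in pairs]
    return pearson_r(xs, ys), len(pairs)


variables = {"Height": height, "Weight": weight}

# Build the 2 x 2 table: A = Height/Height, B = Height/Weight,
# C = Weight/Height, D = Weight/Weight.
table = {
    (row, col): cell(variables[row], variables[col])
    for row in variables
    for col in variables
}

for (row, col), (r, n) in table.items():
    print(f"{row:>6} x {col:<6}  r = {r:.3f}  N = {n}")
```

In this toy sample the diagonal cells both report r = 1 but different N, because Weight has more missing values than Height, while the two off-diagonal cells are identical.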
In cell B (repeated in cell C), we can see the Pearson correlation coefficient for height and weight.
SPSS Tutorials: Pearson Correlation
Pearson Correlation
The bivariate Pearson Correlation produces a sample correlation coefficient, r, which measures the strength and direction of linear relationships between pairs of continuous variables.
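As a rough illustration of what the sample correlation coefficient r summarises, the sketch below computes it two equivalent ways using only the Python standard library. The data are hypothetical and the variable names are chosen for this example; this is not how SPSS is implemented internally.

```python
import math
import statistics

# Hypothetical paired measurements, for illustration only.
x = [60.0, 62.0, 65.0, 67.0, 70.0, 72.0, 74.0]
y = [115.0, 130.0, 128.0, 150.0, 165.0, 160.0, 190.0]
n = len(x)

x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
s_x, s_y = statistics.stdev(x), statistics.stdev(y)  # sample SDs (n - 1 denominator)

# Route 1: sample covariance divided by the product of the sample SDs.
cov_xy = math.fsum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (n - 1)
r_from_cov = cov_xy / (s_x * s_y)

# Route 2: sum of products of z-scores, divided by n - 1.
z_x = [(a - x_bar) / s_x for a in x]
z_y = [(b - y_bar) / s_y for b in y]
r_from_z = math.fsum(p * q for p, q in zip(z_x, z_y)) / (n - 1)

print(f"r (covariance route) = {r_from_cov:.4f}")
print(f"r (z-score route)    = {r_from_z:.4f}")
```

Both routes give the same value, and the result always lies between -1 and +1; the sign carries the direction and the magnitude carries the strength of the linear relationship.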
Common Uses
The bivariate Pearson Correlation is commonly used to measure correlations among pairs of variables, and correlations within and between sets of variables. The bivariate Pearson correlation indicates whether a statistically significant linear relationship exists between two continuous variables, and the strength of a linear relationship.
Data Requirements
To use Pearson correlation, your data must meet the following requirements: two or more continuous variables.
Independence of cases means that: the values for all variables across cases are unrelated; for any case, the value for any variable cannot influence the value of any variable for other cases; and no case can influence another case on any variable. The bivariate Pearson correlation coefficient and corresponding significance test are not robust when independence is violated.
Bivariate normality: each pair of variables is bivariately normally distributed at all levels of the other variable(s). This assumption ensures that the variables are linearly related; violations of this assumption may indicate that non-linear relationships among variables exist. Linearity can be assessed visually using a scatterplot of the data. The data should also be a random sample from the population, with no outliers.
Data Set-Up
Your dataset should include two or more continuous numeric variables, each defined as scale, which will be used in the analysis.
Example: Understanding the linear association between weight and height
Problem Statement
Perhaps you would like to test whether there is a statistically significant linear relationship between two continuous variables, weight and height (and by extension, infer whether the association is significant in the population).
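To make the idea of a significance test for the correlation more concrete, here is a hedged sketch of the usual t statistic for testing whether the population correlation is zero, t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom. The data are invented, and the constant `T_CRIT_DF5` (about 2.571, the tabulated two-tailed 5% critical value for 5 degrees of freedom) is hard-coded because the Python standard library has no t distribution; an exact p-value would need a statistics package.

```python
import math

# Hypothetical height (inches) and weight (pounds) values, for illustration only.
heights = [60.0, 62.0, 65.0, 67.0, 70.0, 72.0, 74.0]
weights = [115.0, 130.0, 128.0, 150.0, 165.0, 160.0, 190.0]


def pearson_r(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    x_bar = math.fsum(x) / len(x)
    y_bar = math.fsum(y) / len(y)
    sxy = math.fsum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = math.fsum((a - x_bar) ** 2 for a in x)
    syy = math.fsum((b - y_bar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(heights, weights)
df = len(heights) - 2  # degrees of freedom for the test of zero correlation

# Test statistic under the null hypothesis that the population correlation is 0.
t = r * math.sqrt(df) / math.sqrt(1.0 - r * r)

# Tabulated two-tailed critical value for alpha = 0.05 and df = 5 (assumption:
# only valid for this sample size; look up another value for other df).
T_CRIT_DF5 = 2.571
significant = abs(t) > T_CRIT_DF5

print(f"r = {r:.4f}, t = {t:.3f}, df = {df}, significant at 0.05: {significant}")
```

The two-tailed comparison uses the absolute value of t, which corresponds to selecting a two-tailed test in the dialog.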
Output Tables
The results will display the correlations in a table, labeled Correlations. The direction of the relationship is positive. The magnitude, or strength, of the association is approximately moderate.
Note: The independence of cases assumption is also known as the independence of observations assumption.
Assumptions 1, 2 and 3 relate to your study design and how you measured your variables. After checking if your study design and variables meet assumptions 1, 2 and 3, you should now check if your data also meets assumptions 4, 5, 6 and 7 below.
When checking if your data meets these four assumptions, do not be surprised if this process takes up the majority of the time you dedicate to carrying out your analysis. As we mentioned above, it is not uncommon for one or more of these assumptions to be violated. However, with the right guidance this does not need to be a difficult process, and there are often other statistical analysis techniques that you can carry out that will allow you to continue with your analysis.
Note: The assumption of bivariate normality is unfortunately very difficult to test, which is why we focus on linearity and univariate normality instead. Homoscedasticity is also difficult to test, but we include this so that you know why it is important. We include outliers at the end.
Note: Pearson's correlation coefficient is a measure of the strength of a linear association between two variables.
Put another way, it determines whether there is a linear component of association between two continuous variables. As such, linearity is not strictly an "assumption" of Pearson's correlation.
However, you would not normally want to use Pearson's correlation to determine the strength and direction of a linear relationship when you already know the relationship between your two variables is not linear.
Instead, the relationship between your two variables might be better described by another statistical measure (Cohen). For this reason, it is not uncommon to view the relationship between your two variables in a scatterplot to see if running a Pearson's correlation is the best choice as a measure of association or whether another measure would be better.
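A small numerical sketch may help show why a scatterplot is worth inspecting first. In the made-up example below, y is completely determined by x (y = x squared), yet the Pearson correlation is zero because the relationship has no linear component. The data and the helper name `pearson_r` are assumptions for illustration.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    x_bar = math.fsum(x) / len(x)
    y_bar = math.fsum(y) / len(y)
    sxy = math.fsum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = math.fsum((a - x_bar) ** 2 for a in x)
    syy = math.fsum((b - y_bar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: a perfect but purely curved (quadratic) relationship.
x = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
y = [v * v for v in x]

r_curved = pearson_r(x, y)

# For contrast, a perfectly linear relationship over the same x values.
y_linear = [2.0 * v + 1.0 for v in x]
r_linear = pearson_r(x, y_linear)

print(f"quadratic relationship: r = {r_curved:.4f}")
print(f"linear relationship:    r = {r_linear:.4f}")
```

A value of r near zero therefore does not mean the variables are unrelated, only that there is no linear association between them.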
For further reading on this issue, see, for example, Edgell and Noon and Hogg and Craig.
Note: Outliers are not necessarily "bad", but due to the effect they have on the Pearson correlation coefficient, r (discussed on the next page), they need to be taken into account. You can check whether your data meets assumptions 4, 5 and 7 using a number of statistics packages (to learn more, see our guides for SPSS Statistics, Stata and Minitab).
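The note on outliers can be illustrated with a short sketch. The data below are invented: six points that lie close to a straight line, plus one hypothetical outlying case. The helper name `pearson_r` is an assumption for this example.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    x_bar = math.fsum(x) / len(x)
    y_bar = math.fsum(y) / len(y)
    sxy = math.fsum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = math.fsum((a - x_bar) ** 2 for a in x)
    syy = math.fsum((b - y_bar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data that follow a nearly perfect straight line.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8, 12.0]
r_without = pearson_r(x, y)

# The same data with one extra case far from the line.
x_out = x + [7.0]
y_out = y + [0.0]
r_with = pearson_r(x_out, y_out)

print(f"r without the outlier: {r_without:.3f}")
print(f"r with the outlier:    {r_with:.3f}")
```

A single unusual case is enough to pull a very strong correlation down to a weak one in a sample this small, which is why outliers need to be examined rather than ignored.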
On the next page we discuss other characteristics of Pearson's correlation that you should consider.
Pearson Product-Moment Correlation
What does this test do?
What values can the Pearson correlation coefficient take?
How can we determine the strength of association based on the Pearson correlation coefficient?
Are there guidelines to interpreting Pearson's correlation coefficient?
Can you use any type of variable for Pearson's correlation coefficient?
Do the two variables have to be measured in the same units?
What about dependent and independent variables?
Does the Pearson correlation coefficient indicate the slope of the line?
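The questions above about units, the roles of the two variables, and slope can be explored numerically. The sketch below uses invented data and standard unit conversions; it relies on three well-known properties of r: it is unchanged when either variable is rescaled by a positive constant, it is symmetric in the two variables, and it does not measure the slope of the line.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    x_bar = math.fsum(x) / len(x)
    y_bar = math.fsum(y) / len(y)
    sxy = math.fsum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = math.fsum((a - x_bar) ** 2 for a in x)
    syy = math.fsum((b - y_bar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical heights (inches) and weights (pounds).
height_in = [60.0, 62.0, 65.0, 67.0, 70.0, 72.0, 74.0]
weight_lb = [115.0, 130.0, 128.0, 150.0, 165.0, 160.0, 190.0]

# Units: converting to centimetres and kilograms leaves r unchanged.
height_cm = [h * 2.54 for h in height_in]
weight_kg = [w * 0.45359237 for w in weight_lb]
r_imperial = pearson_r(height_in, weight_lb)
r_metric = pearson_r(height_cm, weight_kg)

# Roles: swapping which variable is treated as x or y leaves r unchanged.
r_swapped = pearson_r(weight_lb, height_in)

# Slope: two perfectly linear relationships with very different slopes
# both give r = 1.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
r_steep = pearson_r(xs, [2.0 * v for v in xs])
r_shallow = pearson_r(xs, [0.5 * v for v in xs])

print(f"imperial: {r_imperial:.4f}  metric: {r_metric:.4f}  swapped: {r_swapped:.4f}")
print(f"steep line: {r_steep:.4f}  shallow line: {r_shallow:.4f}")
```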
What assumptions does Pearson's correlation make?
Assumption 1: Your two variables should be measured on a continuous scale.
Assumption 2: Your two continuous variables should be paired, which means that each case has two values: one for each variable. These "values" are also referred to as "data points".
For example, imagine that you had collected the revision times (measured in hours) and exam results from randomly sampled students at a university.
Each of the students would have a value for revision time and a value for exam result. Therefore, you would have paired values.
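One way to picture paired values is as one record per student holding both measurements. The sketch below is purely illustrative: the student labels, hours and scores are invented, and the helper name `pearson_r` is an assumption.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    x_bar = math.fsum(x) / len(x)
    y_bar = math.fsum(y) / len(y)
    sxy = math.fsum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = math.fsum((a - x_bar) ** 2 for a in x)
    syy = math.fsum((b - y_bar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical records: (student label, revision time in hours, exam result).
records = [
    ("student 1", 23.0, 81.0),
    ("student 2", 10.0, 55.0),
    ("student 3", 15.0, 60.0),
    ("student 4", 30.0, 90.0),
    ("student 5", 8.0, 48.0),
]

# Each case contributes exactly one data point: a pair of values.
pairs = [(hours, score) for _, hours, score in records]

revision_hours = [hours for hours, _ in pairs]
exam_results = [score for _, score in pairs]

r = pearson_r(revision_hours, exam_results)
print(f"{len(pairs)} paired observations, r = {r:.3f}")
```

Keeping both values in the same record guarantees that the two lists stay aligned case by case, which is what the pairing assumption requires.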