Data Analysis with Pearson Correlation: Life Expectancy Against Income per Person Using SAS
Here I use the GapMinder dataset to examine the relationship between income per person and life expectancy using Pearson correlation. In particular, I want to answer the following question statistically:
Does life expectancy have a linear relationship with income per person?
Here, income per person is my independent predictor and life expectancy is considered the dependent variable.
The Pearson correlation coefficient (r) ranges from -1 to +1, where:
· -1 denotes a perfect linear negative relationship
· 0 denotes no linear relationship at all
· +1 denotes a perfect linear positive relationship
As a result, if r is +1 or -1, we can use the independent predictor variable to perfectly predict the values of the dependent variable. For any other value of r, we can calculate r² (the coefficient of determination), which gives the proportion of the variability in the dependent variable that is explained by its linear relationship with the independent predictor.
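To make the definitions of r and r² concrete, here is a minimal sketch in Python (not the SAS program used for this analysis). The helper name pearson_r and the small data lists are made up for illustration and are not GapMinder values.

```python
import math

def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares of the deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Made-up illustrative values (NOT GapMinder data)
x = [1, 2, 3, 4, 5]
y_perfect = [2 * v + 1 for v in x]   # exact straight line, so r should be +1
y_noisy = [2, 1, 4, 3, 5]            # upward trend with scatter, so 0 < r < 1

r_perfect = pearson_r(x, y_perfect)
r_noisy = pearson_r(x, y_noisy)

# r squared is the share of variance explained by the linear relationship
print("perfect line: r =", r_perfect, " r^2 =", r_perfect ** 2)
print("noisy trend:  r =", r_noisy, " r^2 =", r_noisy ** 2)
```

With these toy values the straight line gives r = 1, while the noisy trend gives r = 0.8 and therefore r² = 0.64.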
Here is the code snippet in SAS:
Program Output
Observations and interpretations
1. From the output of PROC CORR, the correlation coefficient is approximately +0.6, which indicates a moderate positive correlation: life expectancy tends to increase as income per person rises.
2. The r² value is 0.6 * 0.6 = 0.36, which means that about 36% of the variability in life expectancy is explained by its linear relationship with income per person.
3. The scatter plot produced by PROC SGPLOT shows why r² is this low. The plot actually shows a non-linear positive relationship between the two variables: at lower incomes, life expectancy rises steeply and roughly linearly, and beyond a threshold the curve flattens out. So life expectancy does increase with income per person, but the relationship is not strictly linear.
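The effect described in point 3 can be reproduced with a small Python sketch (again, not part of the SAS program). The income values and the saturating formula below are assumptions chosen only to mimic a curve that rises steeply and then flattens; they are not GapMinder data or a fitted model.

```python
import math

def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical income levels (arbitrary units), NOT GapMinder values
income = [1, 2, 3, 5, 8, 12, 20, 30, 45, 60]

# Hypothetical saturating curve: steep rise at low income, flat at high income.
# There is no noise at all, so life expectancy is fully determined by income.
life_expectancy = [80 - 30 * math.exp(-v / 5) for v in income]

r_curve = pearson_r(income, life_expectancy)
r2_curve = r_curve ** 2

# Even though the relationship is deterministic and always increasing,
# r stays below 1 because the points do not lie on a straight line.
print("r =", round(r_curve, 3), " r^2 =", round(r2_curve, 3))
```

The point of the sketch is that Pearson's r measures only linear association, so a strong but curved relationship still produces an r below 1 and a correspondingly reduced r².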














