Testing a Potential Moderator
by Fausto Keske
Introduction
This is the fourth and final assignment (Week 4) of the course Data Analysis Tools, offered by Wesleyan University on the Coursera platform. The challenge now is to test a potential moderator. That is, we ask whether there is an association between two constructs within different subgroups of the sample.
Case Study – Gap Minder
Gap Minder was founded in Stockholm by Ola, Anna and Hans Rosling. It is a non-profit venture promoting sustainable global development and the achievement of the United Nations Millennium Development Goals. It seeks to increase the use and understanding of statistics about social, economic, and environmental development at local, national and global levels. Its website is mind-blowing and everybody should visit it: https://www.gapminder.org/.
The dataset provided for this assignment has 16 variables and 213 observations. From these variables, I chose to analyze income per person (incomeperperson) and life expectancy (lifeexpectancy). The moderator is urban rate (urbanrate). Income per person is measured as the Gross Domestic Product per capita in constant 2000 US$ (2010). Life expectancy at birth (years) is the average number of years a newborn child would live if current mortality patterns were to stay the same. Urban rate is the percentage of the total population that lives in urban areas (2008).
The Question
Are income per person and life expectancy associated for low urban rate countries? Are income per person and life expectancy associated for high urban rate countries? In other words, is our explanatory variable associated with our response variable in each population sub-group? We are using a third variable to understand whether it affects the direction and/or strength of the relationship between our explanatory and response variables.
To understand the two variables, I analyzed the boxplots posted below.
Since urban rate is a quantitative variable, I had to transform it into a categorical variable. I created one sub-group for low urban rate countries and another for high urban rate countries.
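As a rough illustration of this transformation, here is a minimal sketch in Python (not the SAS code used for the assignment). The cut-off between the two sub-groups is not stated in this text, so the sketch assumes the median urban rate as the threshold. The country labels and values are made up for illustration and are not taken from the dataset.

```python
from statistics import median

# Hypothetical urban rate values (% of population living in urban areas).
# These are illustrative only, not rows from the Gap Minder dataset.
urbanrate = {"A": 18.0, "B": 35.5, "C": 52.0, "D": 67.3, "E": 88.9, "F": 41.2}

# Assumed cut-off: the median urban rate. The actual analysis may have
# used a different threshold.
cutoff = median(urbanrate.values())


def urban_group(rate, cutoff):
    """Return 'low' for rates at or below the cut-off, 'high' otherwise."""
    return "low" if rate <= cutoff else "high"


# Map every country to its categorical sub-group.
groups = {country: urban_group(rate, cutoff) for country, rate in urbanrate.items()}

print(cutoff)
print(groups)
```

With a median split, the two sub-groups end up with roughly the same number of countries, which keeps the two correlations comparable in sample size.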
The SAS Studio Code
For this assignment, I decided to code in SAS Studio, since the tool was new to me and it was an opportunity to gain experience. The code is posted below. The comments are in Portuguese, my mother tongue.
Results – Pearson Correlation Coefficient (r) for both groups
To answer the question of the assignment, I ran a correlation procedure to obtain the Pearson Correlation Coefficient (r) for both groups. The test returned a p-value of 0.0001, so the correlations are statistically significant and we can interpret the results for the two sub-groups.
Since the r obtained is positive in both cases, there is a positive correlation between income per person and life expectancy for both groups. The correlation is stronger for the second group, the countries with a higher urban rate: the r in this group is 0.63784, while for the first group it is 0.47392. These conclusions can also be seen in the two scatter plots posted below. The points show considerable dispersion when the countries are plotted.
The r² for the first sub-group (low urban rate) is 0.2246. The r² is the fraction of the variability of one variable that can be predicted by the other. So, if we know the income per person, we can predict only 22.46% of the variability we will see in life expectancy.
For the second sub-group (high urban rate), the r² is 0.4068. If we know the income per person, we can predict 40.68% of the variability we will see in life expectancy, almost double that of the first sub-group.
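To make the per-group calculation concrete, here is a minimal Python sketch of computing Pearson's r separately within each sub-group. It is not the SAS procedure used for the assignment. The rows are hypothetical values invented for illustration, and the helper name pearson_r is my own.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares of the deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical rows: (urban group, incomeperperson, lifeexpectancy).
# Illustrative values only, not taken from the Gap Minder dataset.
rows = [
    ("low", 300.0, 52.0), ("low", 800.0, 60.0),
    ("low", 1500.0, 58.0), ("low", 2500.0, 66.0),
    ("high", 4000.0, 70.0), ("high", 9000.0, 74.0),
    ("high", 20000.0, 79.0), ("high", 35000.0, 78.0),
]

# Compute r separately within each level of the moderator.
r_by_group = {}
for group in ("low", "high"):
    income = [inc for g, inc, _ in rows if g == group]
    life = [le for g, _, le in rows if g == group]
    r_by_group[group] = pearson_r(income, life)

for group, r in r_by_group.items():
    print(f"{group}: r = {r:.3f}")
```

Comparing the sign and size of r across the groups is what tells us whether the third variable moderates the relationship.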
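The two r² values follow directly from squaring the correlation coefficients reported above. A short Python check of that arithmetic (the variable names are my own):

```python
# Pearson r values reported for the two sub-groups.
r_low = 0.47392   # low urban rate countries
r_high = 0.63784  # high urban rate countries

# r squared: fraction of variability in one variable predictable from the other.
r2_low = r_low ** 2
r2_high = r_high ** 2

print(f"low urban rate:  r2 = {r2_low:.4f}")   # 0.2246
print(f"high urban rate: r2 = {r2_high:.4f}")  # 0.4068

# Ratio between the two fractions of explained variability.
print(f"ratio = {r2_high / r2_low:.2f}")       # 1.81
```

The ratio of about 1.81 is what supports the statement that the explained variability in the high urban rate group is almost double that of the low urban rate group.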