Testing income as a moderator of the effect of alcohol usage on suicide rate
The main aim is to test whether average income acts as a moderator. If the factor under investigation is a moderator, the relationship between input and output will differ across the moderator's levels. To test this hypothesis statistically, we use the data set from the Gapminder Foundation in Stockholm, which was gathered from 150 countries in the UN.
In this case, we test average income as a moderator of the effect of alcohol usage on the suicide rate. Alcohol usage (input) and suicide rate (output) are both quantitative variables, so we use the Pearson correlation between them.
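As a brief illustration of what the Pearson correlation measures, here is a minimal sketch that computes r directly from its definition. The helper name pearson_r and the five data points are made up for illustration and are not Gapminder values; the analysis itself relies on scipy.stats.pearsonr, which also returns a p value.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of deviations from the means (unnormalised covariance)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations (unnormalised variances)
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Made-up illustrative values, NOT taken from the Gapminder data set
alcohol = [1.0, 2.0, 3.0, 4.0, 5.0]
suicide = [2.0, 4.0, 5.0, 4.0, 5.0]

r = pearson_r(alcohol, suicide)
print(round(r, 4))
```

A value of r close to +1 or -1 indicates a strong linear association, while a value near 0 indicates a weak one.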
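Note: there is no need for a post hoc test, since there is no categorical variable.
We divide income per person into 3 categories: low (1000 USD or less), middle (above 1000 USD and up to 5000 USD), and high (above 9000 USD). With these thresholds, countries between 5000 USD and 9000 USD are not assigned to any category.
The code used is as follows: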
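The grouping rule can be checked on a few sample values before applying it to the data. The sketch below mirrors the thresholds used in the script; the function name income_group and the sample incomes are hypothetical and chosen only to exercise each branch.

```python
def income_group(income):
    """Return 1 (low), 2 (middle), 3 (high), or None if no branch matches."""
    if income <= 1000:
        return 1
    elif income <= 5000:
        return 2
    elif income > 9000:
        return 3
    # Incomes above 5000 and up to 9000 match no branch
    return None


# Hypothetical incomes in USD, chosen to hit every branch and the gap
samples = [500, 1000, 3000, 7000, 9000, 12000]
groups = [income_group(value) for value in samples]
print(groups)  # [1, 1, 2, None, None, 3]
```

Values that return None here correspond to countries that end up outside all three subsets in the analysis.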
# -*- Week4 assignment -*-
"""
Created on Sun Aug 16 12:57:52 2020

@author: omar.elfarouk
"""
#%%
# Using gap minder data
# CORRELATION
import pandas
import numpy
import scipy.stats
import seaborn
import matplotlib.pyplot as plt
data = pandas.read_csv('gapminder.csv', low_memory=False)
# Convert the variables of interest to numeric; unparseable entries become NaN
data['alcconsumption'] = pandas.to_numeric(data['alcconsumption'], errors='coerce')
data['incomeperperson'] = pandas.to_numeric(data['incomeperperson'], errors='coerce')
data['suicideper100th'] = pandas.to_numeric(data['suicideper100th'], errors='coerce')
data['incomeperperson']=data['incomeperperson'].replace(' ', numpy.nan)
data['alcconsumption']=data['alcconsumption'].replace(' ', numpy.nan)
data['suicideper100th']=data['suicideper100th'].replace(' ', numpy.nan)
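# Drop rows with any missing value; copy() guards against a SettingWithCopyWarning
# when the incomegrp column is added below
data_clean = data.dropna().copy()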
print (scipy.stats.pearsonr(data_clean['alcconsumption'], data_clean['suicideper100th']))
def incomegrp(row):
    # Assign an income group from income per person (USD)
    if row['incomeperperson'] <= 1000:
        return 1
    elif row['incomeperperson'] <= 5000:
        return 2
    elif row['incomeperperson'] > 9000:
        return 3
data_clean['incomegrp'] = data_clean.apply (lambda row: incomegrp (row),axis=1)
# Check the number of countries in each income group
chk1 = data_clean['incomegrp'].value_counts(sort=False, dropna=False)
print(chk1)

# One subset per income group
sub1 = data_clean[(data_clean['incomegrp'] == 1)]
sub2 = data_clean[(data_clean['incomegrp'] == 2)]
sub3 = data_clean[(data_clean['incomegrp'] == 3)]
# Pearson correlation within each income group
print('association between alcohol consumption and suicide rate for LOW income countries')
print(scipy.stats.pearsonr(sub1['alcconsumption'], sub1['suicideper100th']))
print(' ')
print('association between alcohol usage and suicide rate for MIDDLE income countries')
print(scipy.stats.pearsonr(sub2['alcconsumption'], sub2['suicideper100th']))
print(' ')
print('association between alcohol usage and suicide rate for HIGH income countries')
print(scipy.stats.pearsonr(sub3['alcconsumption'], sub3['suicideper100th']))
#%%
scat1 = seaborn.regplot(x="alcconsumption", y="suicideper100th", data=sub1)
plt.xlabel('Alcohol consumption')
plt.ylabel('Suicide rate')
plt.title('Scatterplot for the Association Between Alcohol consumption and Suicide rate for LOW income countries')
print(scat1)
#%%
scat2 = seaborn.regplot(x="alcconsumption", y="suicideper100th", fit_reg=False, data=sub2)
plt.xlabel('Alcohol consumption')
plt.ylabel('Suicide rate')
plt.title('Scatterplot for the Association Between Alcohol consumption and Suicide Rate for MIDDLE income countries')
print(scat2)
#%%
scat3 = seaborn.regplot(x="alcconsumption", y="suicideper100th", data=sub3)
plt.xlabel('Alcohol consumption')
plt.ylabel('Suicide rate')
plt.title('Scatterplot for the Association Between Alcohol consumption and Suicide rate for HIGH income countries')
print(scat3)
The results are displayed as follows.
Alcohol consumption appears to be a significant factor for the suicide rate only in the middle and high income countries. In countries where the average income is 1000 USD or less, alcohol consumption appears to be an insignificant factor for the suicide rate.
Thus, because the significance of alcohol consumption for the suicide rate differs across income levels, we conclude that income level acts as a moderator. This is shown by the p values, which are below 0.05 for the middle and high income groups but not for the low income group, and by the r correlation values in the output.
association between alcohol consumption and suicide rate for LOW income countries
r value, p value
(0.26070477719347973, 0.059368958483714575)
association between alcohol usage and suicide rate for MIDDLE income countries
r value, p value
(0.3827653707464948, 0.003022644556108631)
association between alcohol usage and suicide rate for HIGH income countries
r value, p value
(0.5725515263938896, 7.429925022112332e-05)
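The reading of these values can be spelled out in a few lines. The sketch below takes the three (r, p) pairs reported above, compares each p value with the 0.05 threshold, and also prints r squared, the share of variance in suicide rate associated with alcohol consumption in each group. The variable names are illustrative, and the group labels assume the third result belongs to the high income group, as in the script.

```python
ALPHA = 0.05  # significance threshold used in the write-up

# (r, p) pairs copied from the output above
reported = {
    "LOW": (0.26070477719347973, 0.059368958483714575),
    "MIDDLE": (0.3827653707464948, 0.003022644556108631),
    "HIGH": (0.5725515263938896, 7.429925022112332e-05),
}

# True where the correlation is significant at the ALPHA level
significant = {group: p < ALPHA for group, (r, p) in reported.items()}

for group, (r, p) in reported.items():
    verdict = "significant" if significant[group] else "not significant"
    print(f"{group}: r = {r:.3f}, r^2 = {r ** 2:.3f}, p = {p:.5f} ({verdict})")
```

Both the strength of the correlation and its significance increase from the low to the high income group, which is the pattern the moderator conclusion rests on.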
Credits to the Wesleyan University team on Coursera for gathering the information and providing the basic knowledge to do the analysis.