My final Assignment: Correlation of life expectancy and oil consumption versus Income.
by Sebastien Leuba (UK)
The relation between life expectancy, quantity of oil consumed, and income.
Oil consumption has not stopped rising over the past 10 decades; our income has followed steadily, as has our life expectancy.
Does any real relation exist between these three?
Do oil consumption volume and income share a relation?
Does life expectancy increase as oil quantities rise?
Is life expectancy affected by income?
Data & method
Data
For this hypothesis I will use data from the Gapminder project. The data set is available here, and the following codebook will help you understand and read the data.
Methodology
I will use the following data categories:
Incomeperperson
Lifeexpectancy
Oilperperson
The “Lifeexpectancy” data category has been re-categorized under the following labels:
“50”; “60”; “70”; “80”; “90”
The “Oilperperson” data category has been re-categorized by slices of income: “2500” (only for the first), then “5000”, with an increment of 5000 after that.
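As a rough illustration, the re-categorization described above could be sketched in Python as follows. The post does not show its own code, so this is only one possible reading: the labels "50" to "90" are treated as lower bounds of ten-year groups, and the slices are treated as upper bounds (2500, 5000, 10000, 15000, ...). The function names `life_group` and `slice_label` are hypothetical.

```python
from bisect import bisect_right

# Assumed lower bounds of the life-expectancy groups (labels "50".."90").
LIFE_LABELS = [50, 60, 70, 80, 90]

def life_group(years):
    """Return the label of the highest lower bound not exceeding `years`.

    Assumption: values below 50 fall into the "50" group.
    """
    index = max(bisect_right(LIFE_LABELS, years) - 1, 0)
    return str(LIFE_LABELS[index])

def slice_label(value, first=2500, step=5000):
    """Return the upper bound of the slice that contains `value`.

    Assumed slices: up to 2500, up to 5000, then every 5000 after that
    (10000, 15000, ...).
    """
    if value <= first:
        return str(first)
    # Round up to the next multiple of `step` (ceiling division).
    upper = -(-value // step) * step
    return str(int(upper))
```

For example, `life_group(73.1)` gives "70" and `slice_label(7300.5)` gives "10000" under these assumptions.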
UNIVARIATE
UNIVARIATE GRAPHIC
BIVARIATE
Correlation
After running a correlation, we can see that every R value is positive, between 0 and 1 (green frame in the picture above).
The P value (red frame) is <0.0001 for two of the data subsets.
The correlation between oil and life expectancy seems to be weaker than expected, which isn't surprising if you take another look at the scatter plot above.
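For readers who want to see what the R value measures, here is a minimal sketch of the Pearson correlation coefficient in plain Python. It is not the code used for the analysis above, and the three short lists are made-up toy values, not Gapminder data; `pearson_r` is a hypothetical helper name.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Note: a constant sequence (zero variance) would divide by zero here.
    return cov / sqrt(var_x * var_y)

# Made-up toy values, NOT Gapminder data.
income = [1.0, 2.0, 3.0, 4.0, 5.0]
life = [2.0, 4.0, 6.0, 8.0, 10.0]  # exactly linear in income
oil = [3.0, 1.0, 4.0, 1.0, 5.0]    # much noisier

r_income_life = pearson_r(income, life)
r_income_oil = pearson_r(income, oil)
```

With these toy lists the first pair gives an R value of 1 (a perfect line), while the noisier pair gives a positive but much smaller value, which is the kind of difference the scatter plots show.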
Chi Square
The lack of a dense data set makes the chi-square test unusable for my experiment, but it still returns a good P value (<0.0001), which points to a relation between life expectancy and income.
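To make the sparse-data problem concrete, the sketch below computes the chi-square statistic and the expected count of each cell for a contingency table. A common rule of thumb is that the test becomes unreliable when expected counts fall below about 5, so the sketch also counts such cells. The tables are made-up counts, not Gapminder data, and `chi_square` and `sparse_cells` are hypothetical helper names.

```python
def chi_square(table):
    """Return (statistic, degrees of freedom, expected counts) for a
    contingency table given as a list of rows of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    # Expected count under independence: row total * column total / grand total.
    # Note: an all-zero row or column would give a zero expected count
    # and a division by zero below.
    expected = [[r * c / total for c in col_totals] for r in row_totals]
    statistic = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof, expected

def sparse_cells(expected, minimum=5):
    """Count cells whose expected count is below the rule-of-thumb minimum."""
    return sum(1 for row in expected for e in row if e < minimum)

# Made-up counts, NOT Gapminder data.
dense = [[10, 20], [30, 40]]
sparse = [[1, 2], [3, 40]]

dense_stat, dense_dof, dense_expected = chi_square(dense)
sparse_stat, sparse_dof, sparse_expected = chi_square(sparse)
```

Here `sparse_cells(dense_expected)` is 0, while `sparse_cells(sparse_expected)` is 3 out of 4 cells, the situation in which a chi-square result should be read with caution.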
Result
Does any real relation exist between these three (oil, income, life expectancy)?
We can see in our correlation model that the P values are <0.0001, <0.0001 and 0.0038.
The R values for the examined variables are all above 0 (0.60152; 0.54189; 0.36267).
We can clearly see a slightly weaker correlation between oil and life expectancy. We will come back to this matter in a later study.
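Squaring each R value, the same step used in the course's correlation template further down, gives the share of variance that one variable explains in the other. A small sketch of that arithmetic, using only the three R values quoted above:

```python
# R values quoted above, in the order given in the text.
r_values = [0.60152, 0.54189, 0.36267]

# r squared: fraction of variance explained in a simple linear relation.
shares = [r ** 2 for r in r_values]

for r, share in zip(r_values, shares):
    print(f"r = {r:.5f} -> r^2 = {share:.3f} ({share:.0%} of the variance)")
```

This works out to roughly 36%, 29% and 13% of the variance, so even the strongest of the three relations leaves most of the variation unexplained.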
Do oil consumption volume and income share a relation?
We can clearly see on the graph (bivariate series) that oil consumption rises as income per person rises. The correlation table (annotation in orange) shows a P value <0.0001 and an R value of 0.54189.
Does life expectancy increase as oil quantities rise?
The scatter plot shows an existing relation, but it does not appear as strong as expected. Further, the correlation shows a "weak but real correlation" between the 2 variables.
A chi-square test shows a completely different picture because of missing data: the P value is 0.4174. Overall, the hypothesis won't be retained, for lack of data.
Is life expectancy affected by income?
The bivariate graph shows a link; the P value of the correlation is significant (P<0.0001) and the R value is 0.6152.
We can accept the hypothesis that income affects life expectancy.
Overview
The hypothesis that oil consumption and higher income make you live longer cannot be verified with statistical significance. Why?
The idea that oil makes you live longer sounds funny somehow (given the amount of pollution and the ecological issues it generates), but the reasoning behind it is that the higher your income, the more petrol you can afford, or the more industrialised and modern your country is, and so the better its public healthcare.
The real problem here is getting a fully populated data set; sadly, the Gapminder project and my choice of data didn't help.
The hypothesis would certainly have been more significant if richer data had been available at the time of the study, but that will be for another time.
However far from the end I may be, here is a summary of the way there:
Assignment 6
Model for presenting results for ANOVA in your blog:
When examining the association between current number of cigarettes smoked (quantitative response variable) and past year nicotine dependence (categorical explanatory variable), an Analysis of Variance (ANOVA) revealed that among young adult smokers (my sample), those with nicotine dependence reported smoking significantly more cigarettes per day (Mean=14.6, s.d. ±9.15) compared to those without nicotine dependence (Mean=11.4, s.d. ±7.43), F(1, 1313)=44.68, p=.0001.
Assignment 7
Model for presenting results for Chi-Square tests in your blog:
When examining the association between lifetime major depression (categorical response variable) and past year nicotine dependence (categorical explanatory variable), a chi-square test of independence revealed that among young adult smokers (my sample), those with past year nicotine dependence were more likely to have experienced major depression in their lifetime (36.2%) compared to those without past year nicotine dependence (12.7%), X2 =88.60, 1 df, p=.0001.
Assignment 8
Model for presenting results for Correlation in your blog:
Among young adult smokers (my sample), the correlation between number of cigarettes smoked per day (quantitative explanatory variable) and number of nicotine dependence symptoms experienced in the past year (quantitative response variable) was 0.17 (p=.0001), suggesting that only 3% (i.e. 0.17 squared) of the variance in number of current nicotine dependence symptoms can be explained by number of cigarettes smoked per day.
Assignment 9
Evaluate a potential third variable moderator in the context of ANOVA, Chi Square or Correlation and interpret the results. Make sure to comment your code and include your interpretation in a blog post.
IMPORTANT: In order for the automatic grader to process your assignment, you will need to submit your Log as a text file (.txt). Either copy or paste it into a Word Processor, or select Send To:Microsoft Word. Save the resulting document as a text file with the extension .txt, and Submit it via the Assignment Page.
I used assignment 5 to put my hypothesis directly to the test and to visualise the relation with a scatter plot. The following scatter plot shows a poor relation between oil consumption and life expectancy.
Further investigation will be needed to develop the current hypothesis.
When everything fails, nothing beats an old-fashioned .CSV file
After multiple attempts with "LIBNAME My data", a log full of errors, and hours spent reading posts on the Coursera web site, nothing was coming out of it.
I decided to import my data manually with an old-fashioned CSV (comma-separated values) file.
The import ran pretty smoothly, much like importing a file into Excel. Always one step at a time!
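For anyone who prefers to do the CSV step in code, here is a minimal Python sketch of the same idea. It is not the import procedure used in this post. The sample rows are made up, the "country" column is an assumption about the file layout, and rows with a missing value are simply dropped, which is only one way of handling the gaps in the data. The helper name `read_rows` is hypothetical.

```python
import csv
import io

def read_rows(handle, columns):
    """Read the named numeric columns, skipping rows with a missing value."""
    rows = []
    for record in csv.DictReader(handle):
        try:
            rows.append({name: float(record[name]) for name in columns})
        except (KeyError, TypeError, ValueError):
            # Blank, absent or non-numeric cell: drop the whole row.
            continue
    return rows

# Made-up sample standing in for the exported file; NOT real data.
sample = io.StringIO(
    "country,Incomeperperson,Lifeexpectancy,Oilperperson\n"
    "A,1000.5,60.2,\n"
    "B,25000,80.1,1.5\n"
)
columns = ["Incomeperperson", "Lifeexpectancy", "Oilperperson"]
rows = read_rows(sample, columns)
```

With a real file, the `io.StringIO` sample would be replaced by a file opened with `open(path, newline="")`, where `path` is wherever the CSV was saved. In the sample above, row A is dropped because its oil value is blank, which shows how quickly missing values shrink the usable data.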