Untitled @youngtigercheesecake - Tumblr Blog

Week 4

Creating graphs for your data code

import pandas
import numpy
import pandas as pd
import seaborn
import matplotlib.pyplot as plt

data = pd.read_csv('gapminder_pds.csv', low_memory=False)

bindata = data.copy()

convert variables to numeric format using the to_numeric function

data['internetuserate'] = pd.to_numeric(data['internetuserate'], errors='coerce')
bindata['internetuserate'] = pd.cut(data.internetuserate, 10)

data['incomeperperson'] = pd.to_numeric(data['incomeperperson'], errors='coerce')
bindata['incomeperperson'] = pd.cut(data.incomeperperson, 10)

data['employrate'] = pd.to_numeric(data['employrate'], errors='coerce')
bindata['employrate'] = pd.cut(data.employrate, 10)

data['femaleemployrate'] = pd.to_numeric(data['femaleemployrate'], errors='coerce')
bindata['femaleemployrate'] = pd.cut(data.femaleemployrate, 10)

data['polityscore'] = pd.to_numeric(data['polityscore'], errors='coerce')
bindata['polityscore'] = data['polityscore']
sub2 = bindata.copy()
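To make the two steps above concrete, here is a minimal sketch on a made-up column (the values are hypothetical and not taken from the gapminder file). It shows what errors='coerce' does with an unparseable entry and how pd.cut builds equal-width bins.

```python
import pandas as pd

# Hypothetical toy column: numbers stored as strings, with one blank entry
raw = pd.Series(["10.5", "20.1", " ", "35.0", "99.9"], name="internetuserate")

# errors='coerce' turns anything that cannot be parsed (here the blank) into NaN
numeric = pd.to_numeric(raw, errors="coerce")

# pd.cut splits the observed range into equal-width intervals;
# a NaN input stays NaN in the binned result
binned = pd.cut(numeric, 4)

print(numeric.isna().sum())
print(binned.value_counts(sort=False))
```

The code above asks for 10 bins; 4 are used here only to keep the printed table short.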

Scatterplot for the Association Between Internet use rate and Income per person

scat1 = seaborn.regplot(x="internetuserate", y="incomeperperson", fit_reg=False, data=data)
plt.xlabel('Internet use rate')
plt.ylabel('Income per person')
plt.title('Scatterplot for the Association Between Internet use rate and Income per person')

This scatterplot shows a positive relationship between the two variables, and the relationship appears to be exponential rather than linear.
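One way to check whether a curve like this is roughly exponential is to take the log of the y variable: if y grows exponentially with x, log(y) should be close to a straight line in x. The sketch below uses simulated numbers only (the constants 500, 0.04 and the noise level are assumptions chosen for illustration, not estimates from the gapminder data).

```python
import numpy as np

# Simulated data, assumed for illustration: income grows exponentially with x
rng = np.random.default_rng(42)
internet_use = rng.uniform(0, 100, size=300)
noise = rng.normal(0, 0.3, size=300)
income = 500 * np.exp(0.04 * internet_use + noise)

# Linear correlation before and after the log transform
corr_raw = np.corrcoef(internet_use, income)[0, 1]
corr_log = np.corrcoef(internet_use, np.log(income))[0, 1]

# Straight-line fit on the log scale; the slope estimates the growth rate
slope, intercept = np.polyfit(internet_use, np.log(income), 1)

print(round(corr_raw, 3), round(corr_log, 3), round(slope, 4))
```

If the log-scale correlation is clearly higher than the raw one, that supports the exponential reading of the plot.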

Univariate histogram for quantitative variable:

seaborn.distplot(data["incomeperperson"].dropna(), kde=False)
plt.xlabel('Income per person')

The graph is highly right skewed. Incomes are small for most of the world and the wealthy tail is quite long.
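Right skew can also be checked numerically. The sketch below uses a simulated log-normal sample as a stand-in for income (an assumption, not the real column): for right-skewed data the skewness is positive and the mean sits above the median.

```python
import numpy as np
import pandas as pd

# Simulated right-skewed sample standing in for income per person
rng = np.random.default_rng(0)
income = pd.Series(rng.lognormal(mean=8.0, sigma=1.2, size=500))

skewness = income.skew()   # positive for a long right tail
mean_income = income.mean()
median_income = income.median()

print(round(skewness, 2), round(mean_income), round(median_income))
```

Applied to the real data, the same three calls would go on data["incomeperperson"].dropna().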

Univariate histogram for quantitative variable:

seaborn.distplot(data["employrate"].dropna(), kde=False)
plt.xlabel('Employ rate')

Summary

There appears to be an association between Internet use rate and income per person: income per person rises with Internet use rate, and it rises at an accelerating rate.

#Visualized Data #Creating graphs #Assignment #Coursera #Data Management and Visualization

Week 3:

Assignment : Making Data Management Decisions

import pandas
import numpy

data = pandas.read_csv('addhealth_pds.csv', low_memory=False)

Select only specific columns that will be used

Data_new = data[["BIO_SEX","H1SU1","H1SU2","H1TO15"]]

setting variables that we will be working with to numeric (updated)

data['BIO_SEX'] = pandas.to_numeric(data['BIO_SEX'])
data['H1SU1'] = pandas.to_numeric(data['H1SU1'])
data['H1SU2'] = pandas.to_numeric(data['H1SU2'])
data['H1TO15'] = pandas.to_numeric(data['H1TO15'])

subset data to males only, excluding those who reported zero suicide attempts in the last 12 months

sub1=data[(data['BIO_SEX']==1) & (data['H1SU2']!=0)]

make a copy of my new subsetted data

sub2 = sub1.copy()

print('counts for H1SU2, males only: suicide attempts in the last 12 months (original codes)')
c1 = sub2['H1SU2'].value_counts(sort=False, dropna=False)
print(c1)

examining frequency distributions

In this section we replace the codes from the code list (6 and 7) that mark men who refused to answer and questions that were skipped, recoding them to Python's missing value (NaN).

sub2['H1SU2'] = sub2['H1SU2'].replace(7, numpy.nan)
sub2['H1SU2'] = sub2['H1SU2'].replace(6, numpy.nan)

print('counts for H1SU2, males only: suicide attempts in the last 12 months, with 6 and 7 set to NaN and missing values counted')
c2 = sub2['H1SU2'].value_counts(sort=False, dropna=False)
print(c2)
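The recoding step can be tried on a small made-up Series first. In this sketch the values are hypothetical, and 6 and 7 simply stand in for the non-answer codes; replace also accepts a list, so both codes can be handled in one call.

```python
import numpy as np
import pandas as pd

# Hypothetical responses; 6 and 7 stand in for the non-answer codes
answers = pd.Series([1, 0, 6, 2, 7, 1, 7], name="H1SU2")

# replace accepts a list, so both codes are recoded to NaN in one call
cleaned = answers.replace([6, 7], np.nan)

# dropna=False keeps the NaN row in the frequency table
print(cleaned.value_counts(sort=False, dropna=False))
```

The result is equivalent to the two separate replace calls used above.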

examining how often males attempted suicide in the last 12 months

print("counts for H1SU2, males only: suicide attempts in the last 12 months")
c5 = sub2["H1SU2"].value_counts(sort=False)
print(c5)

print("percentages for H1SU2, males only: suicide attempts in the last 12 months")
p5 = sub2["H1SU2"].value_counts(sort=True, normalize=True)
print(p5)
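When reading the percentages, it helps to remember that value_counts drops NaN by default, so normalize=True divides by the number of answered cases, not by the number of rows. A small sketch with made-up values:

```python
import numpy as np
import pandas as pd

# Hypothetical responses with two missing values
responses = pd.Series([1, 1, 2, np.nan, 1, 2, 3, np.nan])

# Counts including the missing rows
counts = responses.value_counts(sort=False, dropna=False)

# Shares computed over the 6 non-missing answers only
shares = responses.value_counts(normalize=True)

print(counts)
print(shares)
```

Here the value 1 appears in 3 of 8 rows, but its share is 3 out of the 6 answered cases.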

The results show that half of the respondents in this subset (50%) attempted suicide at least once.

#week 3 #Assignment: Making Data Management Decisions

Week 2:

Assignment : Running Your First Program

import pandas
import numpy

data = pandas.read_csv('addhealth_pds.csv', low_memory=False)

print(len(data))  # number of observations (rows)
print(len(data.columns))  # number of variables (columns)

setting variables you will be working with to numeric

counts and percentages (i.e. frequency distributions) for each variable

c1 = data['BIO_SEX'].value_counts(sort=False)
print(c1)

p1 = data['BIO_SEX'].value_counts(sort=False, normalize=True)
print(p1)

c2 = data['H1SU1'].value_counts(sort=False)
print(c2)

p2 = data['H1SU1'].value_counts(sort=False, normalize=True)
print(p2)

c3 = data['H1SU2'].value_counts(sort=False)
print(c3)

p3 = data['H1SU2'].value_counts(sort=False, normalize=True)
print(p3)

c4 = data['H1TO15'].value_counts(sort=False)
print(c4)

ADDING TITLES

print('counts for BIO_SEX')
c1 = data['BIO_SEX'].value_counts(sort=False)
print(c1)

print(len(data['BIO_SEX']))  # number of observations (rows)

print('percentages for BIO_SEX')
p1 = data['BIO_SEX'].value_counts(sort=False, normalize=True)
print(p1)

print('counts for H1SU1')
c2 = data['H1SU1'].value_counts(sort=False)
print(c2)

print('percentages for H1SU1')
p2 = data['H1SU1'].value_counts(sort=False, normalize=True)
print(p2)

print('counts for H1SU2')
c3 = data['H1SU2'].value_counts(sort=False, dropna=False)
print(c3)

print('percentages for H1SU2')
p3 = data['H1SU2'].value_counts(sort=False, normalize=True)
print(p3)

print('counts for H1TO15')
c4 = data['H1TO15'].value_counts(sort=False, dropna=False)
print(c4)

print('percentages for H1TO15')
p4 = data['H1TO15'].value_counts(sort=False, dropna=False, normalize=True)
print(p4)

ADDING MORE DESCRIPTIVE TITLES

print('counts for BIO_SEX: what is the gender')
c1 = data['BIO_SEX'].value_counts(sort=False)
print(c1)

print('percentages for BIO_SEX: what is the gender')
p1 = data['BIO_SEX'].value_counts(sort=False, normalize=True)
print(p1)

print('counts for H1SU1: seriously thinking about suicide in the last 12 months')
c2 = data['H1SU1'].value_counts(sort=False)
print(c2)

print('percentages for H1SU1: seriously thinking about suicide in the last 12 months')
p2 = data['H1SU1'].value_counts(sort=False, normalize=True)
print(p2)

print('counts for H1SU2: attempting suicide in the last 12 months')
c3 = data['H1SU2'].value_counts(sort=False)
print(c3)

print('percentages for H1SU2: attempting suicide in the last 12 months')
p3 = data['H1SU2'].value_counts(sort=False, normalize=True)
print(p3)

print('counts for H1TO15: how many times a person thinks about alcohol during the past 12 months')
c4 = data['H1TO15'].value_counts(sort=False, dropna=False)
print(c4)

print('percentages for H1TO15: how many times a person thinks about alcohol during the past 12 months')
p4 = data['H1TO15'].value_counts(sort=False, normalize=True)
print(p4)

frequency distributions using the 'groupby' function

ct1 = data.groupby('BIO_SEX').size()
print(ct1)

pt1 = data.groupby('BIO_SEX').size() * 100 / len(data)
print(pt1)
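The groupby approach can be checked by hand on a tiny made-up frame (the eight values below are hypothetical, not from the AddHealth file): size() counts rows per group, and multiplying by 100 and dividing by the total number of rows gives percentages.

```python
import pandas as pd

# Hypothetical toy frame with three distinct codes
toy = pd.DataFrame({"BIO_SEX": [1, 2, 2, 1, 2, 6, 1, 2]})

# Rows per group
group_counts = toy.groupby("BIO_SEX").size()

# Same counts expressed as a percentage of all rows
group_pct = group_counts * 100 / len(toy)

print(group_counts)
print(group_pct)
```

Unlike value_counts(normalize=True), which returns proportions between 0 and 1, this version reports values on a 0 to 100 scale.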

subset data to males who seriously thought about suicide in the last 12 months

sub1=data[(data['BIO_SEX']==1) & (data['H1SU1']==1)]

make a copy of my new subsetted data

sub2 = sub1.copy()
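A minimal sketch of the subset-then-copy pattern on a made-up frame (values are hypothetical). Each condition needs its own parentheses because & binds more tightly than ==, and working on a copy means later changes do not touch the original frame.

```python
import pandas as pd

# Hypothetical toy frame
toy = pd.DataFrame({
    "BIO_SEX": [1, 1, 2, 2, 1],
    "H1SU1":   [1, 0, 1, 0, 1],
})

# Parentheses around each comparison are required when combining with &
mask = (toy["BIO_SEX"] == 1) & (toy["H1SU1"] == 1)

# copy() gives an independent frame to work on
subset = toy[mask].copy()
subset["H1SU1"] = 99  # edit the copy only

print(len(subset))
print(toy["H1SU1"].tolist())
```

The original column is unchanged after the edit, which is the reason for taking the copy.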

frequency distributions on new sub2 data frame

print('counts for BIO_SEX')
c5 = sub2['BIO_SEX'].value_counts(sort=False)
print(c5)

print('percentages for BIO_SEX')
p5 = sub2['BIO_SEX'].value_counts(sort=False, normalize=True)
print(p5)

print('counts for H1SU1')
c6 = sub2['H1SU1'].value_counts(sort=False)
print(c6)

print('percentages for H1SU1')
p6 = sub2['H1SU1'].value_counts(sort=False, normalize=True)
print(p6)

upper-case all DataFrame column names - place after the code for loading data above

data.columns = list(map(str.upper, data.columns))
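A short sketch of what this line does, using a made-up frame with lower-case column names (the names are hypothetical). The second variant uses the string accessor on the column index, which should give the same result.

```python
import pandas as pd

# Hypothetical frame whose columns arrive in lower case
toy = pd.DataFrame({"bio_sex": [1, 2], "h1su1": [0, 1]})

# Same approach as above: apply str.upper to every column name
toy.columns = list(map(str.upper, toy.columns))

# Alternative spelling using the .str accessor on the column index
alt = pd.DataFrame({"bio_sex": [1, 2], "h1su1": [0, 1]})
alt.columns = alt.columns.str.upper()

print(list(toy.columns))
print(list(alt.columns))
```

Doing this right after loading means later code can refer to 'BIO_SEX' regardless of how the file spells its headers.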

bug fix for display formats to avoid run time errors - put after code for loading data above

pandas.set_option('display.float_format', lambda x:'%f'%x)

Results: This part shows how many male, female, and undefined respondents there are, and what share of the total each group represents in percentages.

Refining research question:

Research question: how many men seriously thought about committing suicide in the previous 12 months?

Is the suicide rate associated with the number of alcohol addiction cases?

Assignment 1:

Data set: AddHealth Wave 1 codebook

Research question: Is the suicide rate associated with the number of alcohol addiction cases?

Items included in the CodeBook:

For suicide rate:

Thinking about committing suicide during the past year

During the past 12 months, how many times they actually attempted suicide

For alcohol addiction cases:

Drinking alcohol more than 2 or 3 times in life

Relationship problems due to alcohol

Literature Review:

From original source: https://www.alcoholrehabguide.org/resources/dual-diagnosis/alcohol-and-suicide/

People who suffer from alcoholism are up to 120 times more likely to take their own life than those who are not dependent on alcohol. On average, someone dies of suicide every 40 seconds. 29% of suicide victims in America were found with alcohol in their system.

The hypothesis to explore using the AddHealth Wave 1 codebook: the higher the number of alcohol addiction cases, the higher the suicide rate.
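One possible way to look at a hypothesis like this is a cross-tabulation of two yes/no indicators. The sketch below is only an illustration: the column names drinks and thought and all eight rows are invented, and nothing here is a result from the AddHealth data.

```python
import pandas as pd

# Invented toy data: 1 = yes, 0 = no for both hypothetical indicators
toy = pd.DataFrame({
    "drinks":  [0, 0, 0, 0, 1, 1, 1, 1],
    "thought": [0, 0, 0, 1, 0, 1, 1, 1],
})

# normalize="index" turns each row into proportions that sum to 1
table = pd.crosstab(toy["drinks"], toy["thought"], normalize="index")

print(table)
```

Comparing the two rows of such a table shows whether the share of "yes" answers differs between groups; a formal test would still be needed before drawing conclusions.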

#suicide rate associated with a number of alcohol addiction cases?
