Week 4
Creating graphs for your data: code
import pandas as pd
import seaborn
import matplotlib.pyplot as plt
data = pd.read_csv('gapminder_pds.csv', low_memory=False)
bindata = data.copy()
# convert variables to numeric format with pd.to_numeric (errors='coerce' turns blanks and text into NaN),
# then cut each converted variable into 10 equal-width bins in bindata
data['internetuserate'] = pd.to_numeric(data['internetuserate'], errors='coerce')
bindata['internetuserate'] = pd.cut(data.internetuserate, 10)
data['incomeperperson'] = pd.to_numeric(data['incomeperperson'], errors='coerce')
bindata['incomeperperson'] = pd.cut(data.incomeperperson, 10)
data['employrate'] = pd.to_numeric(data['employrate'], errors='coerce')
bindata['employrate'] = pd.cut(data.employrate, 10)
data['femaleemployrate'] = pd.to_numeric(data['femaleemployrate'], errors='coerce')
bindata['femaleemployrate'] = pd.cut(data.femaleemployrate, 10)
# polityscore is converted but kept unbinned
data['polityscore'] = pd.to_numeric(data['polityscore'], errors='coerce')
bindata['polityscore'] = data['polityscore']
sub2 = bindata.copy()
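The two pandas calls used above do most of the data preparation. The sketch below shows their behavior on a small made-up series (not the gapminder file): `pd.to_numeric(..., errors='coerce')` replaces anything that cannot be parsed as a number with NaN, and `pd.cut` with an integer splits the observed range into that many equal-width intervals. Three bins are used here instead of 10 only to keep the output short.

```python
import pandas as pd

# Hypothetical raw column: CSV columns containing blanks or text
# are read as strings rather than numbers.
raw = pd.Series(["10.5", "", "72.3", "n/a", "45.0", "3.2"])

# errors="coerce" turns unparseable entries into NaN instead of raising.
numeric = pd.to_numeric(raw, errors="coerce")

# An integer bins argument gives equal-width intervals over min..max;
# NaN values are not assigned to any bin and stay NaN.
binned = pd.cut(numeric, 3)

print(numeric.isna().sum())   # how many values were coerced to NaN
print(binned.cat.categories)  # the three interval edges
print(binned.value_counts(sort=False))
```

Because missing values stay missing after binning, frequency tables built from `bindata` count only the countries that have a valid value for that variable.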
Scatterplot for the Association Between Internet Use Rate and Income per Person
scat1 = seaborn.regplot(x="internetuserate", y="incomeperperson", fit_reg=False, data=data)
plt.xlabel('Internet use rate')
plt.ylabel('Income per person')
plt.title('Scatterplot for the Association Between Internet use rate and Income per person')
plt.show()  # display and close this figure so the next plot starts on a fresh one
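This scatterplot shows a positive relationship that appears to be exponential: income per person rises slowly at low internet use rates and increasingly steeply at higher rates.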
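One way to check an "exponential" reading is to take the logarithm of the outcome: if y = a·e^(b·x), then log(y) = log(a) + b·x, which is a straight line in x. The sketch below fits that line with `numpy.polyfit` on synthetic data generated from a known exponential curve (the values 500 and 0.05 are arbitrary assumptions, not estimates from gapminder) and recovers the growth rate.

```python
import numpy as np

# Synthetic stand-in for (internet use rate, income per person):
# y = 500 * exp(0.05 * x) with mild multiplicative noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 90, size=200)
y = 500 * np.exp(0.05 * x) * rng.lognormal(0.0, 0.1, size=200)

# A degree-1 fit on log(y) returns [slope, intercept];
# slope estimates b and exp(intercept) estimates a.
slope, intercept = np.polyfit(x, np.log(y), deg=1)

print(f"estimated growth rate b = {slope:.4f}")
print(f"estimated scale a = {np.exp(intercept):.1f}")
```

A quick visual version of the same check on the real scatterplot is to add `plt.yscale('log')` before `plt.show()`: if the relationship is roughly exponential, the point cloud should look closer to a straight band.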
Univariate histogram for a quantitative variable (income per person):
# histplot replaces distplot, which is deprecated since seaborn 0.11
seaborn.histplot(data["incomeperperson"].dropna(), kde=False)
plt.xlabel('Income per person')
plt.show()
The histogram is highly right-skewed: income per person is low for most countries, and the tail toward wealthy countries is long.
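Right skew can also be confirmed numerically rather than only by eye. In a right-skewed distribution the long upper tail pulls the mean above the median, and pandas' `Series.skew()` returns a positive value. The sketch below uses synthetic log-normal values as a stand-in for income (an assumption, not the gapminder data); on the real column the same calls would be applied to `data["incomeperperson"].dropna()`.

```python
import numpy as np
import pandas as pd

# Synthetic right-skewed "income" values drawn from a log-normal distribution.
rng = np.random.default_rng(1)
income = pd.Series(rng.lognormal(mean=8.0, sigma=1.2, size=1000))

# Long upper tail: mean > median and sample skewness > 0.
print(f"mean   = {income.mean():,.0f}")
print(f"median = {income.median():,.0f}")
print(f"skew   = {income.skew():.2f}")

# A log transform often makes such data close to symmetric (skew near 0).
print(f"skew of log(income) = {np.log(income).skew():.2f}")
```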
Univariate histogram for a quantitative variable (employment rate):
seaborn.histplot(data["employrate"].dropna(), kde=False)
plt.xlabel('Employment rate')
plt.show()
Summary
There appears to be a positive association between internet use rate and income per person: income rises with internet use rate, and at an accelerating rate.
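Because the association is monotonic but curved, Pearson's correlation (which measures linear association) understates it, while Spearman's rank correlation does not. The sketch below illustrates this on a synthetic, strictly increasing exponential curve (not gapminder data). Spearman's rho is computed as Pearson's r on the ranks, which avoids needing SciPy.

```python
import numpy as np
import pandas as pd

# Synthetic monotonic-but-accelerating relationship.
x = pd.Series(np.linspace(0, 90, 50))
y = 500 * np.exp(0.05 * x)

# Pearson's r: linear association, lowered by the curvature.
pearson = x.corr(y)

# Spearman's rho: Pearson's r on ranks; it is 1 for any strictly
# increasing relationship, however fast it accelerates.
spearman = x.rank().corr(y.rank())

print(f"Pearson r    = {pearson:.3f}")
print(f"Spearman rho = {spearman:.3f}")
```

On the real data, rows missing either variable should be dropped first so both columns are ranked over the same countries, e.g. `sub = data[['internetuserate', 'incomeperperson']].dropna()` followed by `sub['internetuserate'].rank().corr(sub['incomeperperson'].rank())`.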