Sudhir’s Assignment 3
Background: In this assignment, I follow the program provided in the class notes. I have broken up the program code and output into two halves. Both halves use the Gapminder data set. In the first half, I make a detailed analysis of highly democratic lower middle-income countries. I define highly democratic countries as those with polity scores greater than 6. In the Gapminder data set, polity scores range from -10 for highly authoritarian countries to +10 for highly democratic countries. Lower middle-income countries have per capita income between $1025 and $4036. I also introduce two new variables ("Democratic Score" and "Governance Score"). In the second half of the assignment, I look at gender equity by comparing the female employment rate with the general employment rate. The Python code and the output for each half follow.
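As a minimal sketch of how the two new variables are built, the snippet below applies the same recode (polity scores 7, 8, 9, 10 mapped to 0, 33, 66, 100) and the same sum to a small made-up table. The three rows and their values are hypothetical and are not taken from the Gapminder file; only the column names and the recode dictionary follow the assignment code.

```python
import pandas

# Hypothetical rows for illustration only (not real Gapminder values).
toy = pandas.DataFrame({
    'polityscore': [7, 9, 10],
    'femaleemployrate': [40.0, 50.0, 60.0],
    'urbanrate': [30.0, 45.0, 20.0],
})

# Same recode as in the assignment: polity 7..10 -> democratic score 0..100.
recode1 = {7: 0, 8: 33, 9: 66, 10: 100}
toy['democraticscore'] = toy['polityscore'].map(recode1)

# Governance score = democratic score + female employment rate + urban rate.
toy['GOVERNANCESCORE'] = (toy['democraticscore']
                          + toy['femaleemployrate']
                          + toy['urbanrate'])
print(toy)
```

Because the three components are simply added, the democratic score (0 to 100) can dominate the total for countries with a polity score of 10.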
The First Half of the Program Code For Assignment 3 is below:
--------------------------------------------------------------------------------
# -*- coding: utf-8 -*-
"""
Created on Sun Aug 30 10:50:43 2015; modified on Sat Jun 17, 2017

@author: ldierker; modified by Sudhir
"""
import pandas
import numpy

# any additional libraries would be imported here

data = pandas.read_csv('SudhirCodeBook.csv', low_memory=False)

print ('_____________________________________________________')
print ('OUTPUT PRINTOUT STARTS HERE')
print ('Total Number of Countries')
print (len(data)) #number of observations (rows)
print ('Total Number of Variables')
print (len(data.columns)) # number of variables (columns)

#setting variables you will be working with to numeric
print ('The Variables in this Database:')
print('(1) country, (2) incomeperperson, (3) polityscore')
print('(4) femaleemployeerate, (5) internetuserate, and (6) urbanrate')
data['incomeperperson'] = data['incomeperperson'].convert_objects(convert_numeric=True)
data['polityscore'] = data['polityscore'].convert_objects(convert_numeric=True)
data['femaleemployrate'] = data['femaleemployrate'].convert_objects(convert_numeric=True)
data['internetuserate'] = data['internetuserate'].convert_objects(convert_numeric=True)
data['urbanrate'] = data['urbanrate'].convert_objects(convert_numeric=True)

#counts and percentages (i.e. frequency distributions) for each variable
print ('Examples of Sorting by Frequency and by Values')
print ('List Polity Scores for Entire Data Set, Sorted by Frequency')
c2 = data['polityscore'].value_counts(sort=True)
print(c2)
print ('List Polity Scores for Entire Data Set, sorted by Values')
ct2= data.groupby('polityscore').size().sort_index(ascending=True)
print (ct2)

########################################################################################
# REST OF ASSIGNMENT FOCUSES ON HIGHLY DEMOCRATIC (Polity Score > 6 )
# LOWER MIDDLE-INCOME COUNTRIES ($1025 < GNI < $4036)
#
#For the current 2017 fiscal year, low-income economies are defined as those with a
#GNI per capita, calculated using the World Bank Atlas method, of $1,025 or less in 2015;
#lower middle-income economies are those with a GNI per capita between $1,026 and $4,035;
#upper middle-income economies are those with a GNI per capita between $4,036 and $12,475;
#high-income economies are those with a GNI per capita of $12,476 or more.
#https://datahelpdesk.worldbank.org/knowledgebase/articles/906519-world-bank-country-and-lending-groups
# Polity scores range from -10 for undemocratic to +10 for highly democratic countries.
#
# REST OF ASSIGNMENT FOCUSES ON HIGHLY DEMOCRATIC (Polity Score > 6 )
# LOWER MIDDLE-INCOME COUNTRIES ($1025 < GNI < $4036)
########################################################################################

print ('_____________________________________________________')
print('Rest of Assignment will Focus on Highly Democratic Lower Mid-Income Countries')
print('Highly Democratic Countries Have Polity Scores > 6 ')
print('Lower Middle-Income Countries have $1025 < Per Capital Income < $4036 ')
print ('_____________________________________________________')

# Sub-sampling entire data set to only pick up Highly Democratic lower middle-income countries
sub1=data[(data['incomeperperson']>1025) & (data['incomeperperson']<4036) & (data['polityscore']> 6)]

#make a copy of my new subsetted data
sub2 = sub1.copy()

# frequency distributions on new sub2 data frame
# First Fill in Blanks with NaN. Then print out original counts
print ('Original Counts for FemaleEmployRate For Lower Democratic Lower Mid-Income Countries, Sorted by Values')
c5 = sub2['femaleemployrate'].value_counts(sort=True,dropna=False).sort_index(ascending=True)
print(c5)

sub2['femaleemployrate']=sub2['femaleemployrate'].replace(-9.0, numpy.nan)
sub2['femaleemployrate'].fillna(-99.0,inplace=True)
print ('New Counts for FemaleEmployRate For Democratic Lower Mid-Income Countries, Sorted by Values')
c55 = sub2['femaleemployrate'].value_counts(sort=True,dropna=False).sort_index(ascending=True)
print(c55)
#sub2['femaleemployeerate']=sub2['femaleemployrate'].replace(-99.0,numpy.nan)
#c2=sub2['femaleemployrate'].value_counts(sort=True,dropna=False)
#print(c2)
#chk2=sub2['femaleemployrate'].value_counts(sort=True,dropna=True)
#print(chk2)
ds2=sub2['femaleemployrate'].describe()
print('Statistics for Female Employment Rate')
print(ds2)
print('____________________________________________')
print('I will now delete data points for which the female Employment Rate is Missing or NaN' )
sub1=data[(data['incomeperperson']>1025) & (data['incomeperperson']<4036) & (data['polityscore']> 6) & (data['femaleemployrate']>0)]
sub2 = sub1.copy()
recode1={7:0,8:33,9:66,10:100}
print('Introducing Democratic Score, which is mapped from Polity Score')
print('Polity Score Ranges from 7 to 10, and Democratic Score Ranges from 0 to 100')
sub2['democraticscore']=sub2['polityscore'].map(recode1)

print('Introducing New Variable: Governance Score')
print('GOVERNANCE SCORE=Democratic Score+Female Employment Rate+Urban Rate')
sub2['GOVERNANCESCORE']=sub2['democraticscore'] + sub2['femaleemployrate']+sub2['urbanrate']
c3=sub2['GOVERNANCESCORE'].value_counts().sort_index(ascending=True)
print ('GOVERNANCE SCORES')
print(c3)
sub2['INCOMEGROUP']=pandas.qcut(sub2.incomeperperson, 5, labels=["1=0-20","2=20-40","3=40-60","4=60-80","5=80-100"])
c4 = sub2['INCOMEGROUP'].value_counts(sort=False, dropna=True)
print(c4)
print('I break up the Lower Middle-Income group of countries into four sub-groups')
print('The four sub-groups have per capital income: $1025-$1750,$1750-$2500,$2500-$3250,$3250-$4036')
print('INCOME-4 categories: 1025-1750, 1750-2500, 2500-3250, 3250-4036')
sub2['INCOMEGROUP']=pandas.cut(sub2.incomeperperson,[1025,1750,2500,3250,4036])
c5=sub2['INCOMEGROUP'].value_counts(sort=True,dropna=True)
print(c5)
print('printing crosstabs')
print(pandas.crosstab(sub2['INCOMEGROUP'],sub2['incomeperperson']))

print('____________________________________________')

print ('Counts for Income Per Person for Democratic Lower Middle-Income Countries, Sorted by Values')
c6 = sub2['incomeperperson'].value_counts().sort_index(ascending=True)
print(c6)
print ('Percentages for Income Per Person for Democratic Lower Middle-Income Countries, Sorted by Values')
p6 = sub2['incomeperperson'].value_counts(normalize=True).sort_index(ascending=True) * 100
print (p6)
print ('Counts for Female Employ Rate for Democratic Lower Middle-Income Countries, Sorted by Values')
c7 = sub2['femaleemployrate'].value_counts().sort_index(ascending=True)
print(c7)
print ('Percentages for Female Employ Rate for Democratic Lower Middle-Income Countries, Sorted by Values')
p7 = sub2['femaleemployrate'].value_counts(normalize=True).sort_index(ascending=True) * 100
print (p7)
print ('Counts for Polity Scores for Democratic Lower Middle-Income Countries, Sorted by Values')
c8 = sub2['polityscore'].value_counts().sort_index(ascending=True)
print (c8)
print ('Percentages for Polity Scores for Democratic Lower Middle-Income Countries, Sorted by Values')
p8 = sub2['polityscore'].value_counts(normalize=True).sort_index(ascending=True) * 100
print (p8)
print ('OUTPUT PRINTOUT ENDS HERE')
print ('_____________________________________________________')

#upper-case all DataFrame column names - place after code for loading data above
data.columns = map(str.upper, data.columns)

# bug fix for display formats to avoid run time errors - put after code for loading data above
pandas.set_option('display.float_format', lambda x:'%f'%x)

print('END OF OUTPUT FROM FIRST HALF OF PYTHON CODE FOR ASSIGNMENT 3')
print('_______________________________________________________________')
Here is the Output for the first half of the python code
------------------------------------------------------------------------------------
runfile('C:/Users/sudhir/Desktop/PERSONAL.BIZ.WORK.JUL2017/COURSERA/DataAnalysisInterPret/WEEK3/SudhirAssignment31.8.py', wdir='C:/Users/sudhir/Desktop/PERSONAL.BIZ.WORK.JUL2017/COURSERA/DataAnalysisInterPret/WEEK3')
_____________________________________________________
OUTPUT PRINTOUT STARTS HERE
Total Number of Countries
213
Total Number of Variables
6
The Variables in this Database:
(1) country, (2) incomeperperson, (3) polityscore
(4) femaleemployeerate, (5) internetuserate, and (6) urbanrate
Examples of Sorting by Frequency and by Values
List Polity Scores for Entire Data Set, Sorted by Frequency
10.000000     33
8.000000      19
9.000000      15
7.000000      13
-7.000000     12
6.000000      10
5.000000       7
-4.000000      6
0.000000       6
-3.000000      6
-2.000000      5
-1.000000      4
-9.000000      4
4.000000       4
1.000000       3
-6.000000      3
2.000000       3
-8.000000      2
3.000000       2
-5.000000      2
-10.000000     2
Name: polityscore, dtype: int64
List Polity Scores for Entire Data Set, sorted by Values
polityscore
-10.000000     2
-9.000000      4
-8.000000      2
-7.000000     12
-6.000000      3
-5.000000      2
-4.000000      6
-3.000000      6
-2.000000      5
-1.000000      4
0.000000       6
1.000000       3
2.000000       3
3.000000       2
4.000000       4
5.000000       7
6.000000      10
7.000000      13
8.000000      19
9.000000      15
10.000000     33
dtype: int64
_____________________________________________________
Rest of Assignment will Focus on Highly Democratic Lower Mid-Income Countries
Highly Democratic Countries Have Polity Scores > 6
Lower Middle-Income Countries have $1025 < Per Capital Income < $4036
_____________________________________________________
Original Counts for FemaleEmployRate For Lower Democratic Lower Mid-Income Countries, Sorted by Values
26.799999    1
34.200001    1
34.299999    1
42.099998    2
43.799999    1
44.000000    1
44.099998    1
44.799999    1
46.799999    1
47.500000    1
49.400002    1
51.299999    1
54.900002    1
59.799999    1
61.599998    1
65.300003    1
nan          2
Name: femaleemployrate, dtype: int64
New Counts for FemaleEmployRate For Democratic Lower Mid-Income Countries, Sorted by Values
-99.000000    2
26.799999     1
34.200001     1
34.299999     1
42.099998     2
43.799999     1
44.000000     1
44.099998     1
44.799999     1
46.799999     1
47.500000     1
49.400002     1
51.299999     1
54.900002     1
59.799999     1
61.599998     1
65.300003     1
Name: femaleemployrate, dtype: int64
Statistics for Female Employment Rate
count    19.000000
mean     31.305263
std      46.883040
min     -99.000000
25%      38.199999
50%      44.099998
75%      50.350000
max      65.300003
Name: femaleemployrate, dtype: float64
____________________________________________
I will now delete data points for which the female Employment Rate is Missing or NaN
Introducing Democratic Score, which is mapped from Polity Score
Polity Score Ranges from 7 to 10, and Democratic Score Ranges from 0 to 100
Introducing New Variable: Governance Score
GOVERNANCE SCORE=Democratic Score+Female Employment Rate+Urban Rate
GOVERNANCE SCORES
82.080001     1
102.259999    1
117.380002    1
125.580000    1
127.179998    1
128.559998    1
129.400002    1
138.499999    1
144.719999    1
154.819998    1
158.600003    1
159.699999    1
161.039999    1
164.039999    1
166.800000    1
179.199998    1
197.199999    1
Name: GOVERNANCESCORE, dtype: int64
1=0-20      4
2=20-40     3
3=40-60     3
4=60-80     3
5=80-100    4
Name: INCOMEGROUP, dtype: int64
I break up the Lower Middle-Income group of countries into four sub-groups
The four sub-groups have per capital income: $1025-$1750,$1750-$2500,$2500-$3250,$3250-$4036
INCOME-4 categories: 1025-1750, 1750-2500, 2500-3250, 3250-4036
(1025, 1750]    7
(2500, 3250]    5
(1750, 2500]    3
(3250, 4036]    2
Name: INCOMEGROUP, dtype: int64
printing crosstabs
incomeperperson  1036.830725  1143.831514  1144.102193  1232.794137  \
INCOMEGROUP
(1025, 1750]               1            1            1            1
(1750, 2500]               0            0            0            0
(2500, 3250]               0            0            0            0
(3250, 4036]               0            0            0            0

incomeperperson  1383.401869  1392.411829  1621.177078  1860.753895  \
INCOMEGROUP
(1025, 1750]               1            1            1            0
(1750, 2500]               0            0            0            1
(2500, 3250]               0            0            0            0
(3250, 4036]               0            0            0            0

incomeperperson  1914.996551  2221.185664  2549.558474  2557.433638  \
INCOMEGROUP
(1025, 1750]               0            0            0            0
(1750, 2500]               1            1            0            0
(2500, 3250]               0            0            1            1
(3250, 4036]               0            0            0            0

incomeperperson  2636.787800  3180.430612  3233.423780  3665.348369  \
INCOMEGROUP
(1025, 1750]               0            0            0            0
(1750, 2500]               0            0            0            0
(2500, 3250]               1            1            1            0
(3250, 4036]               0            0            0            1

incomeperperson  3745.649852
INCOMEGROUP
(1025, 1750]               0
(1750, 2500]               0
(2500, 3250]               0
(3250, 4036]               1
____________________________________________
Counts for Income Per Person for Democratic Lower Middle-Income Countries, Sorted by Values
C:/Users/sudhir/Desktop/PERSONAL.BIZ.WORK.JUL2017/COURSERA/DataAnalysisInterPret/WEEK3/SudhirAssignment31.8.py:21: FutureWarning: convert_objects is deprecated. Use the data-type specific converters pd.to_datetime, pd.to_timedelta and pd.to_numeric.
  data['incomeperperson'] = data['incomeperperson'].convert_objects(convert_numeric=True)
C:/Users/sudhir/Desktop/PERSONAL.BIZ.WORK.JUL2017/COURSERA/DataAnalysisInterPret/WEEK3/SudhirAssignment31.8.py:22: FutureWarning: convert_objects is deprecated. Use the data-type specific converters pd.to_datetime, pd.to_timedelta and pd.to_numeric.
  data['polityscore'] = data['polityscore'].convert_objects(convert_numeric=True)
C:/Users/sudhir/Desktop/PERSONAL.BIZ.WORK.JUL2017/COURSERA/DataAnalysisInterPret/WEEK3/SudhirAssignment31.8.py:23: FutureWarning: convert_objects is deprecated. Use the data-type specific converters pd.to_datetime, pd.to_timedelta and pd.to_numeric.
  data['femaleemployrate'] = data['femaleemployrate'].convert_objects(convert_numeric=True)
C:/Users/sudhir/Desktop/PERSONAL.BIZ.WORK.JUL2017/COURSERA/DataAnalysisInterPret/WEEK3/SudhirAssignment31.8.py:24: FutureWarning: convert_objects is deprecated. Use the data-type specific converters pd.to_datetime, pd.to_timedelta and pd.to_numeric.
  data['internetuserate'] = data['internetuserate'].convert_objects(convert_numeric=True)
C:/Users/sudhir/Desktop/PERSONAL.BIZ.WORK.JUL2017/COURSERA/DataAnalysisInterPret/WEEK3/SudhirAssignment31.8.py:25: FutureWarning: convert_objects is deprecated. Use the data-type specific converters pd.to_datetime, pd.to_timedelta and pd.to_numeric.
  data['urbanrate'] = data['urbanrate'].convert_objects(convert_numeric=True)
1036.830725    1
1143.831514    1
1144.102193    1
1232.794137    1
1383.401869    1
1392.411829    1
1621.177078    1
1860.753895    1
1914.996551    1
2221.185664    1
2549.558474    1
2557.433638    1
2636.787800    1
3180.430612    1
3233.423780    1
3665.348369    1
3745.649852    1
Name: incomeperperson, dtype: int64
Percentages for Income Per Person for Democratic Lower Middle-Income Countries, Sorted by Values
1036.830725   5.882353
1143.831514   5.882353
1144.102193   5.882353
1232.794137   5.882353
1383.401869   5.882353
1392.411829   5.882353
1621.177078   5.882353
1860.753895   5.882353
1914.996551   5.882353
2221.185664   5.882353
2549.558474   5.882353
2557.433638   5.882353
2636.787800   5.882353
3180.430612   5.882353
3233.423780   5.882353
3665.348369   5.882353
3745.649852   5.882353
Name: incomeperperson, dtype: float64
Counts for Female Employ Rate for Democratic Lower Middle-Income Countries, Sorted by Values
26.799999    1
34.200001    1
34.299999    1
42.099998    2
43.799999    1
44.000000    1
44.099998    1
44.799999    1
46.799999    1
47.500000    1
49.400002    1
51.299999    1
54.900002    1
59.799999    1
61.599998    1
65.300003    1
Name: femaleemployrate, dtype: int64
Percentages for Female Employ Rate for Democratic Lower Middle-Income Countries, Sorted by Values
26.799999    5.882353
34.200001    5.882353
34.299999    5.882353
42.099998   11.764706
43.799999    5.882353
44.000000    5.882353
44.099998    5.882353
44.799999    5.882353
46.799999    5.882353
47.500000    5.882353
49.400002    5.882353
51.299999    5.882353
54.900002    5.882353
59.799999    5.882353
61.599998    5.882353
65.300003    5.882353
Name: femaleemployrate, dtype: float64
Counts for Polity Scores for Democratic Lower Middle-Income Countries, Sorted by Values
7.000000    4
8.000000    6
9.000000    7
Name: polityscore, dtype: int64
Percentages for Polity Scores for Democratic Lower Middle-Income Countries, Sorted by Values
7.000000   23.529412
8.000000   35.294118
9.000000   41.176471
Name: polityscore, dtype: float64
OUTPUT PRINTOUT ENDS HERE
_____________________________________________________
END OF OUTPUT FROM FIRST HALF OF PYTHON CODE FOR ASSIGNMENT 3
_______________________________________________________________
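The FutureWarning lines in the output above come from convert_objects, which the warning itself says is deprecated in favour of pd.to_numeric. Below is a minimal sketch of that replacement, under the assumption that blank or unparseable cells should become NaN (which is what errors='coerce' does). The small DataFrame is a hypothetical stand-in for the CSV file, not real data.

```python
import pandas

# Hypothetical stand-in for the CSV: numbers stored as text, with blank cells.
data = pandas.DataFrame({
    'country': ['A', 'B', 'C'],
    'incomeperperson': ['1036.83', ' ', '3745.65'],
    'polityscore': ['7', '9', ''],
})

# pandas.to_numeric with errors='coerce' converts text to numbers and
# turns anything that cannot be parsed (such as blanks) into NaN.
for col in ['incomeperperson', 'polityscore']:
    data[col] = pandas.to_numeric(data[col], errors='coerce')

print(data.dtypes)
print(data)
```

Looping over a list of column names also avoids repeating the same conversion line once per variable.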
The second half of the Python Code for Assignment 3 is below:
-------------------------------------------------------------------------
# -*- coding: utf-8 -*-
"""
Created on Sat Jun 17 23:50:31 2017

@author: sudhir
"""
#ADDHEALTH EXAMPLE

import pandas
import numpy

data = pandas.read_csv('gapminder.csv', low_memory=False)

#making individual PROGRESS variables numeric
data['employrate'] = data['employrate'].convert_objects(convert_numeric=True)
data['femaleemployrate'] = data['femaleemployrate'].convert_objects(convert_numeric=True)
data['internetuserate'] = data['internetuserate'].convert_objects(convert_numeric=True)
data['urbanrate'] = data['urbanrate'].convert_objects(convert_numeric=True)
data['armedforcesrate'] = data['armedforcesrate'].convert_objects(convert_numeric=True)

#Set missing data to NAN
data['employrate']=data['employrate'].replace('', numpy.nan)
data['femaleemployrate']=data['femaleemployrate'].replace('', numpy.nan)
data['internetuserate']=data['internetuserate'].replace('', numpy.nan)
data['urbanrate']=data['urbanrate'].replace('', numpy.nan)
data['armedforcesrate']=data['armedforcesrate'].replace('', numpy.nan)

#GENDEREQUITY = female employment rate minus overall employment rate
data['GENDEREQUITY']=data['femaleemployrate'] - data['employrate']
print('GENDEREQUITY')

# subset variables in new data frame, sub1
# (note: the second assignment overwrites the first, so sub1 keeps all columns)
sub1=data[['country','employrate', 'femaleemployrate', 'internetuserate', 'urbanrate', 'armedforcesrate', 'GENDEREQUITY']]
sub1=data[(data['femaleemployrate']>=40.0) & (data['internetuserate']>60)]
print ('printing a before PROGRESS variable,n=5')
a = sub1.head (n=10)
print(a)

#new PROGRESS variable, categorical 1 through 6
def PROGRESS (row):
    # GENDEREQUITY > 0: the female employment rate is greater than the
    # overall employment rate, indicating great gender equity.
    # Categories 1 and 2 split on whether female employment exceeds 50%.
    if (row['GENDEREQUITY'] > 0) & (row['femaleemployrate'] > 50) :
        return 1
    if (row['GENDEREQUITY'] > 0) & (row['femaleemployrate'] <= 50) :
        return 2
    # GENDEREQUITY < 0: we look at the female employment rate. If that is
    # at least 50%, then the country has high female employment.
    # Categories 3 and 4: gap of at most 10 points.
    if (row['GENDEREQUITY'] < 0) & (row['GENDEREQUITY'] >= -10) & (row['femaleemployrate'] >= 50) :
        return 3
    if (row['GENDEREQUITY'] < 0) & (row['GENDEREQUITY'] >= -10) & (row['femaleemployrate'] <= 50):
        return 4
    # Categories 5 and 6: gap of more than 10 points.
    if (row['GENDEREQUITY'] < -10) & (row['femaleemployrate'] >= 50):
        return 5
    if (row['GENDEREQUITY'] < -10) & (row['femaleemployrate'] < 50):
        return 6

sub1['PROGRESS'] = sub1.apply (lambda row: PROGRESS (row),axis=1)

print ('printing a after PROGRESS variable,n=7')
a = sub1.head (n=12)
print(a)

#frequency distributions for the employment, internet use, urban rate and armed forces variables
print ('counts for Employ Rate')
c10 = sub1['employrate'].value_counts(sort=True)
print(c10)

print ('percentages for Employ Rate')
p10 = sub1['employrate'].value_counts(sort=True, normalize=True)
print (p10)

print ('counts for Female Employ Rate')
c11 = sub1['femaleemployrate'].value_counts(sort=True)
print(c11)

print ('percentages for Female Employ Rate')
p11= sub1['femaleemployrate'].value_counts(sort=True, normalize=True)
print (p11)

print ('counts for Internet Use Rate')
c12 = sub1['internetuserate'].value_counts(sort=True)
print(c12)

print ('percentages for Internet Use Rate')
p12 = sub1['internetuserate'].value_counts(sort=True, normalize=True)
print (p12)

print ('counts for Urban Rate')
c13 = sub1['urbanrate'].value_counts(sort=True)
#print(c13)

print ('percentages for Urban Rate')
p13 = sub1['urbanrate'].value_counts(sort=True, normalize=True)
#print (p13)

print ('counts for Armed Forces Rate')
c14 = sub1['armedforcesrate'].value_counts(sort=True)
#print(c14)

print ('percentages for Armed Forces Rate')
p14 = sub1['armedforcesrate'].value_counts(sort=True, normalize=True)
#print (p14)

print ('counts for number of Gender Equity')
c15 = sub1['GENDEREQUITY'].value_counts(sort=True)
print(c15)

print ('counts for PROGRESS CATEGORIES')
c16 = sub1['PROGRESS'].value_counts(sort=True)
print(c16)

print ('percentages for PROGRESS CATEGORIES')
p16 = sub1['PROGRESS'].value_counts(sort=True, normalize=True)
print (p16)
------------------------------------------------------------------------------------------------
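The six PROGRESS categories in the code above can also be assigned without a row-by-row function. The sketch below is one possible vectorised version using numpy.select, which checks the conditions in order and takes the first one that matches. The five rows are hypothetical, and the default value 0 for rows that match no condition (for example a GENDEREQUITY of exactly 0, or missing data) is an assumption of this sketch; the function above returns nothing for such rows.

```python
import numpy
import pandas

# Hypothetical rows for illustration only (not real Gapminder values).
toy = pandas.DataFrame({
    'employrate':       [50.0, 61.5, 57.1, 70.0, 55.0],
    'femaleemployrate': [55.0, 54.6, 49.7, 55.0, 40.0],
})
toy['GENDEREQUITY'] = toy['femaleemployrate'] - toy['employrate']

gap = toy['GENDEREQUITY']
fem = toy['femaleemployrate']

# Conditions are listed in the same order as the if-statements in PROGRESS.
conditions = [
    (gap > 0) & (fem > 50),                  # category 1
    (gap > 0) & (fem <= 50),                 # category 2
    (gap < 0) & (gap >= -10) & (fem >= 50),  # category 3
    (gap < 0) & (gap >= -10) & (fem < 50),   # category 4
    (gap < -10) & (fem >= 50),               # category 5
    (gap < -10) & (fem < 50),                # category 6
]

# default=0 marks rows that fall in none of the six categories (assumption).
toy['PROGRESS'] = numpy.select(conditions, [1, 2, 3, 4, 5, 6], default=0)
print(toy)
```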
The Output for the second half of the python code for Assignment 3 is below
---------------------------------------------------------------------------------------------
runfile('C:/Users/sudhir/Desktop/PERSONAL.BIZ.WORK.JUL2017/COURSERA/DataAnalysisInterPret/WEEK3/SudhirAssignment32.7.py', wdir='C:/Users/sudhir/Desktop/PERSONAL.BIZ.WORK.JUL2017/COURSERA/DataAnalysisInterPret/WEEK3')
C:/Users/sudhir/Desktop/PERSONAL.BIZ.WORK.JUL2017/COURSERA/DataAnalysisInterPret/WEEK3/SudhirAssignment32.7.py:16: FutureWarning: convert_objects is deprecated. Use the data-type specific converters pd.to_datetime, pd.to_timedelta and pd.to_numeric.
  data['employrate'] = data['employrate'].convert_objects(convert_numeric=True)
C:/Users/sudhir/Desktop/PERSONAL.BIZ.WORK.JUL2017/COURSERA/DataAnalysisInterPret/WEEK3/SudhirAssignment32.7.py:17: FutureWarning: convert_objects is deprecated. Use the data-type specific converters pd.to_datetime, pd.to_timedelta and pd.to_numeric.
  data['femaleemployrate'] = data['femaleemployrate'].convert_objects(convert_numeric=True)
C:/Users/sudhir/Desktop/PERSONAL.BIZ.WORK.JUL2017/COURSERA/DataAnalysisInterPret/WEEK3/SudhirAssignment32.7.py:18: FutureWarning: convert_objects is deprecated. Use the data-type specific converters pd.to_datetime, pd.to_timedelta and pd.to_numeric.
  data['internetuserate'] = data['internetuserate'].convert_objects(convert_numeric=True)
C:/Users/sudhir/Desktop/PERSONAL.BIZ.WORK.JUL2017/COURSERA/DataAnalysisInterPret/WEEK3/SudhirAssignment32.7.py:19: FutureWarning: convert_objects is deprecated. Use the data-type specific converters pd.to_datetime, pd.to_timedelta and pd.to_numeric.
  data['urbanrate'] = data['urbanrate'].convert_objects(convert_numeric=True)
C:/Users/sudhir/Desktop/PERSONAL.BIZ.WORK.JUL2017/COURSERA/DataAnalysisInterPret/WEEK3/SudhirAssignment32.7.py:20: FutureWarning: convert_objects is deprecated. Use the data-type specific converters pd.to_datetime, pd.to_timedelta and pd.to_numeric.
  data['armedforcesrate'] = data['armedforcesrate'].convert_objects(convert_numeric=True)
GENDEREQUITY
printing a before PROGRESS variable,n=5
       country incomeperperson alcconsumption  armedforcesrate  \
9    Australia     25249.98606          10.21         0.486280
10     Austria     26692.98411           12.4         0.815580
15    Barbados     9243.587053           6.42         0.663956
17     Belgium     24496.04826          10.41         0.815648
32      Canada     25575.35262           10.2         0.342976
49  Czech Rep.     7381.312751          16.47         0.515706
50     Denmark     30532.27704          12.02         1.012373
59     Estonia     6238.537506          17.24         0.998428
63     Finland     27110.73159           13.1         1.177416
64      France     22878.46657          12.48         1.233780

   breastcancerper100th co2emissions  femaleemployrate hivrate  \
9                  83.2  12970092667         54.599998     0.1
10                 70.5   4466084333         49.700001     0.3
15                 62.5  36160666.67         60.299999     1.4
17                   92  10897025333         41.700001     0.2
32                 84.3  24979045667         58.900002     0.2
49                 58.4   1776016000         47.599998    0.06
50                 88.7   3503877667         58.099998     0.2
59                 47.7  277170666.7         52.099998     1.2
63                 84.7   2420300667         53.400002     0.1
64                 91.9  33341634333         45.599998     0.4

    internetuserate lifeexpectancy oilperperson polityscore  \
9         75.895654         81.907  1.913026109          10
10        72.731576         80.854  1.548790966          10
15        70.028599         76.835
17        73.733934         80.009                        8
32        81.338393         81.012  3.007355851          10
49        68.638133         77.685  0.876778335           8
50        88.770254         78.826  1.567527461          10
59        74.163040         74.825                        9
63        86.898845         79.977  1.938654268          10
64        77.498619         81.539  1.328291411           9

   relectricperperson suicideper100th  employrate  urbanrate  GENDEREQUITY
9         2825.391095     8.470030125   61.500000      88.74     -6.900002
10        2068.123309        13.09437   57.099998      67.16     -7.399998
15                        3.108602524   66.900002      39.84     -6.600002
17        1920.962215        15.95385   48.599998      97.36     -6.899998
32        4772.370648        10.10099   63.500000      80.40     -4.599998
49        1438.780412        12.36798   56.000000      73.50     -8.400002
50        1884.299342        8.973104   63.099998      86.68     -5.000000
59        1411.230532        16.95924   56.500000      69.46     -4.400002
63        4036.953993        16.23437   57.200001      63.30     -3.799999
64        2539.753273        14.09153   51.200001      77.36     -5.600002
printing a after PROGRESS variable,n=7
             country incomeperperson alcconsumption  armedforcesrate  \
9          Australia     25249.98606          10.21         0.486280
10           Austria     26692.98411           12.4         0.815580
15          Barbados     9243.587053           6.42         0.663956
17           Belgium     24496.04826          10.41         0.815648
32            Canada     25575.35262           10.2         0.342976
49        Czech Rep.     7381.312751          16.47         0.515706
50           Denmark     30532.27704          12.02         1.012373
59           Estonia     6238.537506          17.24         0.998428
63           Finland     27110.73159           13.1         1.177416
64            France     22878.46657          12.48         1.233780
69           Germany     25306.18719          12.14         0.575810
83  Hong Kong, China     35536.07247                             NaN

   breastcancerper100th co2emissions  femaleemployrate hivrate  \
9                  83.2  12970092667         54.599998     0.1
10                 70.5   4466084333         49.700001     0.3
15                 62.5  36160666.67         60.299999     1.4
17                   92  10897025333         41.700001     0.2
32                 84.3  24979045667         58.900002     0.2
49                 58.4   1776016000         47.599998    0.06
50                 88.7   3503877667         58.099998     0.2
59                 47.7  277170666.7         52.099998     1.2
63                 84.7   2420300667         53.400002     0.1
64                 91.9  33341634333         45.599998     0.4
69                 79.8  41229554667         46.799999     0.1
83                        1026813333         51.599998

    internetuserate lifeexpectancy oilperperson polityscore  \
9         75.895654         81.907  1.913026109          10
10        72.731576         80.854  1.548790966          10
15        70.028599         76.835
17        73.733934         80.009                        8
32        81.338393         81.012  3.007355851          10
49        68.638133         77.685  0.876778335           8
50        88.770254         78.826  1.567527461          10
59        74.163040         74.825                        9
63        86.898845         79.977  1.938654268          10
64        77.498619         81.539  1.328291411           9
69        82.526898         80.414  1.398500033          10
83        71.849124         82.759  2.282655406

   relectricperperson suicideper100th  employrate  urbanrate  GENDEREQUITY  \
9         2825.391095     8.470030125   61.500000      88.74     -6.900002
10        2068.123309        13.09437   57.099998      67.16     -7.399998
15                        3.108602524   66.900002      39.84     -6.600002
17        1920.962215        15.95385   48.599998      97.36     -6.899998
32        4772.370648        10.10099   63.500000      80.40     -4.599998
49        1438.780412        12.36798   56.000000      73.50     -8.400002
50        1884.299342        8.973104   63.099998      86.68     -5.000000
59        1411.230532        16.95924   56.500000      69.46     -4.400002
63        4036.953993        16.23437   57.200001      63.30     -3.799999
64        2539.753273        14.09153   51.200001      77.36     -5.600002
69        1693.891898        9.211085   53.500000      73.64     -6.700001
83        1468.640784                   59.000000     100.00     -7.400002

    PROGRESS
9          3
10         4
15         3
17         4
32         3
49         4
50         3
59         3
63         3
64         4
69         4
83         3
counts for Employ Rate
53.500000    2
65.000000    2
48.599998    1
62.299999    1
66.900002    1
63.500000    1
56.000000    1
56.500000    1
59.000000    1
73.599998    1
47.299999    1
57.200001    1
59.900002    1
52.500000    1
57.099998    1
53.099998    1
60.700001    1
58.900002    1
51.200001    1
62.400002    1
61.299999    1
51.299999    1
53.400002    1
48.700001    1
56.799999    1
55.900002    1
63.099998    1
64.300003    1
59.299999    1
57.299999    1
61.500000    1
Name: employrate, dtype: int64
percentages for Employ Rate
53.500000    0.060606
65.000000    0.060606
48.599998    0.030303
62.299999    0.030303
66.900002    0.030303
63.500000    0.030303
56.000000    0.030303
56.500000    0.030303
59.000000    0.030303
73.599998    0.030303
47.299999    0.030303
57.200001    0.030303
59.900002    0.030303
52.500000    0.030303
57.099998    0.030303
53.099998    0.030303
60.700001    0.030303
58.900002    0.030303
51.200001    0.030303
62.400002    0.030303
61.299999    0.030303
51.299999    0.030303
53.400002    0.030303
48.700001    0.030303
56.799999    0.030303
55.900002    0.030303
63.099998    0.030303
64.300003    0.030303
59.299999    0.030303
57.299999    0.030303
61.500000    0.030303
Name: employrate, dtype: float64
counts for Female Employ Rate
41.700001    2
47.599998    1
45.599998    1
58.900002    1
48.000000    1
51.599998    1
57.000000    1
54.299999    1
56.000000    1
69.599998    1
60.299999    1
49.700001    1
53.400002    1
56.700001    1
54.599998    1
45.299999    1
40.299999    1
50.700001    1
53.099998    1
49.400002    1
46.400002    1
58.299999    1
52.099998    1
45.900002    1
46.200001    1
48.799999    1
60.900002    1
58.099998    1
42.099998    1
51.299999    1
46.799999    1
51.000000    1
Name: femaleemployrate, dtype: int64
percentages for Female Employ Rate
41.700001    0.060606
47.599998    0.030303
45.599998    0.030303
58.900002    0.030303
48.000000    0.030303
51.599998    0.030303
57.000000    0.030303
54.299999    0.030303
56.000000    0.030303
69.599998    0.030303
60.299999    0.030303
49.700001    0.030303
53.400002    0.030303
56.700001    0.030303
54.599998    0.030303
45.299999    0.030303
40.299999    0.030303
50.700001    0.030303
53.099998    0.030303
49.400002    0.030303
46.400002    0.030303
58.299999    0.030303
52.099998    0.030303
45.900002    0.030303
46.200001    0.030303
48.799999    0.030303
60.900002    0.030303
58.099998    0.030303
42.099998    0.030303
51.299999    0.030303
46.799999    0.030303
51.000000    0.030303
Name: femaleemployrate, dtype: float64
counts for Internet Use Rate
90.016190    1
93.277508    1
62.811900    1
72.731576    1
74.247572    1
82.515928    1
81.338393    1
69.770394    1
70.028599    1
71.849124    1
83.002584    1
75.895654    1
65.387786    1
73.733934    1
84.731705    1
79.889777    1
82.526898    1
90.079527    1
82.166660    1
62.471230    1
65.808554    1
68.638133    1
86.898845    1
71.131707    1
65.163251    1
74.163040    1
77.498619    1
95.638113    1
69.339971    1
71.514724    1
88.770254    1
90.703555    1
77.638535    1
Name: internetuserate, dtype: int64
percentages for Internet Use Rate
90.016190    0.030303
93.277508    0.030303
62.811900    0.030303
72.731576    0.030303
74.247572    0.030303
82.515928    0.030303
81.338393    0.030303
69.770394    0.030303
70.028599    0.030303
71.849124    0.030303
83.002584    0.030303
75.895654    0.030303
65.387786    0.030303
73.733934    0.030303
84.731705    0.030303
79.889777    0.030303
82.526898    0.030303
90.079527    0.030303
82.166660    0.030303
62.471230    0.030303
65.808554    0.030303
68.638133    0.030303
86.898845    0.030303
71.131707    0.030303
65.163251    0.030303
74.163040    0.030303
77.498619    0.030303
95.638113    0.030303
69.339971    0.030303
71.514724    0.030303
88.770254    0.030303
90.703555    0.030303
77.638535    0.030303
Name: internetuserate, dtype: float64
counts for Urban Rate
percentages for Urban Rate
counts for Armed Forces Rate
percentages for Armed Forces Rate
counts for number of Gender Equity
-7.000000     2
-4.000000     2
-6.700001     2
-6.600002     2
-5.600002     1
-7.399998     1
-11.099998    1
-6.500000     1
-6.299999     1
-8.400002     1
-7.300003     1
-4.899998     1
-6.099998     1
-6.900002     1
-7.599998     1
-8.100002     1
-8.900002     1
-6.200001     1
-7.400002     1
-10.799999    1
-3.799999     1
-10.900002    1
-11.100002    1
-6.899998     1
-4.099998     1
-4.299999     1
-4.599998     1
-4.400002     1
-5.000000     1
Name: GENDEREQUITY, dtype: int64
counts for PROGRESS CATEGORIES
3    17
4    12
6     3
5     1
Name: PROGRESS, dtype: int64
percentages for PROGRESS CATEGORIES
3   0.515152
4   0.363636
6   0.090909
5   0.030303
Name: PROGRESS, dtype: float64
C:/Users/sudhir/Desktop/PERSONAL.BIZ.WORK.JUL2017/COURSERA/DataAnalysisInterPret/WEEK3/SudhirAssignment32.7.py:67: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  sub1['PROGRESS'] = sub1.apply (lambda row: PROGRESS (row),axis=1)
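Two things stand out in the output above. The head() printouts show every Gapminder column, because in the code the column selection for sub1 is immediately overwritten by the row filter. The SettingWithCopyWarning appears because a new column is then added to that filtered slice. A minimal sketch of one way to handle both, assuming the intent was to keep only the listed columns: select rows and columns together with .loc and take an explicit copy. The small DataFrame and its values are hypothetical.

```python
import pandas

# Hypothetical stand-in for the Gapminder table (values are made up).
data = pandas.DataFrame({
    'country': ['A', 'B', 'C'],
    'employrate': [61.5, 48.6, 55.0],
    'femaleemployrate': [54.6, 41.7, 35.0],
    'internetuserate': [75.9, 73.7, 20.0],
    'lifeexpectancy': [81.9, 80.0, 70.0],
})

keep = ['country', 'employrate', 'femaleemployrate', 'internetuserate']
rows = (data['femaleemployrate'] >= 40.0) & (data['internetuserate'] > 60)

# .loc applies the row filter and the column selection in one step;
# .copy() makes sub1 an independent DataFrame, so adding a column to it
# does not trigger SettingWithCopyWarning.
sub1 = data.loc[rows, keep].copy()
sub1['GENDEREQUITY'] = sub1['femaleemployrate'] - sub1['employrate']
print(sub1)
```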