2nd Week Assignment
This week, I experienced the genuine thrill of sitting through my first Python lesson, and the disappointment of not being able to extract "clean" data from a very complex set of numbers. As explained last week, I am interested in whether there is any correlation between countries' per-child spending at the primary school level and female literacy rates. I used two sets of data: primary-school expenditure per student as a percentage of total GDP, collected by the World Bank, and the literacy rate among females aged 15-24, collected by the United Nations Educational, Scientific and Cultural Organization (UNESCO). Unfortunately, I realized too late that I didn't pick a data set that lends itself to a frequency distribution analysis as modeled in the videos. I'll explain my struggles after showing my program (which just designates the two sets of data I'm working with and outputs their size):
Here's the output:
The number of countries in the per-child spending data (209) is correct, and so is the number of years under examination (1998 to 2007), if you remove one row. The number of countries in the female literacy table is also correct (209), but I ran into the problem of the program recognizing only ONE column for years, when in reality I had adjusted the data to cover 2002-2011 (to accommodate scant information and to reflect a four-year interval between investment and return). The problem with this type of data is that it is anything but neat. Typically, this is what the top of the female literacy table looks like:
My objective in this course was to write a Python program that would capture the INCREASE in both spending and literacy between the BEGINNING and the END of a set of years (here it would be 2008 and 2011 for Albania; 2002 and 2006 for Algeria), and then compare the rate of increase in literacy with the rate of increase in educational spending. I realize that my Python code needs to be much more sophisticated than I was planning. It will probably have to use some kind of "first" and "last" function, because the years with available data differ from country to country. It will also need to make sure each row corresponds to the same country in both tables, since the data was collected by two different world bodies: the World Bank and UNESCO. In short, this first brush with code has been a humbling experience, and I am looking forward to coming up with ways to overcome the "ickiness" of these incomplete, fragmented real-world numbers.
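To make the "first" and "last" idea more concrete, here is a minimal sketch of how it might look in plain Python. It is not the program from this assignment. The function names (first_and_last, percent_change, compare) are hypothetical, and every number in it is made up for illustration; none of it comes from the World Bank or UNESCO tables. It assumes each table has already been loaded into a dictionary keyed by country name, with None standing in for a year that has no reported value.

```python
# Hypothetical sample values, for illustration only (NOT real World Bank
# or UNESCO figures). None marks a year with no reported value.
spending = {
    "Albania": {2004: None, 2005: 8.0, 2006: None, 2007: 10.0},
    "Algeria": {1998: 10.0, 1999: None, 2002: 12.5},
    "Aruba": {1998: None, 1999: None},
}
literacy = {
    "Albania": {2008: 98.0, 2009: None, 2010: None, 2011: 99.0},
    "Algeria": {2002: 80.0, 2003: None, 2006: 88.0},
    "Angola": {2002: 60.0, 2011: 66.0},
}


def first_and_last(series):
    """Return the earliest and latest (year, value) pairs that have data.

    Returns None when fewer than two years are reported, because no
    change can be computed from a single point.
    """
    reported = [(year, value) for year, value in sorted(series.items())
                if value is not None]
    if len(reported) < 2:
        return None
    return reported[0], reported[-1]


def percent_change(series):
    """Percent change between the first and last reported values."""
    endpoints = first_and_last(series)
    if endpoints is None:
        return None
    (_, start), (_, end) = endpoints
    if start == 0:
        return None  # avoid dividing by zero
    return (end - start) / start * 100


def compare(spending_table, literacy_table):
    """Pair up the two tables by country name.

    Only countries that appear in BOTH tables, and that have at least
    two reported years in each, are kept.
    """
    results = {}
    shared = set(spending_table) & set(literacy_table)
    for country in sorted(shared):
        spend = percent_change(spending_table[country])
        lit = percent_change(literacy_table[country])
        if spend is not None and lit is not None:
            results[country] = (spend, lit)
    return results


if __name__ == "__main__":
    for country, (spend, lit) in compare(spending, literacy).items():
        print(f"{country}: spending {spend:+.1f}%, literacy {lit:+.1f}%")
```

Matching on the country name, instead of on row position, is what guards against the two tables listing countries in a different order. It only works if both sources spell each country's name the same way, which would need to be checked against the real files.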
















