Variable names - OK. Variable meanings - huh?
How do you analyze data when you don't know what it means? What the heck is a polity score? How is gross domestic product measured? Is this my "dumb American" side rearing its ugly head? LOL
I had to do a bit of initial research to make sure my research questions were appropriate for the data, which meant looking beyond the Gapminder codebook to learn more about the variables.
Once I understood everything, and after revising my research questions, I chose the following variables from the Gapminder dataset for my analysis:
country (unique identifier) - string variable, list of countries, 289 unique values
incomeperperson - continuous numeric variable
2010 Gross Domestic Product per capita in constant 2000 US$.
Inflation, but not the differences in the cost of living between countries, has been taken into account (source: World Bank World Development Indicators).
femaleemployrate - numeric variable indicating the percentage of female employment, ranging from 11.3% to 83.3%
2007 female employees age 15+ (% of population)
Percentage of female population, age above 15, that has been employed during the given year.
internetuserate - numeric variable indicating the percentage of internet users, ranging from 0.21% to 95.6%
2010 Internet users (per 100 people)
Internet users are people with access to the worldwide network.
polityscore - numeric variable, ranging from -10 to 10
2009 Democracy score (Polity)
Overall polity score from the Polity IV dataset, calculated by subtracting an autocracy score from a democracy score. It is a summary measure of a country's democratic and free nature: -10 is the lowest value, 10 the highest.
Also, from the Polity IV Project (http://www.systemicpeace.org/polity/polity4.htm), Brief Description: The Polity conceptual scheme is unique in that it examines concomitant qualities of democratic and autocratic authority in governing institutions, rather than discrete and mutually exclusive forms of governance. This perspective envisions a spectrum of governing authority that spans from fully institutionalized autocracies through mixed, or incoherent, authority regimes (termed "anocracies") to fully institutionalized democracies. The "Polity Score" captures this regime authority spectrum on a 21-point scale ranging from -10 (hereditary monarchy) to +10 (consolidated democracy). The Polity scores can also be converted to regime categories: we recommend a three-part categorization of "autocracies" (-10 to -6), "anocracies" (-5 to +5 and the three special values: -66, -77, and -88), and "democracies" (+6 to +10); see "Global Trends in Governance, 1800-2011" on the Polity IV site.
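To make the Polity description concrete, here is a minimal sketch of the two rules quoted above: the combined score is the democracy score minus the autocracy score (each on a 0 to 10 scale), and the combined score maps to the three recommended regime categories. The function and argument names (polity_score, regime_category, democ, autoc) are my own, not from the Polity IV or Gapminder files. Gapminder's polityscore is already the combined value, so the subtraction is only there to illustrate how it is built.

```python
# Special Polity IV codes; the project groups these with the anocracies,
# although many analyses choose to treat them as missing instead.
SPECIAL_CODES = {-66, -77, -88}


def polity_score(democ: int, autoc: int) -> int:
    """Combined Polity score: democracy score minus autocracy score (each 0-10)."""
    if not (0 <= democ <= 10 and 0 <= autoc <= 10):
        raise ValueError("democ and autoc must each be between 0 and 10")
    return democ - autoc


def regime_category(score: float) -> str:
    """Three-part categorization recommended by the Polity IV Project."""
    if score in SPECIAL_CODES:
        return "anocracy"
    if -10 <= score <= -6:
        return "autocracy"
    if -5 <= score <= 5:
        return "anocracy"
    if 6 <= score <= 10:
        return "democracy"
    raise ValueError(f"not a valid Polity score: {score}")


print(polity_score(8, 1))    # 7
print(regime_category(7))    # democracy
print(regime_category(-7))   # autocracy
print(regime_category(0))    # anocracy
```

Grouping polityscore this way could be handy if a research question compares autocracies, anocracies, and democracies rather than treating the 21-point scale as continuous.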
employrate - numeric variable indicating the percentage of employment, ranging from 32% to 83.2%
2007 total employees age 15+ (% of population)
Percentage of total population, age above 15, that has been employed during the given year.
urbanrate - numeric variable indicating the percentage of population in urban areas, ranging from 10.4% to 100%
2008 urban population (% of total)
Urban population refers to people living in urban areas as defined by national statistical offices (calculated using World Bank population estimates and urban ratios from the United Nations World Urbanization Prospects).
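A minimal sketch of pulling just these variables out of the Gapminder CSV and checking their ranges, assuming the file has a header row with the column names listed above and that missing values show up as blank cells. The helper names (to_float, load_variables, value_range) are hypothetical, and the two sample rows are made-up placeholders, not real Gapminder data.

```python
import csv
import io
from typing import Dict, List, Optional, Tuple

NUMERIC_VARS = ["incomeperperson", "femaleemployrate", "internetuserate",
                "polityscore", "employrate", "urbanrate"]


def to_float(text: Optional[str]) -> Optional[float]:
    """Convert a CSV cell to float; blank, missing, or non-numeric cells become None."""
    try:
        return float(text)
    except (TypeError, ValueError):
        return None


def load_variables(source) -> List[Dict[str, object]]:
    """Keep the country identifier plus the numeric variables chosen for analysis."""
    rows = []
    for record in csv.DictReader(source):
        row: Dict[str, object] = {"country": record["country"]}
        for name in NUMERIC_VARS:
            row[name] = to_float(record.get(name))
        rows.append(row)
    return rows


def value_range(rows, name) -> Optional[Tuple[float, float]]:
    """Return (min, max) of the non-missing values of one variable."""
    values = [r[name] for r in rows if r[name] is not None]
    return (min(values), max(values)) if values else None


# Hypothetical placeholder rows; " " and "" stand in for missing values.
sample = io.StringIO(
    "country,incomeperperson,femaleemployrate,internetuserate,"
    "polityscore,employrate,urbanrate\n"
    "Country A,1200.5,45.0,12.3,7,55.1,40.2\n"
    "Country B, ,60.2,, -3,63.0,75.0\n"
)
rows = load_variables(sample)
print(value_range(rows, "polityscore"))  # (-3.0, 7.0)
```

With the real file (assumed here to be named gapminder.csv), `with open("gapminder.csv", newline="") as f: rows = load_variables(f)` followed by value_range on each variable is one way to double-check the ranges quoted in the variable list.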