Frequency Distribution for the GapMinder Dataset (Part 1)
This post outlines, step by step, the Python code written for the frequency analysis.
Summary of the Frequency Distribution Observations
For this research, the variables are 'incomeperperson', 'urbanrate', 'relectricperperson', and possibly 'employrate', with 'co2emission' kept for later study
All the variables have 213 observations
All the observations are continuous values, most of which occur only once (Frequency = 1, Percentage = 0.469484%)
Missing data are denoted by empty strings in all variables
Missing data are the mode of every frequency distribution, but the number of missing values varies from variable to variable
The rest of this post walks through each code cell and its output in order. The frequency distribution values are tabulated for brevity.
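The percentages quoted in this post follow directly from the counts: each one is a frequency divided by the 213 observations. The snippet below is only an arithmetic check of those figures; the helper name `percentage` is made up for illustration and is not part of the original code cells.

```python
# Total observations per variable, as stated in the summary.
TOTAL_OBSERVATIONS = 213


def percentage(frequency, total=TOTAL_OBSERVATIONS):
    """Return the share of `total` that `frequency` represents, in percent."""
    return round(frequency / total * 100, 6)


single_value_pct = percentage(1)     # a value that occurs exactly once
income_missing_pct = percentage(23)  # missing 'incomeperperson' entries
urban_missing_pct = percentage(10)   # missing 'urbanrate' entries

print(single_value_pct, income_missing_pct, urban_missing_pct)
```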
Cell 1: Import the modules needed in this task.
Cell 2: Load the Gapminder dataset used in the task and store it in data.
The output shows the head and tail of the DataFrame, i.e. the first and last five (5) observations of our dataset.
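As a rough sketch of what Cells 1 and 2 might look like, the example below imports pandas and loads a CSV while keeping missing entries as empty strings. The three-row sample is made up so the snippet runs on its own; with the real data, the `io.StringIO(...)` argument would be replaced by the path to the Gapminder CSV file (the exact file name is not given in this post). The `dtype=str` and `na_filter=False` options are an assumption about how to keep blanks as empty strings, not necessarily what the original cell used.

```python
import io

import pandas as pd

# Made-up stand-in for the Gapminder CSV, with two blank (missing) fields.
SAMPLE_CSV = """country,incomeperperson,urbanrate
A,,46.72
B,1232.79,65.58
C,275.88,
"""

# dtype=str keeps every value as text, and na_filter=False stops pandas from
# turning empty fields into NaN, so missing data stay as empty strings.
data = pd.read_csv(io.StringIO(SAMPLE_CSV), dtype=str, na_filter=False)

print(data.head())  # first five rows (only three exist in this sample)
print(data.tail())  # last five rows
```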
Cell 3: Create a new dataframe with only the variables needed for the research analysis. As listed in the first post, only ‘incomeperperson’, ‘urbanrate’, ‘relectricperperson’, and possibly ‘employrate’ will be used for the current analysis, while ‘co2emission’ will be used for later study.
The new dataset is stored with the name data_new.
The output of the code gives the head and tail of the new dataframe, i.e. the first and last five (5) observations in the dataframe.
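A minimal sketch of the subsetting step, assuming the full dataset is already in a DataFrame called data. The two-row frame and the extra column `othervariable` are invented here so the snippet runs on its own; the column names in `selected` are the ones listed in this post.

```python
import pandas as pd

# Made-up two-row stand-in for the full dataset; 'othervariable' represents
# any column that is not needed for the analysis.
data = pd.DataFrame({
    "country": ["A", "B"],
    "incomeperperson": ["100.5", ""],
    "urbanrate": ["50.0", "60.0"],
    "relectricperperson": ["", "300.0"],
    "employrate": ["55.0", "58.0"],
    "co2emission": ["1000", ""],
    "othervariable": ["x", "y"],
})

# Variables kept for the research analysis, as named in the post.
selected = ["incomeperperson", "urbanrate", "relectricperperson",
            "employrate", "co2emission"]

# .copy() makes data_new independent of data, so later edits do not leak back.
data_new = data[selected].copy()

print(data_new.head())
print(data_new.tail())
```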
Cell 4: A little bit of dataset information is printed in this cell.
The output gives the number of observations, variables and the dataset description.
For our new dataset with variables ‘incomeperperson’, ‘urbanrate’, ‘employrate’, ‘co2emission’, and ‘relectricperperson’, the description reports:
count is the total number of observations in each variable
unique is the number of unique values under each variable
top is the most frequent value (the mode) of each variable
freq is the number of times the top (mode) appears in the observations.
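The four fields above are what pandas reports when `describe()` is called on text (object) columns, which is the case when missing data are stored as empty strings. The sketch below uses a made-up five-row frame to show where each field comes from; it is an illustration, not the original Cell 4.

```python
import pandas as pd

# Made-up sample: values stored as text, missing data as empty strings.
data_new = pd.DataFrame({
    "incomeperperson": ["", "100.5", "", "250.0", "980.2"],
    "urbanrate": ["30.1", "", "", "", "75.4"],
})

print(len(data_new))          # number of observations (rows)
print(len(data_new.columns))  # number of variables (columns)

# For text columns, describe() returns count, unique, top and freq.
summary = data_new.describe()
print(summary)
```

In this sample the empty string is the top value of both columns, which mirrors the observation that missing data are the mode.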
Frequency Distributions for Each Variable
1. Income per Person (incomeperperson) observations
The output shows the frequency distribution table for the ‘incomeperperson’ variable.
The variable is grouped by its 191 unique values.
These values are continuous with most of the values occurring only once (Frequency = 1, Percentage = 0.469484%).
Missing data, denoted by an empty string, is the mode of the distribution (Frequency = 23, Percentage = 10.798122%).
The Cumulative Frequency sums to the total observations (213) and Cumulative Percentage sums to 100%.
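One possible way to build a table with these four columns is sketched below. The function name `frequency_table` and the five-value sample series are made up for illustration, and the original cells may have computed the table differently.

```python
import pandas as pd


def frequency_table(series):
    """Build a frequency distribution table for one variable."""
    # Count each distinct value, keeping the empty string (missing data).
    counts = series.value_counts(sort=False, dropna=False)
    table = pd.DataFrame({"Frequency": counts})
    # Share of all observations, in percent.
    table["Percentage"] = table["Frequency"] / len(series) * 100
    # Running totals; the last row should equal the total count and 100%.
    table["Cumulative Frequency"] = table["Frequency"].cumsum()
    table["Cumulative Percentage"] = table["Percentage"].cumsum()
    return table


# Made-up sample with two missing (empty-string) observations.
income = pd.Series(["", "100.5", "", "250.0", "980.2"], name="incomeperperson")
table = frequency_table(income)
print(table)
```

The last row of the cumulative columns gives a quick sanity check: it should match the number of observations and 100%.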
2. Urban Rate (urbanrate) observations
The output shows the frequency distribution table for the ‘urbanrate’ variable.
The variable is grouped by its 195 unique values.
These values are continuous with most of the values occurring only once (Frequency = 1, Percentage = 0.469484%).
Missing data, denoted by an empty string, is the mode of the distribution (Frequency = 10, Percentage = 4.694836%).
The Cumulative Frequency sums to the total observations (213) and Cumulative Percentage sums to 100%.
The frequency distribution tables for 'employrate' and 'relectricperperson' can be viewed in Part 2 of this post.