Definition of Statistics, Probability, and Key Terms
The science of statistics deals with the collection, analysis, interpretation, and presentation of data.
Organizing and summarizing data is called descriptive statistics. Two ways to summarize data are by graphical methods and by using numbers and symbols. Inferential statistics applies formal methods, built on probability and probability distributions, to draw conclusions from data. Statistical inference uses probability to determine how confident a researcher can be that those conclusions are correct.
Probability
Probability is a mathematical tool used to study randomness. Probability deals with the chance, or likelihood, of an event occurring.
For example, if a fair coin is tossed four times, the outcomes may not be two heads and two tails. However, if the coin is tossed 4,000 times, the outcomes will be close to 2,000 heads and 2,000 tails.
The expected theoretical probability of getting heads in one coin toss is 1/2 or 0.50. Even though the outcomes of a few repetitions are uncertain, there is a regular pattern of outcomes when there are many repetitions.
For example, the English statistician Karl Pearson tossed a coin 24,000 times and obtained 12,012 heads. The relative frequency of heads over the 24,000 trials was therefore 12,012/24,000 = 0.5005, very close to the theoretical probability of 0.50.
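The pattern described above can be explored with a small simulation. The following is a minimal sketch, assuming Python's standard pseudorandom generator is an acceptable stand-in for a fair coin; the function name heads_fraction and the seed value are illustrative choices, not part of the original text.

```python
import random

def heads_fraction(num_tosses, seed=0):
    """Simulate num_tosses fair coin tosses and return the fraction of heads."""
    rng = random.Random(seed)  # seeded so that a run can be repeated
    heads = sum(1 for _ in range(num_tosses) if rng.random() < 0.5)
    return heads / num_tosses

# With few tosses the fraction can be far from 0.5;
# with many tosses it tends to settle near 0.5.
for n in (4, 400, 4000, 24000):
    print(n, heads_fraction(n))

# Relative frequency computed from Pearson's reported counts.
pearson_fraction = 12012 / 24000
print(pearson_fraction)
```

Changing the seed gives different individual results, but the fractions for the larger numbers of tosses should stay close to 0.5.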
Population and Sample
In statistics, populations are generally studied. A population is a collection of persons, things, or objects under study. To study the population, a sample of the population is selected. The idea of sampling is to select a portion, or sample, to gain information about the population. Data are the result of sampling from a population. Because it is often difficult to examine an entire population, sampling is a practical technique.
Statistics and Parameters
From the sample data, a statistic can be calculated. A statistic is a number that represents a measure of the sample. For example, the average number of points earned by the students in one math class at the end of the term is a statistic. The statistic is an estimate of a population parameter. A parameter is a number that represents a measure of the population. If all the math classes in a school were the population, then the average number of points earned by the students in every math class at the end of the term would be a parameter.
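The distinction between a statistic and a parameter can be illustrated with a short sketch. The class names and point values below are made up for illustration only; they treat one class as the sample and all three classes together as the population.

```python
from statistics import mean

# Hypothetical end-of-term points for three math classes (made-up numbers).
classes = {
    "class_a": [78, 85, 92, 64],
    "class_b": [88, 71, 95],
    "class_c": [60, 82, 77, 90, 69],
}

# Statistic: the mean of one class, which serves as the sample.
sample_mean = mean(classes["class_a"])

# Parameter: the mean over every student in every class (the population).
all_points = [p for points in classes.values() for p in points]
population_mean = mean(all_points)

print("statistic (one class):", sample_mean)
print("parameter (all classes):", population_mean)
```

In practice the parameter is usually unknown, which is why the statistic is used to estimate it.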
Representative Sample
One of the main concerns in the field of statistics is how accurately a statistic estimates a parameter. This accuracy depends on how well the sample represents the population. The sample must contain the characteristics of the population to be a representative sample.
Variables
A variable, denoted by capital letters such as X or Y, is a characteristic of interest for each unit in a population. Variables may be numerical or categorical. Numerical variables take on values that have equal units, such as weight in pounds or time in hours. Categorical variables place the object into a category.
For example, if X equals the number of points earned by one math student at the end of the term, then X is a numerical variable. If X equals a person's party affiliation, then possible values of X include Republican, Democrat, and Independent, so X is a categorical variable.
Mathematical calculations can be done with numerical variables; however, they are not meaningful for categorical variables. For example, it makes sense to average the points earned, but not party affiliations.
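A brief sketch may help show how the two kinds of variables are typically handled. The values below are hypothetical. The numerical values are averaged, while the categorical values are only counted by category.

```python
from collections import Counter
from statistics import mean

# Hypothetical values of a numerical variable (points earned).
points = [70, 80, 90, 100]

# Hypothetical values of a categorical variable (party affiliation).
affiliations = ["Republican", "Democrat", "Independent", "Democrat"]

# Arithmetic is meaningful for the numerical variable.
average_points = mean(points)

# For the categorical variable, counting each category is the natural summary.
affiliation_counts = Counter(affiliations)

print(average_points)
print(affiliation_counts)
```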
Data
Data are the actual values of the variable. They may be numbers or words. A datum is a single value.
Mean and Proportion
The words "mean" and "average" are often used interchangeably. If a student took three exams in a math class and scored 86, 75, and 92, the mean score is calculated by adding the three exam scores and dividing the sum by the number of exams: (86 + 75 + 92)/3 = 253/3, or about 84.3.
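The mean of the three exam scores can be checked with a few lines of Python. This is a minimal sketch; the variable names are illustrative.

```python
# The three exam scores from the example.
scores = [86, 75, 92]

# Mean = sum of the scores divided by the number of scores.
total = sum(scores)
mean_score = total / len(scores)

print(total, len(scores), round(mean_score, 1))
```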
With regard to proportion, if a math class has 40 students, of whom 22 are boys and 18 are girls, then the proportion of male students in the class is 22/40 = 0.55, and the proportion of female students is 18/40 = 0.45.
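The two proportions can be computed in the same way. The sketch below uses the counts from the example, with illustrative variable names, and also checks that the proportions add up to 1.

```python
# Counts from the example: 22 boys and 18 girls in a class of 40.
num_boys = 22
num_girls = 18
class_size = num_boys + num_girls

# Proportion = count in the category divided by the total count.
proportion_boys = num_boys / class_size
proportion_girls = num_girls / class_size

print(proportion_boys, proportion_girls)

# Together, the proportions of all categories account for the whole class.
proportions_total = proportion_boys + proportion_girls
```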