Discover Top Posts Tagged with #probability distribution


Ayoo u all should take this uquiz and get soul-read on what kind of probability distribution you are! I know links can mess with how posts are shown, so I'll add it in the replies! I got normal distribution :)

#uquiz #personality quiz #data science #probability distribution

Uncertainty Wednesday: Entropy

Today in Uncertainty Wednesday I want to start on the idea that the shape of the probability distribution alone contains some measure of uncertainty. Let’s think about the simplest case again, with just two states A and B, where P(A) = p1 and P(B) = p2 = 1 - p1. I am using indices because we will shortly expand the number of states.

If p1 = 1, there is no uncertainty at all because you are certain that the world is in state A. The same holds for p1 = 0, except that now you are certain of state B (because p2 = 1). If we let p1 move continuously from 1 to 0, uncertainty first increases and then decreases again. As we will see later, and as should be intuitive, uncertainty is maximized at p1 = p2 = 0.5.

More generally, for a probability distribution p1, p2, ... pn, we would like our measure of uncertainty U to be such that

U(p1, p2, ... pn) is continuous in p1, p2, ... pn

This means that if you make an infinitesimally small change in two of the p’s (remember, you can’t change just one of them because they all need to add up to 1), you get only an infinitesimally small change in U.

Now compare the following two situations. You face either two equally likely states A and B or three equally likely states A, B, C. It seems intuitive to say that there is more uncertainty when there are three equally likely states, even if we know nothing else. This requirement can be expressed as follows (with some abuse of notation):

f(n) := U(1/n, 1/n ... 1/n) is monotonically increasing in n

Finally, a good measure of uncertainty should have a straightforward approach to composability: if you first face one uncertainty and then a second one, it should be easy to combine the uncertainty measures for each into an overall uncertainty.

To make this more concrete, imagine the following setup. There are four states of the world A, B, C and D. Now imagine that the true state of the world is revealed to you in two stages: first you find out whether it is in {A, B} or {C, D}, and then you find out the actual state. Let’s call the first step X and the second step Y. Then we would like our measure of uncertainty to behave as follows

U(XY) = U(X) + ∑P(Xi) * U(Y|Xi) where ∑ is over the elements of X

and where U(Y|Xi) is the measure of remaining uncertainty conditional on the outcome Xi of the first step. What this requirement amounts to is saying that the total uncertainty is a probability-weighted sum of the uncertainties of each step (the first step having probability 1).
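To see the composability rule in action, here is a minimal numerical check for the four-state example. It assumes Shannon's formula with K = 1 and base-2 logarithms as the candidate U, and the probabilities for A, B, C, D are made up for illustration; the helper name `entropy` is ours.

```python
import math

def entropy(probs):
    """Shannon entropy in bits (K = 1, log base 2); zero-probability terms contribute 0."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

# Hypothetical probabilities for the four states A, B, C, D.
p = {"A": 0.1, "B": 0.3, "C": 0.4, "D": 0.2}

# Step X: is the state in {A, B} or in {C, D}?
groups = [("A", "B"), ("C", "D")]
p_x = [sum(p[s] for s in g) for g in groups]

# Step Y: remaining uncertainty within each group, conditional on the X outcome.
u_y_given_x = [entropy([p[s] / px for s in g]) for g, px in zip(groups, p_x)]

lhs = entropy(p.values())  # U(XY): resolve everything at once
rhs = entropy(p_x) + sum(px * u for px, u in zip(p_x, u_y_given_x))  # U(X) + sum P(Xi) U(Y|Xi)
print(round(lhs, 6), round(rhs, 6))  # the two sides agree
```

Any split of the four states into two groups gives the same agreement, which is exactly the property the requirement asks for.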

In his groundbreaking 1948 paper “A Mathematical Theory of Communication,” Claude Shannon showed that the only measure of uncertainty that fulfills all three of these requirements is

H = - K ∑ pi log pi, where the sum runs over i = 1...n and K is a positive constant (choosing K amounts to choosing a unit of measurement)

which is known as the Shannon entropy or just entropy of the probability distribution.
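A minimal sketch of the formula, assuming K = 1 and base-2 logarithms so that H is measured in bits (the function name `entropy` is ours). It checks two of the claims made above: for two states, H is 0 at the extremes and largest at p1 = p2 = 0.5; and for n equally likely states, f(n) = log2(n) increases with n.

```python
import math

def entropy(probs, base=2.0):
    """H = -K * sum(p_i log p_i) with K = 1; base 2 gives bits. Terms with p_i = 0 count as 0."""
    return sum(-p * math.log(p, base) for p in probs if p > 0)

# Two states: no uncertainty at p1 = 0 or 1, maximum at p1 = p2 = 0.5.
grid = [i / 100 for i in range(101)]
h_binary = [entropy([p1, 1 - p1]) for p1 in grid]
best_p1 = grid[h_binary.index(max(h_binary))]
print(entropy([1.0, 0.0]), entropy([0.5, 0.5]), best_p1)

# n equally likely states: f(n) = H(1/n, ..., 1/n) = log2(n), increasing in n.
for n in (2, 3, 4, 8):
    print(n, round(entropy([1 / n] * n), 4))
```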

It is important to emphasize again that this is a measure of uncertainty that operates solely at the level of the probability distribution. Nothing in its definition refers to outcomes, let alone to the impacts of outcomes on different actors. See the Intro to Measuring Uncertainty from two weeks ago for an explanation of the difference.

Next week we will look at entropy for some probability distributions to get more of a feel for what this measure captures.

#uncertainty wednesday #probability distribution #entropy #Shannon

Uncertainty Wednesday: Probability Distribution

So far in Uncertainty Wednesday we have limited ourselves to looking at examples with only two states of the world and two possible signal values. When I introduced this, I explained that these combine to form four elementary events. We then looked at the basic requirements for assigning probabilities to these events and at the axioms that probability should follow.

Now let’s forget for a moment about the origin of our elementary events and simply look at any set S = {A, B, C, D, E, F} whose members are elementary events, or states of the world, or signal values. A probability distribution across the set S is simply an assignment of values P(x) such that for every x ∈ S

0 ≤ P(x) ≤ 1

and

∑P(x) = 1 where the sum is over all x ∈ S
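The two conditions translate directly into a check. This is a minimal sketch with a hypothetical helper `is_distribution` and made-up assignments over S = {A, B, C, D, E, F}; a small tolerance is assumed for the sum because floating-point values rarely add to exactly 1.

```python
import math

def is_distribution(p, tol=1e-9):
    """Check both requirements: every P(x) lies in [0, 1] and the values sum to 1."""
    in_range = all(0.0 <= v <= 1.0 for v in p.values())
    sums_to_one = math.isclose(sum(p.values()), 1.0, abs_tol=tol)
    return in_range and sums_to_one

# Hypothetical assignments over S = {A, B, C, D, E, F}.
uniform = {x: 1 / 6 for x in "ABCDEF"}
lopsided = {"A": 0.5, "B": 0.2, "C": 0.1, "D": 0.1, "E": 0.05, "F": 0.05}
broken = {"A": 0.5, "B": 0.6, "C": 0.0, "D": 0.0, "E": 0.0, "F": -0.1}  # sums to 1, but F < 0

for name, p in [("uniform", uniform), ("lopsided", lopsided), ("broken", broken)]:
    print(name, is_distribution(p))
```

The `broken` case shows that summing to 1 is not enough on its own; each value must also lie between 0 and 1.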

Quite clearly there are infinitely many possible probability distributions (this was already true in the case where S has only two elements). But how many P(x) can be chosen “freely”? Well, if |S| = n, meaning the set S has n members, then we get to choose n - 1 probabilities, and the last one is automatically determined by the requirement that they all sum to 1. In the case of n = 2 there is only one free parameter, i.e. if S = {A, B} and P(A) = p, then automatically P(B) = 1 - p.
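The “n - 1 free choices” argument can be sketched as a small function: pick any n - 1 probabilities and the last one follows. The name `complete_distribution` is hypothetical, and rejecting inputs that would force a negative last value is our choice for illustration.

```python
def complete_distribution(free_probs):
    """Given n - 1 freely chosen probabilities, return all n; the last is 1 minus their sum."""
    last = 1.0 - sum(free_probs)
    if any(p < 0 or p > 1 for p in free_probs) or last < 0:
        raise ValueError("free probabilities must lie in [0, 1] and sum to at most 1")
    return list(free_probs) + [last]

# n = 2: a single free parameter p gives P(A) = p and P(B) = 1 - p.
print(complete_distribution([0.3]))
# n = 4: three free parameters determine the fourth.
print(complete_distribution([0.1, 0.2, 0.3]))
```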

So take for a moment the case of |S| = 1000: there are 999 probabilities that can be chosen individually, and only the last one is then determined. We can think of this as a 999-dimensional space. I have intentionally chosen letters as the elements of S, and even that suggests too much structure for the most general version (because we think of the alphabet as ordered).

Why am I emphasizing this? First, because most of the probability distributions we work with all the time, such as the normal distribution, impose dramatic constraints. For starters, these distributions require an ordering of the state space (meaning an ordering of the elements of S). And then they dramatically collapse the number of free dimensions. In the case of a normal distribution, for example, we will see that there are only 2 parameters (the mean and the standard deviation). So in the case of |S| = 1000 just discussed, by imposing a distribution that is approximately normal, we reduce a 999-dimensional space to a 2-dimensional one!
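To make the collapse from 999 free dimensions to 2 concrete, here is a sketch that derives all 1000 probabilities from just a mean and a standard deviation. It assumes the states are ordered as 0..999, gives state k the normal mass on [k - 0.5, k + 0.5], and rescales so the values sum to 1; the function names and the specific mean and standard deviation are illustrative assumptions.

```python
import math

def normal_cdf(x, mu, sigma):
    """CDF of a normal distribution with mean mu and standard deviation sigma."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def discretized_normal(n_states, mu, sigma):
    """Assign probabilities to ordered states 0..n_states-1 using only (mu, sigma).

    State k gets the normal mass on [k - 0.5, k + 0.5]; the values are then
    rescaled so they sum to 1 (the small tail mass outside the range is
    spread proportionally).
    """
    raw = [normal_cdf(k + 0.5, mu, sigma) - normal_cdf(k - 0.5, mu, sigma)
           for k in range(n_states)]
    total = sum(raw)
    return [r / total for r in raw]

# Two numbers pin down all 1000 probabilities.
p = discretized_normal(1000, mu=500.0, sigma=80.0)
print(len(p), round(sum(p), 12), max(range(1000), key=p.__getitem__))
```

Note that the construction only makes sense because the states are ordered: the bins [k - 0.5, k + 0.5] rely on k being a position on a line.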

Second, because it is possible to make some important arguments about uncertainty solely on the basis of the shape of the probability distribution, without reference to values associated with each element of S. Coming back to the simplest possible case of |S| = 2, there is a difference in uncertainty between P(A) = 0.8 and P(A) = 0.5 (and hence P(B) = 0.2 and 0.5, respectively). In a very precise way there is more uncertainty when P(A) = 0.5 than when P(A) = 0.8, without saying anything about what happens in each state or assigning a numeric value to the states.
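One precise way to compare the two cases, assuming Shannon entropy in bits as the measure, uses only the two probabilities and nothing about what happens in each state. The helper `binary_entropy` is a hypothetical name.

```python
import math

def binary_entropy(p):
    """Shannon entropy (bits) of a two-state distribution with P(A) = p, P(B) = 1 - p."""
    return sum(-q * math.log2(q) for q in (p, 1.0 - p) if q > 0)

for p_a in (0.5, 0.8):
    print(f"P(A) = {p_a}: H = {binary_entropy(p_a):.4f} bits")
# prints H = 1.0000 bits for P(A) = 0.5 and H = 0.7219 bits for P(A) = 0.8
```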

In the coming Wednesdays we will dig deeper into both of these important points, probably starting with the second one (although I haven’t quite made up my mind about that).

#uncertainty wednesday #probability distribution

The outcomes show that neither (1) the z score of the normal distribution nor (2) the parameters of the normal distribution are affected by the sample size.
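The simulator itself is not shown in the post, so here is a hypothetical stand-in, a sketch only. It draws many samples of size n from an Exponential population (an assumed, deliberately skewed choice), standardizes each sample mean as z = (x̄ - μ) / (σ / √n), and reports the mean and standard deviation of z. Under these assumptions they stay near 0 and 1 whatever the sample size; what improves with larger n is how closely the shape of z matches the normal curve.

```python
import math
import random
import statistics

def standardized_means(n, reps=5000, rate=1.0, seed=0):
    """Draw `reps` samples of size n from Exponential(rate) and return
    z = (sample mean - mu) / (sigma / sqrt(n)) for each sample."""
    rng = random.Random(seed)
    mu = sigma = 1.0 / rate  # for the exponential, mean = sd = 1 / rate
    zs = []
    for _ in range(reps):
        xbar = statistics.fmean(rng.expovariate(rate) for _ in range(n))
        zs.append((xbar - mu) / (sigma / math.sqrt(n)))
    return zs

for n in (5, 30, 100):
    zs = standardized_means(n)
    print(n, round(statistics.fmean(zs), 3), round(statistics.stdev(zs), 3))
```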

#central limit theorem #probability distribution simulator #probability distribution #probability #statistics #random sample #sampling distribution #sample mean

The research methods explained

#central limit theorem #probability distribution simulator #probability distribution #probability #statistics #sampling distribution #research methodology

Explains how two factors from the probability distribution, the skewness coefficient and the kurtosis coefficient, determine the minimal sample size.

The skewness and kurtosis coefficients may depend on the parameters of the probability distribution.
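A sketch of how these pieces can fit together, under stated assumptions. It uses the known population skewness and excess kurtosis of the Gamma and Poisson families, which depend on their parameters, and the fact that for a mean of n independent draws the skewness shrinks to skew/√n and the excess kurtosis to kurt/n. The stopping rule (both within a tolerance of 0.1 of the normal values) and the function names are our hypothetical choices, not the method from the posts.

```python
import math

def gamma_shape_moments(k):
    """Gamma(shape=k): skewness 2/sqrt(k), excess kurtosis 6/k (independent of scale)."""
    return 2.0 / math.sqrt(k), 6.0 / k

def poisson_moments(lam):
    """Poisson(lam): skewness 1/sqrt(lam), excess kurtosis 1/lam."""
    return 1.0 / math.sqrt(lam), 1.0 / lam

def minimal_sample_size(skew, ex_kurt, tol=0.1, n_max=10**6):
    """Hypothetical rule: smallest n for which the sample mean's skewness (skew/sqrt(n))
    and excess kurtosis (ex_kurt/n) are both within tol of the normal values (0)."""
    for n in range(1, n_max + 1):
        if abs(skew) / math.sqrt(n) <= tol and abs(ex_kurt) / n <= tol:
            return n
    raise ValueError("no n up to n_max meets the tolerance")

for label, (g1, g2) in [("Gamma k=1 (exponential)", gamma_shape_moments(1)),
                        ("Gamma k=4", gamma_shape_moments(4)),
                        ("Poisson lam=3", poisson_moments(3)),
                        ("Poisson lam=30", poisson_moments(30))]:
    print(f"{label}: skew={g1:.3f}, excess kurtosis={g2:.3f}, n={minimal_sample_size(g1, g2)}")
```

Under this rule, more skewed or heavier-tailed members of a family (small k, small lam) need larger samples before the sample mean looks normal.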

#central limit theorem #probability distribution simulator #probability distribution #probability #sampling distribution #statistics #sample mean #skewed coefficient #kurtosis coefficient #sample size

Introducing the central limit theorem

#central limit theorem #probability distribution simulator #probability distribution #probability #statistics #sampling distribution

The minimal sample size for a specific probability distribution.

#central limit theorem #probability distribution simulator #probability distribution #probability #statistics #sampling distribution #sample size #random sample