some rambling about asymptotics
part 5 of statistics (toc)
So we continue with the probability that is pretending to be statistics that is presented on a math blog. What is even happening?
Right off the bat, maybe I should say I think this post is going to just be all over the damn place. Honestly, at the time of writing, I have my midterm tomorrow, I'm not wearing pants, and I'm both full and hungry at the same time. Maybe you should reevaluate whether you should be reading a math blog written by a weirdo.
But assuming you've stuck around, let's pick up where we left off last time. In particular, we were talking about the convergence of a sequence of random variables. The conclusion was that convergence in distribution is the weakest notion of convergence out of all of the ones that I definition-dumped. There's a small exception, and that is when a sequence of random variables converges in distribution to a constant. A constant random variable is just a random variable that assigns the same constant to every single outcome, and it isn't particularly interesting as a random variable. But what IS useful is that if a sequence converges in distribution to a constant random variable, then it also converges in probability to that constant.
Turns out there's also this pretty useful result called the continuous mapping theorem that makes life 100% easier. It basically tells us that if a sequence {X1, X2, ... } of random variables converges in distribution to X, then the sequence {g(X1), g(X2), ... } converges in distribution to g(X) as long as g is continuous. Actually, g doesn't even need to be continuous everywhere. We just need the limit X to land in the set D on which g is continuous with probability 1. You may have heard of Slutsky's theorem, which is basically an application of the continuous mapping theorem combined with the observation above.
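Since Slutsky's theorem only gets name-dropped here, this is the usual textbook statement, written out as a sketch. The arrow notation is my own choice (d for convergence in distribution, p for convergence in probability), and c is any constant.

```latex
% Assumes the amsmath package for align* and \xrightarrow.
Suppose $X_n \xrightarrow{d} X$ and $Y_n \xrightarrow{p} c$ for a constant $c$. Then
\begin{align*}
X_n + Y_n &\xrightarrow{d} X + c, \\
X_n Y_n   &\xrightarrow{d} cX, \\
X_n / Y_n &\xrightarrow{d} X / c \quad \text{provided } c \neq 0.
\end{align*}
```

Roughly, the connection is this: because Y_n is heading to a constant, the pair (X_n, Y_n) converges in distribution to (X, c), and then the continuous mapping theorem applied to addition, multiplication, and division finishes the job.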
The next two results are probably the most famous in probability, and they are the law of large numbers and (everybody's favourite) the CeNtRaL LiMiT tHeOrEm!!! These two results help us a lot with proper actual real statistics, where we try to estimate population parameters from a set of data. But I'm getting ahead of myself. I want to try to get through the statements of these results without much notation, but we'll see how I do.
The law of large numbers applies to a sequence of n random variables, each with its own mean and variance. The catch is that every pair of random variables must have zero covariance, and that the average variance divided by n must tend towards 0. In that case, the gap between the average of the random variables and the average of their means converges in probability to 0. (Upgrading that to almost sure convergence takes stronger assumptions.) In practice, this result usually gets applied to some kind of data collection or sampling. With a random sample, the sequence of random variables is iid, and so the law of large numbers in that case simply states that the sample mean converges to the true population mean. In the iid case, that convergence does hold almost surely.
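Here's a quick simulation of the iid case, just as a sanity check and not a proof. The exponential distribution with mean 2, the seed, and the number of draws are all arbitrary choices of mine. The helper `running_means` is a made-up name; it recomputes the sample mean after each new data point.

```python
import random


def running_means(draws):
    """Recompute the sample mean after each new draw."""
    means = []
    total = 0.0
    for count, x in enumerate(draws, start=1):
        total += x
        means.append(total / count)
    return means


# Assumptions (mine, not from the post): exponential draws with mean 2,
# a fixed seed, and 100,000 draws.
TRUE_MEAN = 2.0
rng = random.Random(0)
draws = [rng.expovariate(1 / TRUE_MEAN) for _ in range(100_000)]
means = running_means(draws)

# Print the sample mean at a few sample sizes to watch it settle down.
for n in (10, 1_000, 100_000):
    print(f"n = {n:>7}: sample mean = {means[n - 1]:.4f}")
```

The printed sample means should drift towards 2 as n grows, though any single run only shows one path of the sequence.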
The central limit theorem is similar, but makes a statement about the distribution of this sample mean, m. So if we're given a length-n sequence of iid random variables, then (√n)(m - μ) converges in distribution to a normal distribution centred at 0 with variance σ², where μ and σ² are the population mean and variance. A more general statement of the theorem ditches the "identically distributed" part of iid by allowing the variance of each draw to differ (the draws still need to be independent, and some technical conditions are needed to stop any single draw from dominating). Then, the theorem is revised to reflect this change just by replacing the population variance (which doesn't exist any more) with the average of the variances.
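To see the iid version in action, here's a rough simulation sketch. It repeats the "draw a sample, compute (√n)(m - μ)" experiment many times and checks whether the results look like they have variance σ². The exponential distribution (where σ happens to equal μ), the sample size, the number of repetitions, and the helper name `scaled_error` are all my own assumptions.

```python
import math
import random
import statistics


def scaled_error(rng, n, mu):
    """One realisation of sqrt(n) * (m - mu), where m is the mean of n iid draws."""
    m = statistics.fmean(rng.expovariate(1 / mu) for _ in range(n))
    return math.sqrt(n) * (m - mu)


# Assumptions (mine): exponential draws with mean MU, for which the
# population standard deviation SIGMA equals MU. Sizes and seed are arbitrary.
MU = 2.0
SIGMA = 2.0
N = 400
REPS = 2_000
rng = random.Random(1)

zs = [scaled_error(rng, N, MU) for _ in range(REPS)]
sd_hat = statistics.stdev(zs)
# A N(0, SIGMA^2) variable lands within 1.96 * SIGMA of 0 about 95% of the time.
coverage = sum(abs(z) <= 1.96 * SIGMA for z in zs) / REPS

print(f"mean of scaled errors: {statistics.fmean(zs):.3f} (theory: 0)")
print(f"sd of scaled errors:   {sd_hat:.3f} (theory: {SIGMA})")
print(f"share within 1.96 sd:  {coverage:.3f} (theory: about 0.95)")
```

The individual draws are very skewed here, which is sort of the point: the scaled sample mean should still come out looking roughly normal.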
So to recap (since these results can be kind of dense): the law of large numbers gives us the value that the sample mean settles on asymptotically, while the central limit theorem describes how the sample mean fluctuates around that value when n is large.
As a side note, I should probably point out that the sample mean is a random variable itself. It's just a linear combination of n random variables (namely with weights 1/n for each one), and so you can construct a sequence of sample means by just recalculating the sample mean at each added data point. In fact, that's implicitly what's going on in the above two statements, since they both use language that describes the convergence of a sequence of random variables. The asymptotics, then, tell you what happens to the sample mean when the sample size is really big! In general, that's why we want big samples--because we get more information about the true population, and because we get closer to the asymptotic results that we know about.
It's hard to imagine any statistics class that doesn't mention the central limit theorem. I know I first saw it when I was around fifteen. But of course, I didn't really get it. Like... what was the point? And why did it even matter? But actually, this theorem is, as my metrics professor would say, basically the perfect theorem. It is totally applicable in a huge variety of settings, and it's also a somewhat surprising and non-obvious result. The full significance didn't really hit me in high school, and I'm not even sure it has now. But, over five years later, it's finally left a deeper impression. We actually went through a proof of the theorem in class, which was pretty cool and made the whole thing seem more legitimate. Maybe I'll get around to TeXing it later, since it would be a total nightmare to present otherwise.
The last little chunk of this post is the delta method, which in a way is analogous to the continuous mapping theorem. The delta method basically tells you what to do if you have some setup like the central limit theorem, but there has been a transformation applied. Words are hard sometimes, so let's just see it with notation. Given a sequence of random variables where (√n)(Xn - x) converges in distribution to a normal N(0, σ²), applying a continuously differentiable transformation g to the sequence is no big deal! We just get that (√n)(g(Xn) - g(x)) converges in distribution to a normal N(0, (g'(x))²σ²). That's not too bad, right?
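Here's a small simulation sketch of the delta method, under assumptions that are entirely mine: Xn is the sample mean of n exponential draws with mean 2 (so the central limit theorem supplies the setup with x = 2 and σ = 2), and the transformation is g(x) = log(x), so g'(2) = 1/2. The predicted asymptotic standard deviation is then |g'(2)| times σ, which is 1. The helper name `scaled_log_error` is made up.

```python
import math
import random
import statistics


def scaled_log_error(rng, n, mu):
    """One realisation of sqrt(n) * (g(m) - g(mu)) with g = log."""
    m = statistics.fmean(rng.expovariate(1 / mu) for _ in range(n))
    return math.sqrt(n) * (math.log(m) - math.log(mu))


# Assumptions (mine): exponential draws with mean MU (so SIGMA equals MU),
# and the transformation g(x) = log(x), whose derivative at MU is 1 / MU.
MU = 2.0
SIGMA = 2.0
PREDICTED_SD = abs(1 / MU) * SIGMA  # |g'(MU)| * SIGMA
N = 400
REPS = 2_000
rng = random.Random(2)

ws = [scaled_log_error(rng, N, MU) for _ in range(REPS)]
sd_hat = statistics.stdev(ws)

print(f"simulated sd: {sd_hat:.3f}")
print(f"delta method prediction: {PREDICTED_SD}")
```

The two numbers should be close but not identical, since n is finite and the delta method is only a statement about the limit.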
Of course, we still have one small problem: the central limit theorem makes use of the population variance in its description of the asymptotic distribution of the sample mean. A lot of the time, we don't have the variance of a population. Ah--well, just like we can use the sample values to guess at the population's true mean, we can use the sample values to guess at the population's true variance. Is that enough of a cliffhangery teaser? (Probably not...) In any case, I'll try to tackle estimation next time, and hopefully I'll be wearing pants too.












