Intro
- Recall the statements of the (strong/weak) law of large numbers and the central limit theorem and know how to apply them for large sample sizes.
- (Optional:) Apply Hoeffding's inequality to the sample means of bounded i.i.d. random variables.
- Recall the probability density function and properties of the Gaussian distribution.
- Use Gaussian probability tables to obtain probabilities and quantiles.
- Distinguish between convergence almost surely, convergence in probability, and convergence in distribution, and understand that these notions are ordered from strongest to weakest.
- Determine convergence of sums and products of sequences that converge almost surely or in probability.
- Apply Slutsky's theorem to the sum and product of a sequence that converges in distribution and another that converges in probability to a constant.
- Use the continuous mapping theorem to determine convergence of sequences of a function of random variables.
Probability
Probability is an essential part of statistics. On one hand, there is the truth: a stochastic process, a data-generating process, that generates the observations we see. On the other hand, we have only partial observations of it. Probability tells us, given the truth, what the observations would look like.
Two Important Probability Tools
: Averages of random variables: LLN & CLT
Laws of Large Numbers (LLN, 큰 수의 법칙)
We replace expectations by averages. The thing that justifies doing this, the result that says averages are close to expectations, is the Law of Large Numbers. There are two versions of it.
There is the weak version and the strong version. What they essentially say is that if you take the average of random variables $X_1, \dots, X_n$, a quantity we typically denote by $\overline{X}_n$, then it converges to $\mu$, the expectation of each of those variables.
The difference between the weak version and the strong version is the sense in which this convergence happens.
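For reference, a minimal statement of both versions, assuming $X_1, \dots, X_n$ are i.i.d. with mean $\mu = \mathbb{E}[X_1]$:

$$\overline{X}_n \xrightarrow[n\to\infty]{\mathbb{P}} \mu \quad \text{(weak LLN)}, \qquad \overline{X}_n \xrightarrow[n\to\infty]{a.s.} \mu \quad \text{(strong LLN)}$$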
Central Limit Theorem (CLT, 중심극한정리)
In particular, suppose $\overline{X}_n$ converges to $\mu$ as $n \to \infty$. What's wrong with this? As $n \to \infty$, it could be that $\overline{X}_n - \mu$ behaves like $1/\log\log\log n$. That is not very useful, because for a given $n$, say $n = 10^{10^{10}}$, this quantity is still roughly $1/10$, which does not help at all. This convergence is extremely slow, but it cannot be ruled out by the Law of Large Numbers alone.
What the central limit theorem is essentially telling us is the typical size of the deviations of $\overline{X}_n$ around $\mu$. $\overline{X}_n$ will not be exactly equal to $\mu$; it will have some inherent fluctuation. The central limit theorem says that if you take $\overline{X}_n$, subtract $\mu$, divide by $\sigma$ (the standard deviation of $X$), and multiply everything by $\sqrt{n}$, then this quantity converges to a Gaussian.
The important quantitative message is this: if $\sqrt{n}(\overline{X}_n - \mu)/\sigma$ is approximately a standard Gaussian, then, since a draw from a standard Gaussian lies between $-3$ and $3$ with overwhelming probability (almost probability 1), this quantity is a number between $-3$ and $3$ with overwhelming probability. That means $|\overline{X}_n - \mu|$ is less than $3\sigma/\sqrt{n}$ with overwhelming probability. This is something much more interesting.
The Gaussian will allow us to be extremely precise about the constant that we get, whereas nothing like the constant 3 appeared in the Law of Large Numbers discussion.
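For reference, the statement of the CLT, assuming in addition that $\sigma^2 = \mathrm{Var}(X_1) < \infty$:

$$\sqrt{n}\,\frac{\overline{X}_n - \mu}{\sigma} \xrightarrow[n\to\infty]{(d)} N(0, 1)$$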
Hoeffding’s Inequality
: Small sample size of bounded random variables: Hoeffding's Inequality
When $n$ is not large enough, there is still something we can say; in fact, there is something we can say for any $n$, even $n = 2$. Of course, it is not going to be a very strong statement, but we can say something. This result is called Hoeffding's inequality.
Hoeffding's inequality is an extremely important tool. If you ever write a paper on, say, theoretical machine learning, chances are you are going to be using Hoeffding's inequality.
This is how it goes.
First of all, you have to place restrictions on the random variables you are using. For the central limit theorem, I basically assumed only that I had a mean and a variance, and that was all I needed. Here I need much more: I need the random variables to be almost surely bounded.
The conclusion is that the average is a good replacement for the expectation. Not only does the law of large numbers guarantee this asymptotically, but Hoeffding's inequality guarantees it for every $n$.
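For reference, the standard statement: if $X_1, \dots, X_n$ are i.i.d. with mean $\mu$ and $X_i \in [a, b]$ almost surely, then for every $\epsilon > 0$ and every $n$,

$$\mathbb{P}\left(|\overline{X}_n - \mu| \ge \epsilon\right) \le 2\exp\left(-\frac{2n\epsilon^2}{(b-a)^2}\right)$$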
Consequences
- The LLNs tell us that $\overline{R_n} \xrightarrow[n\to\infty]{\mathbb{P},\, a.s.} p$.
> modeling assumption: i.i.d., which means independent and identically distributed.
- Hence, when the size n of the experiment becomes large, $\overline{R_n}$ is a good (say “consistent”) estimator of p.
- The CLT refines this by quantifying how good this estimate is: for $n$ large enough, the distribution of $\overline{R_n}$ is approximately Gaussian:
- $\mathbb{P}(|\overline{R_n} - p| \ge \epsilon) \simeq \mathbb{P}\left(\left|N\left(0, \frac{p(1-p)}{n}\right)\right| > \epsilon\right)$
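As a sanity check, here is a minimal simulation sketch of this approximation, assuming illustrative values $p = 0.5$, $n = 100$, $\epsilon = 0.05$ (these numbers are not from the text):

```python
import numpy as np
from scipy.stats import norm

# Illustrative values (assumptions, not from the text)
p, n, eps = 0.5, 100, 0.05

# Monte Carlo estimate of P(|R_n_bar - p| >= eps) for i.i.d. Bernoulli(p) samples
rng = np.random.default_rng(0)
r_bar = rng.binomial(1, p, size=(100_000, n)).mean(axis=1)
mc_prob = np.mean(np.abs(r_bar - p) >= eps)

# CLT approximation: R_n_bar is approximately N(p, p(1-p)/n)
sd = np.sqrt(p * (1 - p) / n)
clt_prob = 2 * norm.sf(eps / sd)  # two-sided Gaussian tail

print(f"Monte Carlo: {mc_prob:.4f}, CLT approximation: {clt_prob:.4f}")
```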
Gaussian Distribution
Properties of the Gaussian distribution
Affine transformation, standardization, symmetry
- Affine transformation
- If $X \sim N(\mu, \sigma^2)$, then for any $a, b \in \mathbb{R}$,
- $aX + b \sim N(a\mu + b, a^2\sigma^2)$
- Standardization (a.k.a. normalization / Z-score)
- If $X \sim N(\mu, \sigma^2)$,
- then $Z = \frac{X-\mu}{\sigma} \sim N(0,1)$
- Useful to compute probabilities from the CDF of $Z \sim N(0,1)$ (see the sketch after this list):
- $P(u \le X \le v) = P(\frac{u-\mu}{\sigma} \le Z \le \frac{v-\mu}{\sigma} )$
- Symmetry
- If $X \sim N(0,\sigma^2)$, then $-X \sim N(0,\sigma^2)$. If $x > 0$,
- $P(|X| > x) = P(X > x) + P(X < -x) = 2\,P(X > x)$
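A minimal sketch of these computations with `scipy.stats.norm`, assuming illustrative values $\mu = 2$, $\sigma = 3$, $u = 1$, $v = 5$ (not from the text):

```python
from scipy.stats import norm

# Illustrative values (assumptions): X ~ N(mu=2, sigma^2=9)
mu, sigma = 2.0, 3.0
u, v = 1.0, 5.0

# Standardization: P(u <= X <= v) = P((u-mu)/sigma <= Z <= (v-mu)/sigma), Z ~ N(0,1)
prob_via_z = norm.cdf((v - mu) / sigma) - norm.cdf((u - mu) / sigma)

# Same probability directly from the N(mu, sigma^2) CDF, as a check
prob_direct = norm.cdf(v, loc=mu, scale=sigma) - norm.cdf(u, loc=mu, scale=sigma)

# Symmetry: for X ~ N(0, sigma^2) and x > 0, P(|X| > x) = 2 * P(X > x)
x = 1.5
tail = 2 * norm.sf(x, loc=0, scale=sigma)

print(prob_via_z, prob_direct, tail)
```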
Modes of Convergence
: Convergence almost surely, in probability, and in distribution
Three types of convergence appeared: two in the laws of large numbers and one in the central limit theorem.
The strong law of large numbers uses almost sure convergence, the weak law uses convergence in probability, and the central limit theorem uses convergence in distribution.
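For reference, the standard definitions for a sequence $(T_n)$ and a limit $T$:

$$T_n \xrightarrow{a.s.} T \iff \mathbb{P}\left(\lim_{n\to\infty} T_n = T\right) = 1$$

$$T_n \xrightarrow{\mathbb{P}} T \iff \forall \epsilon > 0, \; \mathbb{P}(|T_n - T| \ge \epsilon) \xrightarrow[n\to\infty]{} 0$$

$$T_n \xrightarrow{(d)} T \iff \mathbb{E}[f(T_n)] \xrightarrow[n\to\infty]{} \mathbb{E}[f(T)] \;\text{ for all continuous, bounded } f$$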
Recap
- Averages of random variables occur naturally in statistics
- We make modeling assumptions to apply probability results
- For large sample size they are consistent (LLN) and we know their distribution (CLT)
- The CLT gives the (weakest) convergence in distribution, but this is enough to compute probabilities
- We use standardization and Gaussian tables to compute probabilities and quantiles
- We can perform operations (addition, multiplication, continuous functions) on sequences of random variables
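A minimal simulation sketch of this in practice, assuming Exponential(1) data as an illustrative distribution: by the CLT and Slutsky's theorem (replacing $\sigma$ with the sample standard deviation, which converges in probability to $\sigma$), $\sqrt{n}(\overline{X}_n - \mu)/S_n$ is approximately standard Gaussian.

```python
import numpy as np
from scipy.stats import norm

# Illustrative setup (assumption): X_i ~ Exponential(1), so mu = 1, sigma = 1
rng = np.random.default_rng(1)
n, reps = 200, 50_000
x = rng.exponential(scale=1.0, size=(reps, n))

xbar = x.mean(axis=1)       # sample means
s = x.std(axis=1, ddof=1)   # sample standard deviations (converge in probability to sigma)

# Slutsky + CLT: sqrt(n) * (xbar - mu) / s converges in distribution to N(0, 1)
t = np.sqrt(n) * (xbar - 1.0) / s

# Empirical two-sided tail probability vs. the standard Gaussian tail at 1.96
print(np.mean(np.abs(t) > 1.96), 2 * norm.sf(1.96))
```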