Intro
- Recall the statements of the (strong/weak) law of large numbers and the central limit theorem and know how to apply them for large sample sizes.
- (Optional:) Apply Hoeffding's inequality to the sample means of bounded i.i.d. random variables.
- Recall the probability density function and properties of the Gaussian distribution.
- Use Gaussian probability tables to obtain probabilities and quantiles.
- Distinguish between convergence almost surely, convergence in probability, and convergence in distribution, and understand that these notions are ordered from strongest to weakest.
- Determine convergence of sums and products of sequences that converge almost surely or in probability.
- Apply Slutsky's theorem to the sum and product of a sequence that converges in distribution and another that converges in probability to a constant.
- Use the continuous mapping theorem to determine convergence of sequences of a function of random variables.
Probability
Probability is an essential part of statistics. On one hand, there is the truth: a stochastic process, a data-generating process, that generates the observations we see. On the other hand, we have only partial observations of it. Probability tells us, given the truth, what the observations would look like.
Two Important Probability Tools
: Averages of random variables: LLN & CLT
Laws of Large Numbers (LLN, 큰 수의 법칙)
We replace expectations by averages. The thing that justifies doing this, the result that says averages are close to expectations, is the Law of Large Numbers. There are two versions of it.
There is the weak version and the strong version. What they essentially say is that if you take the average of random variables $X_1, \dots, X_n$, a quantity we typically denote by $\overline{X}_n$, then it converges to $\mu$, the expectation of each of those variables.
The difference between the weak version and the strong version is the sense in which this convergence happens.
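For reference, a minimal statement of both versions, assuming $X_1, \dots, X_n$ are i.i.d. with mean $\mu = \mathbb{E}[X_1]$:

$$\overline{X}_n \xrightarrow[n\to\infty]{\mathbb{P}} \mu \quad \text{(weak LLN)}, \qquad \overline{X}_n \xrightarrow[n\to\infty]{a.s.} \mu \quad \text{(strong LLN)}$$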
Central Limit Theorem (CLT, 중심극한정리)
In particular, suppose $\overline{X}_n$ converges to $\mu$ as $n \to \infty$. What's wrong with this? As $n \to \infty$, it could be that $\overline{X}_n - \mu$ behaves like $1/\log\log\log n$. That is not very useful, because for a given $n$, say $n = 10^{10^{10}}$, this quantity is still roughly $1/10$, which does not help at all. This convergence is extremely slow, but it cannot be ruled out by the Law of Large Numbers alone.
What the central limit theorem is essentially telling us is the typical size of the deviations of $\overline{X}_n$ around $\mu$. $\overline{X}_n$ will not be exactly equal to $\mu$; it will have some inherent fluctuation. The central limit theorem says that if you take $\overline{X}_n$, subtract $\mu$, divide by $\sigma$ (the standard deviation of $X$), and multiply everything by $\sqrt{n}$, then this quantity converges to a Gaussian.
The important quantitative message is this: if $\sqrt{n}(\overline{X}_n - \mu)/\sigma$ is approximately a standard Gaussian, then, since a draw from a standard Gaussian lies between $-3$ and $3$ with overwhelming probability (almost probability 1), this quantity is a number between $-3$ and $3$ with overwhelming probability. That means $|\overline{X}_n - \mu|$ is less than $3\sigma/\sqrt{n}$ with overwhelming probability. This is something much more interesting.
The Gaussian will allow us to be extremely precise about the constant that we get, whereas nothing like the constant 3 appeared in the Law of Large Numbers discussion.
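For reference, the statement of the CLT, assuming in addition that $\sigma^2 = \mathrm{Var}(X_1) < \infty$:

$$\sqrt{n}\,\frac{\overline{X}_n - \mu}{\sigma} \xrightarrow[n\to\infty]{(d)} N(0, 1)$$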
Hoeffding’s Inequality
: Small sample size of bounded random variables: Hoeffding's Inequality
When $n$ is not large enough, there is still something we can say; in fact, there is something we can say for any $n$, even $n = 2$. Of course, it is not going to be a very strong statement, but we can say something. This result is called Hoeffding's inequality.
Hoeffding's inequality is an extremely important tool. If you ever write a paper on, say, theoretical machine learning, chances are you are going to be using Hoeffding's inequality.
This is how it goes.
First of all, you have to place restrictions on the random variables you are using. For the central limit theorem, I basically assumed only that I had a mean and a variance, and that was all I needed. Here I need much more: I need the random variables to be almost surely bounded.
The conclusion is that the average is a good replacement for the expectation. Not only does the law of large numbers guarantee this asymptotically, but Hoeffding's inequality guarantees it for every $n$.
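For reference, the standard statement: if $X_1, \dots, X_n$ are i.i.d. with mean $\mu$ and $X_i \in [a, b]$ almost surely, then for every $\epsilon > 0$ and every $n$,

$$\mathbb{P}\left(|\overline{X}_n - \mu| \ge \epsilon\right) \le 2\exp\left(-\frac{2n\epsilon^2}{(b-a)^2}\right)$$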
Consequences
- The LLNs tell us that $\overline{R_n} \xrightarrow[n\to\infty]{\mathbb{P},\, a.s.} p$.
> modeling assumption: i.i.d., which means independent and identically distributed.
- Hence, when the size n of the experiment becomes large, $\overline{R_n}$ is a good (say “consistent”) estimator of p.
- The CLT refines this by quantifying how good this estimate is: for $n$ large enough, the distribution of $\overline{R_n}$ is approximately Gaussian:
- $\mathbb{P}(|\overline{R_n} - p| \ge \epsilon) \simeq \mathbb{P}\left(\left|N\left(0, \frac{p(1-p)}{n}\right)\right| > \epsilon\right)$
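As a sanity check, here is a minimal simulation sketch of this approximation, assuming illustrative values $p = 0.5$, $n = 100$, $\epsilon = 0.05$ (these numbers are not from the text):

```python
import numpy as np
from scipy.stats import norm

# Illustrative values (assumptions, not from the text)
p, n, eps = 0.5, 100, 0.05

# Monte Carlo estimate of P(|R_n_bar - p| >= eps) for i.i.d. Bernoulli(p) samples
rng = np.random.default_rng(0)
r_bar = rng.binomial(1, p, size=(100_000, n)).mean(axis=1)
mc_prob = np.mean(np.abs(r_bar - p) >= eps)

# CLT approximation: R_n_bar is approximately N(p, p(1-p)/n)
sd = np.sqrt(p * (1 - p) / n)
clt_prob = 2 * norm.sf(eps / sd)  # two-sided Gaussian tail

print(f"Monte Carlo: {mc_prob:.4f}, CLT approximation: {clt_prob:.4f}")
```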
Gaussian Distribution
Properties of the Gaussian distribution
Affine transformation, standardization, symmetry
- Affine transformation
- If $X \sim N(\mu, \sigma^2)$, then for any $a, b \in \mathbb{R}$,
- $aX + b \sim N(a\mu + b, a^2\sigma^2)$
- Standardization (a.k.a. normalization / Z-score)
- If $X \sim N(\mu, \sigma^2)$,
- then $Z = \frac{X-\mu}{\sigma} \sim N(0,1)$
- Useful to compute probabilities from the CDF of $Z \sim N(0,1)$ (see the sketch after this list):
- $P(u \le X \le v) = P(\frac{u-\mu}{\sigma} \le Z \le \frac{v-\mu}{\sigma} )$
- Symmetry
- If $X \sim N(0,\sigma^2)$, then $-X \sim N(0,\sigma^2)$. If $x > 0$,
- $P(|X| > x) = P(X > x) + P(X < -x) = 2\,P(X > x)$
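A minimal sketch of these computations with `scipy.stats.norm`, assuming illustrative values $\mu = 2$, $\sigma = 3$, $u = 1$, $v = 5$ (not from the text):

```python
from scipy.stats import norm

# Illustrative values (assumptions): X ~ N(mu=2, sigma^2=9)
mu, sigma = 2.0, 3.0
u, v = 1.0, 5.0

# Standardization: P(u <= X <= v) = P((u-mu)/sigma <= Z <= (v-mu)/sigma), Z ~ N(0,1)
prob_via_z = norm.cdf((v - mu) / sigma) - norm.cdf((u - mu) / sigma)

# Same probability directly from the N(mu, sigma^2) CDF, as a check
prob_direct = norm.cdf(v, loc=mu, scale=sigma) - norm.cdf(u, loc=mu, scale=sigma)

# Symmetry: for X ~ N(0, sigma^2) and x > 0, P(|X| > x) = 2 * P(X > x)
x = 1.5
tail = 2 * norm.sf(x, loc=0, scale=sigma)

print(prob_via_z, prob_direct, tail)
```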
Modes of Convergence
: Convergence almost surely, in probability, and in distribution
Three types of convergence appeared: two in the laws of large numbers and one in the central limit theorem.
The strong law of large numbers uses almost sure convergence, the weak law uses convergence in probability, and the central limit theorem uses convergence in distribution.
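For reference, the standard definitions for a sequence $(T_n)$ and a limit $T$:

$$T_n \xrightarrow{a.s.} T \iff \mathbb{P}\left(\lim_{n\to\infty} T_n = T\right) = 1$$

$$T_n \xrightarrow{\mathbb{P}} T \iff \forall \epsilon > 0, \; \mathbb{P}(|T_n - T| \ge \epsilon) \xrightarrow[n\to\infty]{} 0$$

$$T_n \xrightarrow{(d)} T \iff \mathbb{E}[f(T_n)] \xrightarrow[n\to\infty]{} \mathbb{E}[f(T)] \;\text{ for all continuous, bounded } f$$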
Recap
- Averages of random variables occur naturally in statistics
- We make modeling assumptions to apply probability results
- For large sample size they are consistent (LLN) and we know their distribution (CLT)
- The CLT gives the (weakest) convergence in distribution, but this is enough to compute probabilities
- We use standardization and Gaussian tables to compute probabilities and quantiles
- We can perform operations (addition, multiplication, continuous functions) on sequences of random variables
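A minimal simulation sketch of this in practice, assuming Exponential(1) data as an illustrative distribution: by the CLT and Slutsky's theorem (replacing $\sigma$ with the sample standard deviation, which converges in probability to $\sigma$), $\sqrt{n}(\overline{X}_n - \mu)/S_n$ is approximately standard Gaussian.

```python
import numpy as np
from scipy.stats import norm

# Illustrative setup (assumption): X_i ~ Exponential(1), so mu = 1, sigma = 1
rng = np.random.default_rng(1)
n, reps = 200, 50_000
x = rng.exponential(scale=1.0, size=(reps, n))

xbar = x.mean(axis=1)       # sample means
s = x.std(axis=1, ddof=1)   # sample standard deviations (converge in probability to sigma)

# Slutsky + CLT: sqrt(n) * (xbar - mu) / s converges in distribution to N(0, 1)
t = np.sqrt(n) * (xbar - 1.0) / s

# Empirical two-sided tail probability vs. the standard Gaussian tail at 1.96
print(np.mean(np.abs(t) > 1.96), 2 * norm.sf(1.96))
```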