Bayes Rule and Probability Review (2)
clodagh
2023. 2. 18. 16:26
CDFs and Percentiles
CDF (Cumulative Distribution Function)
- CDF
- X = random variable
- $F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt$
- PDF
- $f(x) = dF(x) / dx$
- CDF Shape
- approaches 0 as $x \to -\infty$ and 1 as $x \to +\infty$
- non-decreasing (since the PDF $f(x)$ is non-negative)
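Both properties can be checked numerically. The following is a minimal sketch, assuming a normal distribution with mean 170 and standard deviation 7 (the values used in the height example later in this post) and an arbitrary evaluation point of 175. It integrates the PDF with `scipy.integrate.quad` and compares the result with `norm.cdf`.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

mu, sigma = 170, 7  # assumed values, borrowed from the height example
x0 = 175            # arbitrary point at which to evaluate F(x)

# F(x0) as the area under the PDF from -infinity to x0
area, _ = quad(norm.pdf, -np.inf, x0, args=(mu, sigma))
cdf_value = norm.cdf(x0, loc=mu, scale=sigma)

# Evaluate the CDF on a grid; neighbouring differences should never be negative
grid = np.linspace(mu - 6 * sigma, mu + 6 * sigma, 500)
cdf_grid = norm.cdf(grid, loc=mu, scale=sigma)
is_non_decreasing = bool(np.all(np.diff(cdf_grid) >= 0))

print(area, cdf_value, is_non_decreasing)
```

The two numbers should agree up to the integration tolerance, and the grid values should start near 0 and end near 1.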
Inverse CDF
- Percent point function (percentile)
- $F(x)$ = probability that X is less than or equal to x
- $F^{-1}(p)$ = the value of x at which the CDF equals p
- What is the probability a student will achieve 170 or less?
- It’s the CDF: norm.cdf(170, mu, sigma)
- Let’s say it’s 95%
- What’s the probability of scoring above 170?
- 1-norm.cdf(170,mu,sigma) = 5%
- What’s the maximum score of the bottom 95% of the class?
- norm.ppf(0.95, mu, sigma) = 170
- “In the 95th percentile” means your score is at or above the scores of 95% of the class
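The three questions above are one round trip between `cdf` and `ppf`. The sketch below assumes a hypothetical score distribution with mean 150 and standard deviation 12. The notes do not give these numbers; they are chosen only so that a score of 170 lands at roughly 95%.

```python
from scipy.stats import norm

# Hypothetical exam-score distribution; both values are assumptions
mu, sigma = 150, 12

p_at_most_170 = norm.cdf(170, loc=mu, scale=sigma)  # P(X <= 170)
p_above_170 = 1 - p_at_most_170                     # P(X > 170)

# ppf is the inverse of cdf, so feeding the probability back recovers the score
score_back = norm.ppf(p_at_most_170, loc=mu, scale=sigma)

print(p_at_most_170, p_above_170, score_back)
```

Because `ppf` inverts `cdf`, `score_back` comes out as 170 up to floating-point error.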
Example
- Suppose average height = 170 cm, sd = 7 cm
- “My height is in the 95th percentile”
- $F^{-1}(0.95; 170, 7) \approx 181.5$ cm
- Someone is 160 cm tall; what percentile are they in?
- $P(X \le 160) = P(Z \le -10/7) = F(160; 170, 7) \approx 0.08$, i.e. roughly the 8th percentile
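The standardization step in the last bullet can be sketched in code: converting the height to a z-score and using the standard normal gives the same answer as passing `loc` and `scale` directly. The variable names here are illustrative only.

```python
from scipy.stats import norm

mu, sd = 170, 7

# Percentile of a 160 cm person, computed two equivalent ways
z = (160 - mu) / sd                         # z-score, equal to -10/7
p_standard = norm.cdf(z)                    # standard normal CDF at z
p_direct = norm.cdf(160, loc=mu, scale=sd)  # same probability, no manual standardization

# Height at the 95th percentile, computed two equivalent ways
h_direct = norm.ppf(0.95, loc=mu, scale=sd)
h_manual = mu + sd * norm.ppf(0.95)         # undo the standardization by hand

print(p_standard, p_direct, h_direct, h_manual)
```

Both probabilities come out at about 0.08 and both heights at about 181.5 cm, matching the figures in the example.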
CDF and Inverse CDF in code
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm
np.random.seed(0)
mu = 170
sd = 7
# generate samples from our distribution
x = norm.rvs(loc = mu, scale = sd, size = 100) # loc = mean (location), scale = standard deviation
x
# maximum likelihood mean
x.mean()
# maximum likelihood variance
x.var()
# manually check variance
((x-x.mean())**2).mean() # mean of squared deviations = variance
# maximum likelihood std
x.std()
# unbiased variance
x.var(ddof=1) # ddof = delta degrees of freedom; ddof=1 divides by N-1 instead of N
# manually calculate unbiased variance
((x-x.mean())**2).sum()/(len(x)-1)
# unbiased std
x.std(ddof=1)
# at what height are you in the 95th percentile?
norm.ppf(0.95, loc=mu, scale = sd)
# it helps to picture the distribution: 95% of the area lies to the left of this height
# you are 160cm tall, what percentile are you in ?
norm.cdf(160, loc=mu, scale = sd)
# you are 180 cm tall, what is the probability that someone is taller than you?
1-norm.cdf(180, loc=mu, scale = sd)
# same value via the survival function, sf(x) = 1 - cdf(x)
norm.sf(180, loc=mu, scale = sd)
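As a sanity check on the calls above, sample-based estimates can be compared with the theoretical values. This is a minimal sketch, not part of the original notebook: it assumes a larger sample (100,000 draws) and NumPy's `default_rng`, so the numbers will not match the 100-sample run above.

```python
import numpy as np
from scipy.stats import norm

mu, sd = 170, 7
rng = np.random.default_rng(0)  # assumed seed, for reproducibility only
samples = rng.normal(loc=mu, scale=sd, size=100_000)

# Fraction of samples at or below 160 cm vs. the theoretical CDF
empirical_cdf_160 = np.mean(samples <= 160)
theoretical_cdf_160 = norm.cdf(160, loc=mu, scale=sd)

# Sample 95th percentile vs. the theoretical inverse CDF
empirical_p95 = np.percentile(samples, 95)
theoretical_p95 = norm.ppf(0.95, loc=mu, scale=sd)

# The unbiased variance is the maximum likelihood variance rescaled by N / (N - 1)
n = len(samples)
var_ml = samples.var()
var_unbiased = samples.var(ddof=1)

print(empirical_cdf_160, theoretical_cdf_160)
print(empirical_p95, theoretical_p95)
print(var_unbiased, var_ml * n / (n - 1))
```

With this many samples the empirical and theoretical values should be close, and the last line prints two numbers that agree up to floating-point error.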
Summary
- Bayesian ML > Bayes’ Rule > Probability
- There’s no “scikit-learn” for Bayesian ML where you can just type three lines of code
- Often you’ll build a custom model (by definition, a library won’t exist!!)