
Bayes Rule and Probability Review (2)

clodagh 2023. 2. 18. 16:26

CDFs and Percentiles

CDF (Cumulative Distribution Function)

  • CDF
    • X = random variable
    • $F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt$
  • PDF
    • $f(x) = dF(x) / dx$
  • CDF Shape
    • is always 0 at -infinity and 1 at infinity
    • non-decreasing (since the PDF f(x) is non-negative)
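
The relationship between the PDF and the CDF can be checked numerically. The sketch below is only an illustration: it uses the standard library's `statistics.NormalDist` in place of `scipy.stats.norm` so that it runs without extra dependencies, and the distribution (mean 170, sd 7) and the helper name `cdf_by_integration` are assumptions made for the example.

```python
from statistics import NormalDist

# Assumed example distribution: heights with mean 170 cm and sd 7 cm.
heights = NormalDist(mu=170, sigma=7)


def cdf_by_integration(dist, x, steps=20_000):
    """Approximate F(x) by integrating the PDF with the trapezoid rule."""
    # Start 10 standard deviations below the mean, where the omitted
    # tail mass is negligible.
    lower = dist.mean - 10 * dist.stdev
    if x <= lower:
        return 0.0
    width = (x - lower) / steps
    total = 0.5 * (dist.pdf(lower) + dist.pdf(x))
    for i in range(1, steps):
        total += dist.pdf(lower + i * width)
    return total * width


approx = cdf_by_integration(heights, 160)
exact = heights.cdf(160)  # scipy equivalent: norm.cdf(160, 170, 7)

# The CDF should never decrease as x grows.
grid = [140 + 5 * i for i in range(13)]  # 140, 145, ..., 200
cdf_values = [heights.cdf(x) for x in grid]
is_non_decreasing = all(a <= b for a, b in zip(cdf_values, cdf_values[1:]))

print(f"integrated PDF: {approx:.6f}, closed-form CDF: {exact:.6f}")
print("non-decreasing:", is_non_decreasing)
```

The two printed numbers should agree to several decimal places, which is the integral definition of F(x) in action.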

Inverse CDF

  • Percent point function (percentile)
  • $F(x)$ = probability that X is less than or equal to x
  • $F^{-1}(p)$ = what value of x would yield a CDF value of p?
    • What is the probability a student will achieve 170 or less?
      • It’s the CDF : norm.cdf(170,mu,sigma)
      • Let’s say it’s 95%
    • What’s the probability of scoring above 170?
      • 1-norm.cdf(170,mu,sigma) = 5%
    • What’s the maximum score of the bottom 95% of the class?
      • norm.ppf(0.95, mu, sigma) = 170
      • “In the 95th percentile” means your score is at or above the scores of the bottom 95% of the class
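
The three questions above are the same fact read in different directions, which a round trip makes visible. This is a minimal sketch, assuming a score distribution with mean 150 and sd 12 (values picked only so the cutoff lands near 170); it uses the standard library's `statistics.NormalDist` rather than scipy, where `cdf` plays the role of `norm.cdf` and `inv_cdf` the role of `norm.ppf`.

```python
from statistics import NormalDist

# Assumed score distribution; mu and sigma are illustrative values only.
scores = NormalDist(mu=150, sigma=12)

p = 0.95
cutoff = scores.inv_cdf(p)        # highest score of the bottom 95%, roughly 169.7
round_trip = scores.cdf(cutoff)   # applying the CDF recovers p
above = 1 - scores.cdf(cutoff)    # probability of scoring above the cutoff

# The inverse works in the other direction too: inv_cdf(cdf(x)) gives x back.
recovered = scores.inv_cdf(scores.cdf(160))

print(f"cutoff for p={p}: {cutoff:.1f}")
print(f"cdf(cutoff) = {round_trip:.4f}, P(score > cutoff) = {above:.4f}")
print(f"inv_cdf(cdf(160)) = {recovered:.4f}")
```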

 

Example

  • Suppose average height = 170 cm, sd = 7 cm
  • “My height is in the 95th percentile”
    • $F^{-1}(0.95; 170, 7) \approx 181.5$ cm
  • Someone is 160 cm tall. What percentile are they in?
    • $P(X \le 160) = P(Z \le -10/7) = F(160; 170, 7) \approx 0.08$, i.e. roughly the 8th percentile
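
The z-score step in this example can be spelled out in code. The sketch below standardizes by hand and then uses only the standard normal distribution; it relies on the standard library's `statistics.NormalDist` as a stand-in for `scipy.stats.norm`, and the variable names are chosen just for this illustration.

```python
from statistics import NormalDist

mu, sd = 170, 7            # mean and standard deviation from the example
standard = NormalDist()    # standard normal: mean 0, sd 1

# 160 cm tall: convert to a z-score, then read off the standard normal CDF.
z_160 = (160 - mu) / sd                 # -10/7
percentile_160 = standard.cdf(z_160)    # about 0.08

# 95th percentile: find the z-score first, then convert back to centimetres.
z_95 = standard.inv_cdf(0.95)
height_95 = mu + z_95 * sd              # about 181.5 cm

# Both results match working with the (170, 7) distribution directly.
direct = NormalDist(mu, sd)
same_cdf = abs(direct.cdf(160) - percentile_160) < 1e-12
same_ppf = abs(direct.inv_cdf(0.95) - height_95) < 1e-9

print(f"z = {z_160:.3f}, percentile = {percentile_160:.3f}")
print(f"95th percentile height = {height_95:.1f} cm")
```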

 

CDF and Inverse CDF in code

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm
np.random.seed(0)

mu = 170
sd = 7

# generate samples from our distribution
x = norm.rvs(loc = mu, scale = sd, size = 100) # loc = mean, scale = standard deviation

x

# maximum likelihood mean
x.mean()

# maximum likelihood variance
x.var()

# manually check variance
((x-x.mean())**2).mean() # mean of squared deviations = variance

# maximum likelihood std
x.std()

# unbiased variance
x.var(ddof=1) # ddof = delta degrees of freedom; ddof=1 means we divide by N-1

# manually calculate unbiased variance
((x-x.mean())**2).sum()/(len(x)-1)

# unbiased std
x.std(ddof=1)

# at what height are you in the 95th percentile?
norm.ppf(0.95, loc=mu, scale = sd)

# it helps to picture the distribution: this is the height with 95% of the area to its left

# you are 160cm tall, what percentile are you in ?
norm.cdf(160, loc=mu, scale = sd)

# you are 180 cm tall, what is the probability that someone is taller than you?
1-norm.cdf(180, loc=mu, scale = sd)

# same result using the survival function, sf(x) = 1 - cdf(x)
norm.sf(180, loc=mu, scale = sd)
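
As a sanity check on the cells above, sample statistics can be compared with the theoretical values. The following is a rough sketch under stated assumptions: it uses the standard library (`statistics.NormalDist`) instead of NumPy and SciPy, a sample size of 20,000 and seed 0 chosen arbitrarily, and hand-rolled variance formulas that mirror `x.var()` and `x.var(ddof=1)`.

```python
from statistics import NormalDist

heights = NormalDist(mu=170, sigma=7)
samples = heights.samples(20_000, seed=0)  # sample size and seed are arbitrary
n = len(samples)

# Maximum likelihood variance divides by N; the unbiased version divides by N-1.
mean = sum(samples) / n
sum_sq_dev = sum((s - mean) ** 2 for s in samples)
var_ml = sum_sq_dev / n
var_unbiased = sum_sq_dev / (n - 1)

# Empirical CDF at 160 cm: the fraction of samples at or below 160.
empirical_cdf_160 = sum(s <= 160 for s in samples) / n

# Empirical 95th percentile: the sample that 95% of the data sits at or below.
sorted_samples = sorted(samples)
empirical_p95 = sorted_samples[round(0.95 * n) - 1]

print(f"variance (ML) = {var_ml:.2f}, variance (unbiased) = {var_unbiased:.2f}")
print(f"empirical F(160) = {empirical_cdf_160:.4f}, theoretical = {heights.cdf(160):.4f}")
print(f"empirical 95th pct = {empirical_p95:.2f}, theoretical = {heights.inv_cdf(0.95):.2f}")
```

With a large sample the empirical numbers should land close to the theoretical ones, and the two variance estimates differ only by the factor N/(N-1).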

 

 

Summary

  • Bayesian ML > Bayes’ Rule > Probability
  • There’s no “scikit-learn” for Bayesian ML where you can just type three lines of code
  • Often you’ll build a custom model (by definition, a library won’t exist!!)