Central Limit Theorem: Proof by Simulation

CARivero

February 2, 2019

Central Limit Theorem

The Central Limit Theorem states that given samples of independent and identically distributed random variables from a population distribution, and given that the samples are sufficiently large with size \(n\), then the sampling distribution of the sample mean will have the following properties:

Distribution
Even if the population distribution is non-Gaussian, the distribution of the sample mean will be approximately normal.
Center
The center of the distribution of the sample mean \(\bar x\) will be approximately equal to the population mean \(\mu\).
Spread
The standard error of the distribution of the sample mean (SEM) will be approximately equal to the ratio of the population standard error \(\sigma\) and square root of the sample size. \(SEM = \frac{\sigma}{\sqrt{n}}\)

Interactive Proof!
The Shiny application will allow the user to prove this theorem by simulation of exponentials, with user-provided rate \(lambda\) and sample size \(n\). A histogram will then be returned to show the shape of the distribution of sample means, along with the values of the theoretical and simulated means and standard errors.

Shiny App Interface

Given the default input values \(\lambda = 0.2\) and \(n = 40\), the Shiny app will return the following output, among others:

Statistics	Mean	SEM
Theoretical (CLT)	5	0.791
Simulated	5.011	0.815

Shiny App Input and Output

Input Variables
- Rate Parameter, \(\lambda\) (sidebar) - lambda, default = 0.2
- Sample Size, \(n\) (sidebar) - n, default = 40
Text Output Variables: Theoretical
- Population Mean, \(\mu\) (sidebar) - pop.mean
- Population Standard Deviation, \(\sigma\) (sidebar) - pop.sd
- CLT Mean, Theoretical \(\bar x\) (sidebar) - CLT.mean
- CLT SEM, Theoretical \(SEM\) (sidebar) - CLT.SEM
Text Output Variables: Simulated
- Sample Mean, \(\bar x\) (tab 2) - samp.mean
- Sample Standard Deviation, \(s\) (tab 2) - samp.sd
- Mean of Sample Means, Simulated \(\bar x\) (tab 3) - sampmean.mean
- Standard Error of Sample Means, Simulated \(SEM\) (tab 3) - sampmean.SEM
Plot Output and Checkbox Input
- Sampling Distribution of Exponentials (tab 2)
  - Histogram
  - Sample Mean (optional) - shows abline at the mean of simulated exponentials
  - Population Density Curve (optional) - shows density curve of the exponential populaton at given \(\lambda\)
- Sampling Distribution of Mean of Exponentials (tab 3)
  - Histogram
  - Simulated Mean (optional) - shows abline at the mean of simulated mean of exponentials
  - Simulated Density Curve (optional) - shows density curve of the simulated means of exponentials
  - Theoretical Density Curve (optional) - shows the normal curve per CLT-predicted \(\mu\) and \(SEM\)

Background Calculations

## Define INPUT variables using defaults
lambda <- 0.2; n <- 40

## Calculate OUTPUT variables (theoretical statistics) in sidebar
Theoretical <- list()
Theoretical$pop.mean = round(1/lambda,3); Theoretical$pop.sd = round(1/lambda,3)
Theoretical$CLT.mean = Theoretical$pop.mean; Theoretical$CLT.SEM = round(Theoretical$pop.sd/sqrt(n), 3)

## Calculate OUTPUT variables (Simulated statistics) in tabs
set.seed(3049);
sample <- matrix(rexp(1000*n, lambda), nrow = 1000, ncol = n)
means <- apply(sample, MARGIN = 1, FUN = mean)
Simulated <- list()
Simulated$samp.mean = round(mean(sample),3); Simulated$samp.sd = round(sd(sample),3)
Simulated$sampmean.mean = round(mean(means),3); Simulated$sampmean.SEM = round(sd(means),3)

unlist(Theoretical); unlist(Simulated)

pop.mean   pop.sd CLT.mean  CLT.SEM 
   5.000    5.000    5.000    0.791

    samp.mean       samp.sd sampmean.mean  sampmean.SEM 
        5.011         5.017         5.011         0.815