The Central Limit Theorem Explained with Simulation and Proof

2020 · Posted by Sebastien Lemieux-Codere

The central limit theorem is one of the most fundamental and widely applicable theorems in probability theory. It describes how, in many situations, sums or averages of a large number of random variables are approximately normally distributed. In its classical form, the central limit theorem states that the sum or average of independent and identically distributed random variables becomes approximately normally distributed as the number of variables increases. The central limit theorem has many variants, some of which allow the variables to have different distributions or a limited amount of dependence on each other.

The central limit theorem has many applications in statistics. For example, it implies that the average of a large number of independent samples from a distribution with finite mean and variance is approximately normally distributed, centered at the mean of the sampling distribution and with variance equal to the variance of the sampling distribution divided by the number of samples. This can be used to construct a confidence interval for the mean of the sampling distribution.
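As a concrete illustration of this application, here is a minimal Python sketch (assuming NumPy is available; the sample size, the exponential data source and the 95% confidence level are choices made up for the example) that builds a CLT-based confidence interval for a mean:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 500 samples from a skewed (exponential) distribution
# whose true mean is 2.0.
samples = rng.exponential(scale=2.0, size=500)

n = samples.size
sample_mean = samples.mean()
sample_std = samples.std(ddof=1)

# By the CLT, the sample mean is approximately Normal(mu, sigma^2 / n),
# so mean +/- 1.96 * s / sqrt(n) is an approximate 95% confidence interval.
half_width = 1.96 * sample_std / np.sqrt(n)
print(f"approximate 95% CI for the mean: "
      f"[{sample_mean - half_width:.3f}, {sample_mean + half_width:.3f}]")
```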

It is also very useful in science and mathematical modelling. For example, the simulation on this page shows the central limit theorem in action by simulating particles that fall randomly through 100 layers of obstacles and pile up in stacks at the bottom. As the particles accumulate, an approximate bell curve emerges due to the central limit theorem. This is because the final position of each particle is the sum of the left/right turns it took while falling through the obstacles, and every turn is independent of and identically distributed with every other turn. Furthermore, the particles are independent of each other because they can pass through each other without colliding. Of course, since the simulation is random, the bell shape formed by the particle stacks is not perfect and varies from one run of the simulation to the next.
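The interactive simulation runs in the page itself, but its underlying mechanism can be reproduced in a few lines. Below is a minimal NumPy sketch (the particle count, random seed and histogram scaling are arbitrary choices for this example) that sums 100 random ±1 turns per particle and prints a rough text histogram of the resulting stacks:

```python
import numpy as np

rng = np.random.default_rng(42)

n_particles = 2000   # number of particles dropped (adjustable in the page)
n_layers = 100       # number of obstacle layers

# Each turn is -1 (left) or +1 (right) with probability 1/2 each.
turns = rng.choice([-1, 1], size=(n_particles, n_layers))

# Final position of each particle = sum of its turns.
positions = turns.sum(axis=1)

# Crude text histogram of the stacks near the center
# (after 100 turns every position is an even number).
for pos in range(-30, 31, 2):
    count = int(np.sum(positions == pos))
    print(f"{pos:+4d} | {'#' * (count // 5)}")
```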





Mathematical Explanation of the Simulation

Every particle falls through 100 layers of triangular obstacles. At each obstacle layer, a particle has a 50% probability of making a left turn and a 50% probability of making a right turn. After going through the 100 layers of obstacles, the particles accumulate in stacks at the bottom. The final position of every particle with respect to the center is equal to the number of right turns it took minus the number of left turns it took. For example:

  • If a particle took 50 right turns and 50 left turns, it will end up in the central stack, since \$50-50=0\$ stacks from the center.
  • If a particle took 51 right turns and 49 left turns, it will end up 1 stack to the right of the center, since \$51-49=+1\$.
  • If a particle took 48 right turns and 52 left turns, it will end up 2 stacks to the left of the center, since \$48-52=-2\$.

Particles can only collide with the triangular obstacles; they cannot collide with each other while falling through the obstacles.

More formally, every turn of every particle is a random variable that takes the value \$-1\$ (a left turn) with 50% probability and the value \$+1\$ (a right turn) with 50% probability: $$ T_{ij} = \begin{cases} -1, \text{ 50%, particle j turns left at the ith obstacle layer} \\ +1, \text{ 50%, particle j turns right at the ith obstacle layer} \end{cases} $$ The turn random variables have the following key properties:

  • They are independent of each other. A particle always has a 50/50 probability of going left or right, regardless of which direction it or other particles have turned in the past (or will turn in the future).
  • They are identically distributed. A particle always has a 50/50 probability of going left or right; this was explicitly encoded into the simulation.
  • They have a finite mean and variance. They can only take the values \$+1\$ or \$-1\$, so they obviously cannot have an infinite mean or variance.

The final position from the center of the jth particle (referred to as \$P_j\$) is also a random variable, equal to the sum of the turn variables: $$P_j = T_{1j} + T_{2j} + T_{3j} + ... + T_{100j} = \sum^{100}_{i=1} T_{ij}$$

The Classical Central Limit Theorem states that the sum of a large number of independent, identically distributed random variables with finite mean and variance is approximately normally distributed. The turn variables here are independent and identically distributed with finite mean and variance, and the final position of every particle is the sum of a relatively large number of those turn variables. Therefore, based on the Classical Central Limit Theorem, the final position of the particles should be approximately normally distributed.

More formally, the Central Limit Theorem in its summation form says:

Classical Central Limit Theorem - Summation Form

Let \$X_1, X_2, X_3, ..., X_n\$ be \$n\$ independent, identically distributed (i.i.d.) random variables from a distribution with finite expected value \$\mu\$ and finite variance \$\sigma^2\$, and let their sum \$S_n\$ be: $$ S_n = X_1 + X_2 + X_3 + ... + X_n = \sum^n_{i=1}X_i $$ Then, when the number of samples \$n\$ is large enough, $$S_n \approx \text{Normal Distribution}(n \times \mu, n \times \sigma^2)$$ In other words, when the number of samples is large enough, the sum of the samples becomes approximately normally distributed with mean equal to the sum of the individual sample means and variance equal to the sum of their variances.

The number of samples \$n\$ required for a good approximation depends on the specific sampling distribution and on how a "good approximation" is defined. That being said, as a general rule of thumb, a number of samples \$n \ge 30\$ is often enough for a good approximation close to the mean (\$n \times \mu\$) of the distribution.
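To get a feel for this rule of thumb, the following sketch (assuming NumPy; the exponential sampling distribution was chosen only because it is visibly skewed, and the replication count and threshold are arbitrary) compares an empirical probability for \$S_n\$ against the normal approximation for a few values of \$n\$:

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(1)

def normal_cdf(x, mean, var):
    """CDF of a Normal(mean, var) distribution."""
    return 0.5 * (1.0 + erf((x - mean) / sqrt(2.0 * var)))

mu, sigma2 = 1.0, 1.0  # mean and variance of an Exponential(1) sample

for n in (2, 5, 30, 200):
    # 50,000 replications of S_n = X_1 + ... + X_n with X_i ~ Exponential(1)
    sums = rng.exponential(scale=1.0, size=(50_000, n)).sum(axis=1)
    # Probability of landing below "mean + one standard deviation" of S_n
    threshold = n * mu + np.sqrt(n * sigma2)
    empirical = np.mean(sums <= threshold)
    clt = normal_cdf(threshold, n * mu, n * sigma2)
    print(f"n={n:4d}  empirical={empirical:.4f}  normal approx={clt:.4f}")
```

As \$n\$ grows, the empirical probability drifts toward the value predicted by the normal approximation.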

Estimating the theoretical Particle Mean and Variance using the Central Limit Theorem

  • The variables being added up are the \$T_{ij}\$'s
    • They have finite mean \$\mu_{T_{ij}} = 0\$
      • Since \$ \mu_{T_{ij}} = E[T_{ij}] = 0.5\times(-1) + 0.5\times (+1) = 0\$
    • They have finite variance \$\sigma_{T_{ij}}^2 = 1 \$
      • Since \$\sigma_{T_{ij}}^2 = E[(T_{ij}-\mu_{T_{ij}})^2]\$ \$ = 0.5 \times (-1 -0)^2 + 0.5 \times (+1-0)^2 \$\$= 0.5 \times 1 + 0.5 \times 1 = 1 \$
    • They are independent and identically distributed
  • The number of obstacle layers is \$n=100\$.

Therefore: $$P_j = T_{1j} + T_{2j} + T_{3j} + ... + T_{100j} = \sum^{100}_{i=1} T_{ij} $$ $$P_j \approx \text{Normal Distribution}(n \times \mu_{T_{ij}}, n \times \sigma_{T_{ij}}^2) $$

So the final particle position \$P_j\$ is approximately normally distributed with mean \$ \mu_{P_j}= n \times \mu_{T_{ij}}= 0\$, variance \$ \sigma^2_{P_j} = n \times \sigma_{T_{ij}}^2 = 100\$ and standard deviation \$ \sigma_{P_j}=\sqrt{\sigma^2_{P_j}} = \sqrt{n} = \sqrt{100} = 10 \$.

This is the mean and variance of the theoretical distribution that approximates the final position of the particles based on the Central Limit Theorem. It is important to understand that this is not the actual particle distribution, only an approximation. The actual distribution is close enough to a normal distribution to generate a bell curve pattern in the stacks, but it also differs in important ways. For example, the probability of a particle ending up far from the center is much smaller than it would be if its position were normally distributed with \$\mu=0\$ and \$\sigma=10\$. Furthermore, the normal distribution is a continuous distribution with positive probability density at every real number, whereas the particle distribution described here is a discrete distribution with non-zero probability at only a finite number of positions.
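This comparison can be made concrete: if \$R\$ is the number of right turns, then \$R\$ is a Binomial(100, 0.5) variable and the final position is \$P_j = 2R - 100\$. The sketch below (a comparison added for illustration, assuming SciPy is available) puts the exact probabilities next to the normal approximation, including a point beyond the reachable range:

```python
import numpy as np
from scipy import stats

n = 100
# Exact law: P_j = 2 R - n with R ~ Binomial(n, 0.5), so positions are even.
binom = stats.binom(n, 0.5)
normal = stats.norm(loc=0, scale=10)

# Probability of a stack near the center: exact vs. normal approximation
# (neighboring stacks are 2 apart, so the density is multiplied by 2).
for k in (0, 10, 20):
    exact = binom.pmf((k + n) // 2)
    approx = 2 * normal.pdf(k)
    print(f"P(position = {k:3d}):  exact={exact:.5f}   normal approx={approx:.5f}")

# Far tail: a particle can never land beyond +/- 100, but the normal
# approximation still assigns a (tiny) positive probability out there.
print("P(position > 100): exact =", binom.sf(n), "  normal approx =", normal.sf(100))
```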

Other formulations of the Classical Central Limit Theorem

The Classical Central Limit Theorem can also be expressed in different forms. For example, we can compute the average of the random variables by dividing their sum by the number of random variables. We then get the Classical Central Limit Theorem in its average form:

Classical Central Limit Theorem - Average Form

Let \$X_1, X_2, X_3, ..., X_n\$ be \$n\$ independent, identically distributed (i.i.d.) random variables from a distribution with finite expected value \$\mu\$ and finite variance \$\sigma^2\$, and let: $$ \bar{X}_n = \frac{X_1 + X_2 + X_3 + ... + X_n}{n} = \frac{\sum^n_{i=1}X_i}{n} $$ Then, when the number of samples \$n\$ is large enough, $$\bar{X}_n \approx \text{Normal Distribution}(\mu, \sigma^2/n)$$

The number of samples \$n\$ required for a good approximation depends on the specific sampling distribution and on how we define "good approximation". That being said, as a general rule of thumb, a number of samples \$n \ge 30\$ is often enough for a good approximation near the mean of the distribution (around \$\mu\$).
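A quick numerical sketch of the average form (assuming NumPy; the Uniform(0, 1) sampling distribution and the replication count are arbitrary example choices) compares the mean and spread of the sample average with the \$\mu\$ and \$\sigma^2/n\$ predictions:

```python
import numpy as np

rng = np.random.default_rng(7)

mu, sigma2 = 0.5, 1.0 / 12.0   # mean and variance of Uniform(0, 1)

for n in (10, 100, 1000):
    # 10,000 independent sample means, each based on n uniform samples
    means = rng.uniform(0.0, 1.0, size=(10_000, n)).mean(axis=1)
    print(f"n={n:5d}  mean of X_bar={means.mean():.4f} (CLT: {mu})"
          f"  var of X_bar={means.var():.6f} (CLT: {sigma2 / n:.6f})")
```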


The Classical Central Limit Theorem can also be written in its Standardized form:

Classical Central Limit Theorem - Standardized Form

Let \$X_1, X_2, X_3, ..., X_n\$ be \$n\$ independent, identically distributed (i.i.d.) random variables from a distribution with finite expected value \$\mu\$ and finite variance \$\sigma^2\$. If: $$ S_n = X_1 + X_2 + X_3 + ... + X_n =\sum^n_{i=1}X_i $$ and $$ Z_n = \frac{S_n - n\times\mu}{\sigma \sqrt{n}} $$ then, when the number of samples \$n\$ is large enough, $$Z_n \approx \text{Normal Distribution}(0, 1)$$

The number of samples \$n\$ required for a good approximation depends on the specific sampling distribution and on how we define "good approximation". That being said, as a general rule of thumb, a number of samples \$n \ge 30\$ is often enough for a good approximation near the mean of the distribution (around \$0\$).


Equivalence of all 3 formulations

The key fact here is that if \$Z \sim N(0,1)\$, then \$aZ+b \sim N(b, a^2)\$ for \$a,b \in R\$. Using that fact it's easy to show that the summation and average forms follow from the standardized form.

Standardized Form \$\iff\$ Summation Form
Let \$Z \sim N(0,1) \$ be a standard normal variable and \$Z_n = \frac{\sum_{i=1}^n X_i- n \times \mu}{\sqrt{n} \times \sigma}= \frac{S_n- n \times \mu}{\sqrt{n} \times \sigma}\$ have an approximate standard normal distribution since \$n\$ is large. Then based on the central limit theorem in its standardized form: $$ Z_n \approx Z$$ $$ \iff$$ $$ \frac{S_n- n \times \mu}{\sqrt{n} \times \sigma} \approx Z$$ $$ \iff$$ $$ S_n \approx (\sqrt{n} \times \sigma ) Z + n \times \mu$$ Using the fact that if \$Z \sim N(0,1)\$ then \$aZ+b \sim N(b, a^2)\$ for \$a,b \in R\$: $$ \iff$$ $$ S_n \approx \text{Normal Distribution}(n \times \mu, n \times \sigma^2)$$ which is the central limit theorem in summation form.

Standardized Form \$\iff\$ Average Form
Let \$Z \sim N(0,1) \$ be a standard normal variable and \$Z_n = \frac{\sum_{i=1}^n X_i- n \times \mu}{\sqrt{n} \times \sigma}\$ have an approximate standard normal distribution since \$n\$ is large. Then based on the central limit theorem in its standardized form: $$ Z_n \approx Z$$ $$ \iff$$ $$ \frac{\sum_{i=1}^n X_i- n \times \mu}{\sqrt{n} \times \sigma} \approx Z$$ $$ \iff$$ $$ \frac{\frac{\sum_{i=1}^n X_i}{n} - \mu}{ \sigma/\sqrt{n}} \approx Z$$ $$ \iff$$ $$ \frac{\sum_{i=1}^n X_i}{n} \approx (\sigma/\sqrt{n}) \times Z + \mu$$ Using the fact that if \$Z \sim N(0,1)\$ then \$aZ+b \sim N(b, a^2)\$ for \$a,b \in R\$: $$ \iff$$ $$ \frac{\sum_{i=1}^n X_i}{n} \approx N(\mu, \sigma^2/n)$$ which is the central limit theorem in average form.

Proof of the Classical Central Limit Theorem

Prerequisites

There are several types of convergence for random variables, but the one that relates to the central limit theorem is convergence in distribution:

Convergence in Distribution

A sequence of real-valued random variables \$X_1, X_2, X_3, ...\$ converges in distribution to a random variable \$X\$, denoted \$X_n \overset{D}{\to} X\$, if: $$\lim_{n\to\infty} F_n(x)=F(x) \ \ \ \ \ \ \ \forall x \in R \text{ at which } F \text{ is continuous}$$ where \$F_n\$ and \$F\$ denote the cumulative distribution functions of \$X_n\$ and \$X\$ respectively.

In other words, the cumulative distribution function of \$X_n\$ converges pointwise to the cumulative distribution function of \$X\$ (at the continuity points of the latter).
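For intuition, convergence in distribution can be observed numerically. The sketch below (an illustration added here, assuming SciPy) standardizes the sum of \$n\$ ±1 turns, which is exactly the particle setup with \$Z_n = (2R - n)/\sqrt{n}\$ for a Binomial(n, 0.5) count of right turns \$R\$, and tracks the largest gap between its CDF \$F_n\$ and the standard normal CDF at the support points as \$n\$ grows:

```python
import numpy as np
from scipy import stats

for n in (10, 100, 1000, 10000):
    k = np.arange(n + 1)                   # possible numbers of right turns
    z = (2 * k - n) / np.sqrt(n)           # support of the standardized sum Z_n
    F_n = stats.binom.cdf(k, n, 0.5)       # CDF of Z_n evaluated at its support points
    gap = np.max(np.abs(F_n - stats.norm.cdf(z)))
    print(f"n={n:6d}  max gap |F_n(x) - Phi(x)| at support points = {gap:.4f}")
```

The gap shrinks as \$n\$ increases, which is exactly what pointwise convergence of the CDFs means.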


Sequences of functions also have different types of convergence. The type of function sequence convergence used in the proof is pointwise convergence:

Pointwise Convergence for Sequences of Functions

A sequence of functions \$\{f_n(t)\}_{n=1}^{\infty}\$ converges pointwise to a function \$f(t)\$ on a domain \$D\$ if $$\lim_{n \to \infty} f_n(t) = f(t) \ \ \ \forall t \in D$$

This means that for any \$\epsilon > 0\$ and for any \$t \in D\$, there exists an integer \$N\$ such that \$\forall n \ge N\$, \$|f_n(t) - f(t)| \lt \epsilon \$. In other words, \$f_n(t)\$ gets arbitrarily close to \$f(t)\$ as \$ n \to \infty \$ at every point \$t\$ in the domain \$D\$.


Another key concept is Characteristic Functions, which are complex-valued functions that fully define any real-valued probability distribution. The formal definition of the characteristic function of a random variable is:

Characteristic Function

The Characteristic Function of a random variable \$X\$ is defined as: $$\varphi_X(t) = E[e^{itX}]$$ and completely determines the probability distribution of \$X\$.


For example, the Characteristic Function of a standard normal distribution is (see the derivation at the bottom of this page): $$E_{X\sim N(0,1)}[e^{itX}] = \int_{-\infty}^{\infty} e^{itx} \frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}x^2} dx = e^{- \frac{t^2}{2}}$$
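This formula can be checked numerically: the expectation \$E[e^{itX}]\$ can be estimated by averaging \$e^{itX}\$ over many standard normal samples (a Monte Carlo sketch assuming NumPy; the sample size and evaluation points are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.standard_normal(1_000_000)   # samples from N(0, 1)

for t in (0.0, 0.5, 1.0, 2.0):
    estimate = np.mean(np.exp(1j * t * x))   # Monte Carlo estimate of E[e^{itX}]
    exact = np.exp(-t**2 / 2)                # e^{-t^2/2}
    print(f"t={t:.1f}  estimate={estimate.real:+.4f}{estimate.imag:+.4f}i  exact={exact:.4f}")
```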

Here are a few properties of characteristic functions that are important for the proof (the scaling and summation properties are also checked numerically in the sketch after this list):

  • There is a one-to-one mapping between probability distributions and characteristic functions: every probability distribution has a unique corresponding characteristic function, and every characteristic function corresponds to a unique distribution. In particular, if two distributions have the same characteristic function, they are the same distribution.
  • Scaling Property: \$\varphi_{aX}(t) = \varphi_X(at)\$. In other words, if \$\varphi_X(t)\$ is the characteristic function of \$X\$, then the characteristic function of \$ Z = aX \$ is \$\varphi_Z(t) = \varphi_X(at)\$.
  • Summation Property: If we have independent random variables \$X_1,\ X_2,\ ...\ X_n\$ with corresponding characteristic functions \$ \varphi_{X_1}(t),\ \varphi_{X_2}(t), ..., \varphi_{X_n}(t) \$, then the characteristic function of their sum \$S_n = \sum_{i=1}^n X_i \ \$ is \$ \ \varphi_{S_n}(t) = \prod_{i=1}^n \varphi_{X_i}(t)\$
  • Derivatives Property: If \$X\$ is a random variable and \$E[|X|^k] \lt \infty \$, then its characteristic function \$\varphi_X(t)\$ has a finite jth derivative for all \$0 \leq j \leq k\$, and its jth derivative at zero is \$ \varphi_X^{(j)}(0) = i^j E[X^j]\$ where \$i=\sqrt{-1}\$.
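The scaling and summation properties can be verified empirically with the particle turn variables, whose characteristic function works out to \$E[e^{itT}] = 0.5e^{it} + 0.5e^{-it} = \cos(t)\$ (a sketch assuming NumPy; the sample size, evaluation point and scaling factor are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(5)
n_samples = 500_000
t = 0.7   # an arbitrary evaluation point

# Two independent +/-1 turn variables
t1 = rng.choice([-1, 1], size=n_samples)
t2 = rng.choice([-1, 1], size=n_samples)

def ecf(samples, t):
    """Empirical characteristic function: average of e^{i t X} over the samples."""
    return np.mean(np.exp(1j * t * samples))

# Summation property: phi_{T1+T2}(t) should match phi_{T1}(t) * phi_{T2}(t) = cos(t)^2
print("phi_{T1+T2}(t) =", ecf(t1 + t2, t).real)
print("cos(t)^2       =", np.cos(t) ** 2)

# Scaling property: phi_{aT1}(t) should match phi_{T1}(a*t) = cos(a*t)
a = 3.0
print("phi_{aT1}(t)   =", ecf(a * t1, t).real)
print("phi_{T1}(at)   =", np.cos(a * t))
```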

The other key property is the Continuity Theorem:

Continuity Theorem

The Continuity Theorem states that if we have a sequence of random variables \$X_1,\ X_2,\ X_3,\ ...\$ with corresponding characteristic functions \$\varphi_1(t),\ \varphi_2(t),\ \varphi_3(t),\ ...\$, and the sequence of characteristic functions converges pointwise to some characteristic function \$\varphi\$: $$\lim_{n \to \infty} \varphi_n(t) = \varphi(t) \ \ \ \ \ \ \ \ \forall t \in R$$ then $$X_n \overset{D}{\to} X$$ where \$X\$ is a random variable with the distribution that corresponds to the characteristic function \$\varphi(t)\$.

The classical central limit theorem proof below uses this fact by showing that the sequence of random variables that correspond to increasing \$n\$ in the standardized form central limit theorem has a corresponding sequence of characteristic functions that converges pointwise to the characteristic function of a standard normal distribution. Therefore, the sequence converges in distribution to a standard normal distribution:

Proof of the Central Limit Theorem

If we have \$X_1, X_2, ..., X_n\$ independent, identically distributed random variables with characteristic functions \$\varphi_{X_1}(t), \varphi_{X_2}(t), ..., \varphi_{X_n}(t)\$ and let $$ Z_n = \frac{S_n - n\times\mu}{\sigma \sqrt{n}} $$ we want to show pointwise convergence of the characteristic function of \$Z_n\$ to the characteristic function of a standard normal distribution (\$\varphi_{N(0,1)}(t) = e^{\frac{-t^2}{2}}\$). In other words, we want to show that \$\forall \epsilon \gt 0, \forall t\$, there exists an \$N\$ such that \$ \forall n \ge N\$, $$ | \varphi_{Z_n}(t) - e^{-\frac{t^2}{2}} | \lt \epsilon $$ which, since \$\epsilon\$ can be made arbitrarily small, is equivalent to saying that \$\forall \epsilon \gt 0, \forall t\$, there exists an \$N\$ such that \$ \forall n \ge N\$, $$ e^{-\frac{t^2}{2} - \epsilon} \lt \varphi_{Z_n}(t) \lt e^{-\frac{t^2}{2} + \epsilon} $$ The first step is rewriting \$Z_n\$: $$Z_n = \frac{ \left(\sum_{i=1}^n X_i\right) - n\mu}{\sqrt{n}\sigma} $$ $$ \iff$$ $$Z_n = \frac{\sum_{i=1}^n (X_i - \mu)}{\sqrt{n}\sigma} $$ $$ \iff$$ $$ Z_n = \frac{1}{\sqrt{n}} \sum_{i=1}^n \frac{X_i - \mu}{\sigma} $$ $$ \iff$$ $$ Z_n = \sum_{i=1}^n \frac{1}{\sqrt{n}}Y_i \ \ \text{and} \ \ Y_i = \frac{X_i - \mu}{\sigma}$$ with $$E[Y_i] = 0 \text{ and } Var[Y_i] = 1 \ \ \forall i$$ Since \$Z_n = \sum_{i=1}^n \frac{1}{\sqrt{n}}Y_i \$ and the \$Y_i\$'s are independent, the summation property of characteristic functions implies that the characteristic function of \$Z_n\$ is: $$\varphi_{Z_n}(t) = \prod_{i=1}^n \varphi_{\left( \frac{1}{\sqrt{n}} Y_i \right)}(t)$$ where \$\varphi_{\left( \frac{1}{\sqrt{n}} Y_i \right)}(t)\$ is the characteristic function of \$ \frac{1}{\sqrt{n}} Y_i \$. Using the scaling property of characteristic functions: $$\varphi_{Z_n}(t) = \prod_{i=1}^n \varphi_{Y_i} \left( \frac{t}{\sqrt{n}} \right)$$ Since all the \$X_i\$'s are identically distributed, so are the \$Y_i\$'s, and they all have the same characteristic function \$\varphi_Y(t)\$: $$\varphi_{Z_n}(t) = \left[\varphi_{Y}\left(\frac{t}{\sqrt{n}}\right)\right]^n$$ Doing a Taylor expansion of \$\varphi_{Y}(\frac{t}{\sqrt{n}})\$ around 0, and given that (from the derivatives property) \$\varphi_Y(0) = i^0E[Y^0]= 1 \$, \$ \ \ \frac{\partial}{\partial t}\varphi_Y(0) = iE[Y] \$, \$ \ \ \frac{\partial^2}{\partial t^2}\varphi_Y(0) = i^2E[Y^2]\$, $$\varphi_{Y}\left(\frac{t}{\sqrt{n}}\right) =1 + iE[Y]\frac{t}{\sqrt{n}} + \frac{i^2E[Y^2]}{2!} \left(\frac{t}{\sqrt{n}}\right)^2 + o\left(\frac{t^2}{n}\right)$$ where \$ \frac{o\left(\frac{t^2}{n}\right) }{ \left(\frac{t^2}{n}\right) } \to 0 \$ as \$\frac{t}{\sqrt{n}} \to 0\$. Since \$E[Y] = 0\$ and \$ E[Y^2] = 1 \$, and given that the limit is taken with respect to \$n \to \infty \$ with \$t\$ kept constant: $$\varphi_{Y} \left(\frac{t}{\sqrt{n}}\right) =1 - \frac{t^2}{2n} + o\left(\frac{1}{n}\right)$$ where \$ n \times o\left(\frac{1}{n}\right) \to 0 \$ as \$ n \to \infty \$. Inserting this expansion into the \$\varphi_{Z_n}(t) = \left[\varphi_{Y}\left(\frac{t}{\sqrt{n}}\right)\right]^n\$ equation: $$\varphi_{Z_n}(t) = \left[1 - \frac{t^2}{2n} + o\left(\frac{1}{n}\right) \right]^n$$ This can be rewritten as $$\varphi_{Z_n}(t) = \left[1 + \frac{ -\frac{t^2}{2} + n \times o\left(\frac{1}{n}\right) }{n} \right]^n $$ Since \$ n \times o(\frac{1}{n}) \to 0\$ as \$n \to \infty\$, for any \$ \xi \gt 0\$ there exists an \$N_{\xi} \$ such that \$ -\xi \lt n\times o(\frac{1}{n}) \lt \xi \$ for all \$n \ge N_{\xi}\$.
So for an arbitrarily small \$ \xi > 0 \$ and for \$ n \ge N_{\xi} \$, $$ \left[1 + \frac{ -\frac{t^2}{2} - \xi }{n} \right]^n \le \varphi_{Z_n}(t) \le \left[1 + \frac{ -\frac{t^2}{2} + \xi }{n} \right]^n $$ and $$ \lim_{n \to \infty} \left[1 + \frac{ -\frac{t^2}{2} - \xi }{n} \right]^n \le \lim_{n \to \infty} \varphi_{Z_n}(t) \le \lim_{n \to \infty} \left[1 + \frac{ -\frac{t^2}{2} + \xi }{n} \right]^n $$ Since \$ \lim_{n \to \infty} [1 + \frac{x}{n}]^n = e^x\$, $$ e^{-\frac{t^2}{2} - \xi} \le \lim_{n \to \infty} \varphi_{Z_n}(t) \le e^{-\frac{t^2}{2} + \xi} $$ for arbitrarily small \$ \xi > 0 \$, so $$ \lim_{n \to \infty} \varphi_{Z_n}(t) = e^{-\frac{t^2}{2} }$$ which is the characteristic function of a standard normal distribution. Therefore, by the Continuity Theorem, $$Z_n \overset{D}{\to} N(0,1)$$
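For the particle turn variables, this convergence can be seen concretely: a single turn \$T\$ already has mean 0 and variance 1, so \$Y = T\$ and \$\varphi_Y(t) = \cos(t)\$, and the proof's key quantity \$[\varphi_Y(t/\sqrt{n})]^n = \cos(t/\sqrt{n})^n\$ can be evaluated directly (a short sketch assuming NumPy; the evaluation point \$t\$ is arbitrary):

```python
import numpy as np

t = 1.5  # an arbitrary evaluation point

for n in (10, 100, 1000, 10000):
    # [phi_Y(t / sqrt(n))]^n for a +/-1 turn variable, whose phi_Y(t) = cos(t)
    value = np.cos(t / np.sqrt(n)) ** n
    print(f"n={n:6d}  [phi_Y(t/sqrt(n))]^n = {value:.6f}")

print(f"limit e^(-t^2/2)            = {np.exp(-t**2 / 2):.6f}")
```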

Derivation of the Characteristic Function of a Standard Normal Distribution

The probability density function of a standard normal random variable \$X\$ is $$f_X(x) = \frac{1}{\sqrt{2\pi}}e^\frac{-x^2}{2}$$ so its characteristic function is $$\varphi_X(t) = E[e^{itX}] = \int_{-\infty}^\infty e^{itx} \frac{1}{\sqrt{2\pi}}e^\frac{-x^2}{2}dx$$ This can be rewritten as $$\varphi_X(t) = \int_{-\infty}^\infty e^{-\frac{t^2}{2}} e^{+\frac{t^2}{2}} e^{itx} \frac{1}{\sqrt{2\pi}}e^{-\frac{x^2}{2}}dx$$ $$\varphi_X(t) = \frac{e^{-t^2/2}}{\sqrt{2\pi}} \int_{-\infty}^\infty e^{+\frac{t^2}{2}} e^{itx} e^{-\frac{x^2}{2}}dx$$ $$\varphi_X(t) = \frac{e^{-t^2/2}}{\sqrt{2\pi}} \int_{-\infty}^\infty e^{+\frac{t^2}{2} + itx - \frac{x^2}{2}}dx$$ Completing the square in the exponent gives $$\varphi_X(t) = \frac{e^{-t^2/2}}{\sqrt{2\pi}} \int_{-\infty}^\infty e^{ -(x-it)^2 /2}dx$$ Let $$s = x-it$$ then $$\varphi_X(t) = \frac{e^{-t^2/2}}{\sqrt{2\pi}} \int_{-\infty-it}^{+\infty-it} e^{-s^2/2} ds$$ Now we use complex analysis to evaluate \$\int_{-\infty-it}^{+\infty-it} e^{-s^2/2} ds\$. First note that \$ e^{-s^2/2}\$ is an analytic function everywhere on the complex plane because it is a composition of analytic functions (exponential and polynomial). Consequently, by Cauchy's integral theorem, its contour integral over any closed path is equal to zero $$\oint e^{-s^2/2} ds= 0$$ Writing such a contour integral around a rectangle parameterized by some \$\alpha > 0\$: $$0= \int_{\alpha}^{-\alpha} e^{-s^2/2}ds + \int_{-\alpha}^{-\alpha-it}e^{-s^2/2}ds $$ $$+ \int_{-\alpha-it}^{\alpha-it} e^{-s^2/2}ds + \int_{\alpha-it}^{\alpha}e^{-s^2/2}ds$$ Then, taking the limit as \$\alpha \to \infty \$, \$ \ \ \int_{\alpha-it}^{\alpha}e^{-s^2/2} ds \to 0\$ and \$ \int_{-\alpha}^{-\alpha-it}e^{-s^2/2} ds \to 0\$, so $$0= \int_{\infty}^{-\infty} e^{-s^2/2}ds + \int_{-\infty-it}^{+\infty-it}e^{-s^2/2}ds$$ Knowing that \$ \int_{\infty}^{-\infty} e^{-s^2/2}ds = -\sqrt{2\pi}\$ since \$ \int_{\infty}^{-\infty} e^{-s^2/2}ds \$ \$ = -\int_{-\infty}^{+\infty} e^{-s^2/2}ds \$ \$ = -\sqrt{\int_{-\infty}^{+\infty} e^{-s^2/2}ds \times \int_{-\infty}^{+\infty} e^{-u^2/2}du}\$ \$ = -\sqrt{\int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} e^{-(s^2+u^2)/2}ds\,du}\$ (now changing to polar coordinates) \$ = -\sqrt{\int_{0}^{2\pi} \int_{0}^{+\infty} e^{-r^2/2}r\,dr\,d\theta}\$ \$ = -\sqrt{\int_{0}^{2\pi} 1\,d\theta}\$ \$ = -\sqrt{2\pi}\$, it follows that $$\int_{-\infty-it}^{+\infty-it}e^{-s^2/2}ds = \sqrt{2\pi}$$ Plugging this result back into \$\varphi_X(t) = \frac{e^{-t^2/2}}{\sqrt{2\pi}} \int_{-\infty-it}^{+\infty-it} e^{-s^2/2} ds\$ yields $$\varphi_X(t) = \frac{e^{-t^2/2}}{\sqrt{2\pi}} \sqrt{2\pi} = e^{-t^2/2}$$
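The result of this derivation can also be cross-checked numerically by evaluating the defining integral directly. Since the imaginary part integrates to zero by symmetry of the density, \$\varphi_X(t)\$ reduces to \$\int \cos(tx) f_X(x) dx\$ (a sketch assuming SciPy's quad routine; the evaluation points are arbitrary):

```python
import numpy as np
from scipy.integrate import quad

def std_normal_pdf(x):
    """Probability density function of N(0, 1)."""
    return np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

for t in (0.5, 1.0, 2.0):
    # Real part of E[e^{itX}]; the imaginary part vanishes by symmetry of the pdf
    integral, _ = quad(lambda x: np.cos(t * x) * std_normal_pdf(x), -np.inf, np.inf)
    print(f"t={t:.1f}  integral={integral:.6f}  e^(-t^2/2)={np.exp(-t**2 / 2):.6f}")
```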



The Author

Sebastien Lemieux-Codere

Sebastien is a Data Scientist and Software Developer.