
Probability Density Function

PDF.JPG
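
For reference, the formula being decomposed is presumably the usual density of the normal distribution. The LaTeX below is a transcription under that assumption, with μ as the mean and σ as the standard deviation; the image itself may use slightly different notation.

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}} \,
       \exp\!\left( -\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^{2} \right)
```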

Again, instead of memorizing the formula, I decomposed the equation into its fundamental components. I analyzed the exponent term first:

exp.JPG

x - μ measures the difference between each element and the mean, or expected value (assuming the sample is large enough and unbiased for the two to coincide). But what does it mean to divide this difference by σ, the standard deviation? The formula becomes a little less opaque when it's rewritten this way:

exp2.JPG

The numerator and denominator look eerily similar when I deconstruct the variance term: 

exp3.JPG
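
Written out, the rewriting described above might look like the following. This is a sketch that assumes the population form of the variance (dividing by N); the symbols N and x_i are my own labels and may not match the images.

```latex
-\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^{2}
  = -\frac{1}{2} \cdot \frac{(x - \mu)^{2}}{\sigma^{2}}
  = -\frac{1}{2} \cdot \frac{(x - \mu)^{2}}{\frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^{2}}
```

The numerator is one element's squared deviation, and the denominator is the average of that same quantity over every element.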

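This calculation compares each squared deviation to the average squared deviation across the whole population. In other words, it is a way to standardize the data: the result has no units and says how far an element sits from the mean relative to the typical spread.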
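
To make the standardization concrete, here is a minimal Python sketch. The function name `z_scores` and the sample values are made up for illustration, and it assumes the population standard deviation (dividing by N).

```python
import math

def z_scores(data):
    """Return (x - mean) / population standard deviation for each x."""
    n = len(data)
    mu = sum(data) / n
    # Population variance: the average squared deviation from the mean.
    variance = sum((x - mu) ** 2 for x in data) / n
    sigma = math.sqrt(variance)
    return [(x - mu) / sigma for x in data]

heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]  # made-up sample
z = z_scores(heights_cm)
print([round(v, 3) for v in z])
```

Whatever the original units, the standardized values have mean 0 and an average squared value of 1, which is what lets one formula describe every normal distribution.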

What about the factor -1/2? Once again, I find it easier to understand the concept when it is decomposed into -1 and 1/2. The -1 flips the sign of the exponent, which transforms the symmetric smile into the symmetric mountain:

PDF_Integral_1.JPG

How about 1/2? This one took me some time to understand. If I remove the term, the distribution becomes more concentrated around the mean (red graph below). But why is the version with 1/2 called 'normal' or 'standard'?

PDF_Integral_2.JPG
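
One way to put a number on "more concentrated" is to measure the spread of each curve. The sketch below uses a hypothetical helper, `spread`, which treats the curve as an unnormalized density centred at 0 and estimates its root-mean-square width with a simple midpoint rule. The integration limits of ±10 are an assumption that is safe here because both curves are essentially zero that far out.

```python
import math

def spread(f, lo=-10.0, hi=10.0, steps=100_000):
    """Root-mean-square width of a bell curve f centred at 0 (midpoint rule)."""
    h = (hi - lo) / steps
    area = 0.0
    second_moment = 0.0
    for i in range(steps):
        z = lo + (i + 0.5) * h
        w = f(z) * h          # weight of this thin slice
        area += w
        second_moment += z * z * w
    return math.sqrt(second_moment / area)

with_half = spread(lambda z: math.exp(-0.5 * z * z))   # exponent -z^2 / 2
without_half = spread(lambda z: math.exp(-z * z))      # exponent -z^2
print(round(with_half, 4), round(without_half, 4))
```

With the 1/2 in place the width comes out at 1, so σ really is the standard deviation of the curve. Without it the width shrinks to about 0.7071, or 1/√2.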

After many trials, I integrated the formula in an attempt to find the reason behind this nomenclature:

PDF_Integral_4.JPG
PDF_Integral_3.JPG

Integrating the formula over the entire real line yields the square root of 2π, or 2.506628275, and integrating the same formula from 0 to 1 yields 0.855624392. Because the curve is symmetric, the area between -1 and 1 is twice that, 1.711248784, and dividing by the total area gives 0.682689. This means 68.2689% of the total area under the curve lies between -1 = -σ and 1 = σ (the blue area between the two green bars). That 68.27% matches the familiar share of a normal distribution that falls within one standard deviation of the mean, but what's so special about this? According to Wikipedia, "authors differ on which normal distribution should be called the standard one." This open-ended debate made me question how much weight one should put on human intuition (e.g. elegance) vs. data (e.g. thermal fluctuations) in defining concepts such as the standard normal distribution.

PDF_Integral_5.jpg
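
The integrals quoted above can be reproduced numerically. This is a rough check rather than the method used in the images: it assumes σ = 1 and μ = 0, uses a hand-written composite Simpson's rule (the helper names `kernel` and `simpson` are my own), and truncates the infinite range to ±10, where the curve is negligibly small.

```python
import math

def kernel(x):
    """The unnormalized bell curve exp(-x^2 / 2)."""
    return math.exp(-0.5 * x * x)

def simpson(f, a, b, n=10_000):
    """Composite Simpson's rule with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        # Interior points alternate between weight 4 (odd) and 2 (even).
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

total_area = simpson(kernel, -10.0, 10.0)   # should be close to sqrt(2*pi)
zero_to_one = simpson(kernel, 0.0, 1.0)     # area from 0 to 1
share = 2 * zero_to_one / total_area        # symmetric, so double it

print(total_area, math.sqrt(2 * math.pi))
print(zero_to_one)
print(share, math.erf(1 / math.sqrt(2)))
```

The last line compares the result with `math.erf(1 / math.sqrt(2))`, the closed-form expression for the share of a normal distribution within one standard deviation of its mean.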
eexp.JPG

The next step is taking e to the power of the final result calculated above. In addition to the numerical validation of e, here is a good explanation of Euler's number.

pi.JPG

The last factor, 1/(σ√(2π)), ensures that the total area under the curve is equal to one (i.e. the integral is 1).
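
As a sanity check on the normalization, the sketch below assembles the full density from the pieces discussed in this post and integrates it numerically. The names `normal_pdf` and `area` are hypothetical, the ±12σ integration window is an assumption, and `statistics.NormalDist` from the Python standard library serves only as an independent reference.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal density: standardize, square, halve, negate, exponentiate, scale."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def area(mu, sigma, steps=200_000):
    """Midpoint-rule integral of normal_pdf over mu +/- 12 sigma."""
    lo, hi = mu - 12 * sigma, mu + 12 * sigma
    h = (hi - lo) / steps
    return sum(normal_pdf(lo + (i + 0.5) * h, mu, sigma) for i in range(steps)) * h

print(round(area(0.0, 1.0), 6))   # standard normal
print(round(area(5.0, 2.5), 6))   # an arbitrary mean and spread

# Cross-check one point against the standard library implementation.
print(normal_pdf(1.3, 5.0, 2.5), NormalDist(5.0, 2.5).pdf(1.3))
```

The area comes out as 1 for any choice of μ and σ, because the σ in the denominator cancels the stretching of the curve and √(2π) cancels the area of the unscaled bell.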
