Distributions

The Gaussian distribution, also known as the normal distribution, is a common continuous distribution.

Univariate Gaussian distribution

The probability density function is given as:

$$p(x \mid \mu, \sigma^{2}) = N(x \mid \mu, \sigma^{2}) = \dfrac{1}{\sqrt{2\pi\sigma^{2}}} \exp\Big(-\dfrac{(x-\mu)^{2}}{2\sigma^{2}}\Big)$$

The parameter estimates are:

$$\mu = \dfrac{1}{N} \sum_{n=1}^{N} x_{n}$$

$$\sigma^{2} = \dfrac{1}{N} \sum_{n=1}^{N} (x_{n}-\mu)^{2}$$

where the mean ($\mu$) is the location and the variance ($\sigma^{2}$) is the dispersion. Here $x_{n}$ denotes the feature value of the $n^{th}$ sample, and $N$ is the total number of samples.
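As a quick sketch, the two estimates above can be computed with NumPy (the sample values here are hypothetical):

```python
import numpy as np

# Hypothetical 1-D feature values (illustrative only).
x = np.array([2.1, 1.8, 2.5, 2.2, 1.9, 2.4])
N = len(x)

# Maximum-likelihood estimates from the formulas above.
mu = x.sum() / N                    # mean (location)
sigma2 = ((x - mu) ** 2).sum() / N  # variance (dispersion), 1/N not 1/(N-1)

# NumPy's defaults match these: np.var uses ddof=0, i.e. divides by N.
assert np.isclose(mu, x.mean())
assert np.isclose(sigma2, x.var())
```

Note the $1/N$ normalisation: this is the maximum-likelihood (biased) variance, not the $1/(N-1)$ unbiased sample variance.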

The maximum-likelihood estimates can be derived by optimisation. First, assume the training samples are independent, and take the natural logarithm of the likelihood:

$$\begin{aligned} LL(\mu, \sigma^{2}) &= \ln\big(p(x_{1} \mid \mu, \sigma^{2}) \cdots p(x_{N} \mid \mu, \sigma^{2})\big) \\ &= \sum_{n=1}^{N} \ln p(x_{n} \mid \mu, \sigma^{2}) \\ &= \sum_{n=1}^{N} \ln \Big(\frac{1}{\sqrt{2\pi\sigma^{2}}} \exp\Big(-\frac{(x_{n} - \mu)^{2}}{2\sigma^{2}}\Big)\Big) \\ &= -\frac{N}{2} \ln(2\pi) - \frac{N}{2} \ln(\sigma^{2}) - \sum_{n=1}^{N} \frac{(x_{n} - \mu)^{2}}{2\sigma^{2}} \end{aligned}$$

The optimal parameters are found by setting the partial derivatives to zero:

$$\dfrac{\partial LL(\mu, \sigma^{2})}{\partial \mu} = 0 \qquad \dfrac{\partial LL(\mu, \sigma^{2})}{\partial \sigma^{2}} = 0$$
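A small numerical check of this derivation (with hypothetical data): the closed-form mean and variance should maximise the log-likelihood, so perturbing either parameter should not increase it.

```python
import numpy as np

# Hypothetical data (illustrative only).
x = np.array([0.5, -1.2, 0.3, 1.7, -0.4])
N = len(x)

def log_likelihood(mu, sigma2):
    # Final line of the derivation above.
    return (-N / 2 * np.log(2 * np.pi)
            - N / 2 * np.log(sigma2)
            - ((x - mu) ** 2).sum() / (2 * sigma2))

# The solutions of the two stationarity conditions:
mu_hat, sigma2_hat = x.mean(), x.var()

# Perturbing either parameter should not increase the log-likelihood.
best = log_likelihood(mu_hat, sigma2_hat)
assert best >= log_likelihood(mu_hat + 0.1, sigma2_hat)
assert best >= log_likelihood(mu_hat, sigma2_hat * 1.5)
```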

Multivariate Gaussian distribution

For the multivariate Gaussian distribution in $D$ dimensions, the probability density function is:

$$\begin{aligned} p(x \mid \mu, \Sigma) &= N(x \mid \mu, \Sigma) \\ &= \dfrac{1}{(2\pi)^{D/2} |\Sigma|^{1/2}} \exp\Big(-\frac{1}{2}(x-\mu)^{T} \Sigma^{-1} (x-\mu)\Big) \end{aligned}$$

The parameter estimates are:

$$\mu = \dfrac{1}{N} \sum_{n=1}^{N} x_{n}$$

$$\Sigma = \dfrac{1}{N} \sum_{n=1}^{N} (x_{n}-\mu)(x_{n}-\mu)^{T}$$
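These estimates can be sketched in NumPy as follows (the 2-D samples are hypothetical; note that the outer product $(x_n-\mu)(x_n-\mu)^T$ produces a $D \times D$ matrix per sample):

```python
import numpy as np

# Hypothetical samples, shape (N, D) with D = 2 (illustrative only).
X = np.array([[1.0, 2.0], [1.5, 2.6], [0.8, 1.9], [1.2, 2.3]])
N, D = X.shape

# Parameter estimates from the formulas above.
mu = X.sum(axis=0) / N
Sigma = sum(np.outer(x - mu, x - mu) for x in X) / N

# np.cov with bias=True uses the same 1/N normalisation
# (rowvar=False because samples are rows here).
assert np.allclose(Sigma, np.cov(X, rowvar=False, bias=True))

# Density at a point, following the D-dimensional PDF above.
x0 = np.array([1.0, 2.0])
diff = x0 - mu
pdf = (np.exp(-0.5 * diff @ np.linalg.inv(Sigma) @ diff)
       / np.sqrt((2 * np.pi) ** D * np.linalg.det(Sigma)))
```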

Correlation Coefficient

Pearson's correlation coefficient is a measure of the linear correlation between two variables $X$ and $Y$.

$$\rho(x_{i}, x_{j}) = \rho_{i,j} = \dfrac{\sigma_{i,j}}{\sqrt{\sigma_{i,i}\,\sigma_{j,j}}}$$

The correlation coefficient $\rho(x_{i}, x_{j})$ is obtained by normalising the covariance $\sigma_{i,j}$ by the square root of the product of the variances $\sigma_{i,i}$ and $\sigma_{j,j}$, and satisfies $-1 \leq \rho_{i,j} \leq 1$.
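A minimal sketch of this normalisation, using hypothetical data; the $1/N$ factors in the covariance and the variances cancel, so the result agrees with `np.corrcoef`:

```python
import numpy as np

# Two hypothetical variables (illustrative only).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 1.5, 3.5, 3.0, 5.0])

# Covariance normalised by the square root of the product of variances.
mu_x, mu_y = x.mean(), y.mean()
cov_xy = ((x - mu_x) * (y - mu_y)).mean()
rho = cov_xy / np.sqrt(x.var() * y.var())

# np.corrcoef computes the same quantity (its normalisation cancels too).
assert np.isclose(rho, np.corrcoef(x, y)[0, 1])
assert -1.0 <= rho <= 1.0
```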

Bayes’ theorem

We can use Bayes' theorem for continuous data $x$ and a discrete class $C_k$ as:

$$\Bbb{P}(C_k \mid x) = \dfrac{\Bbb{P}(x \mid C_k)\,\Bbb{P}(C_k)}{\Bbb{P}(x)}$$

$$\Bbb{P}(x) = \sum_{j=1}^{K} \Bbb{P}(x \mid C_j)\,\Bbb{P}(C_j)$$

$$\begin{aligned} \Bbb{P}(C_k \mid x) &\propto \Bbb{P}(x \mid C_k)\,\Bbb{P}(C_k) \\ &\propto N(x \mid \mu_k, \sigma^{2}_k)\,\Bbb{P}(C_k) \\ &\propto \dfrac{1}{\sqrt{2\pi\sigma^{2}_k}} \exp\Big(-\dfrac{(x-\mu_k)^{2}}{2\sigma^{2}_k}\Big)\,\Bbb{P}(C_k) \end{aligned}$$
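This posterior can be sketched for a hypothetical two-class problem; the class means, variances, and priors below are illustrative assumptions, and dividing by the sum over classes implements the $\Bbb{P}(x)$ normalisation:

```python
import numpy as np

# Hypothetical two-class setup (all numbers are illustrative assumptions).
mu = np.array([0.0, 3.0])      # class means mu_k
sigma2 = np.array([1.0, 2.0])  # class variances sigma_k^2
prior = np.array([0.6, 0.4])   # priors P(C_k)

def posterior(x):
    # Unnormalised posterior: N(x | mu_k, sigma_k^2) * P(C_k) per class.
    likelihood = (np.exp(-(x - mu) ** 2 / (2 * sigma2))
                  / np.sqrt(2 * np.pi * sigma2))
    joint = likelihood * prior
    # Divide by P(x) = sum over classes of the joint.
    return joint / joint.sum()

p = posterior(1.0)
assert np.isclose(p.sum(), 1.0)  # posterior is a proper distribution
```

A point near a class mean yields a high posterior for that class, which is the basis of Gaussian classifiers built on this rule.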