Confidence Sets for Multivariate Gaussians


For a standard Gaussian N(μ,σ2)\Normal(\mu, \sigma^2), we have the confidence intervals

p(μ1σ<x<μ+1σ)68.3%p(μ2σ<x<μ+2σ)95.5%p(μ3σ<x<μ+3σ)99.7%\begin{array}{rcl} p(\mu-1\sigma < x < \mu + 1\sigma) &\approx& 68.3\% \\ p(\mu-2\sigma < x < \mu + 2\sigma) &\approx& 95.5\% \\ p(\mu-3\sigma < x < \mu + 3\sigma) &\approx& 99.7\% \end{array}

What would those look like for a multivariate Gaussian, say to plot interpretable contours?


Gaussian Mixture Models, plotted using 3-sigma Figure 1. A Gaussian mixture model fit to the Old faithful dataset. The 3 contours indicate areas that are the 2d equivalent of the 1σ1\sigma, 2σ2\sigma and 3σ3\sigma intervals in that they contain 68.3%, 95.5% and 99.7% of the probability mass.1


We will build a similar result for multivariate Gaussians, in d=2d=2 dimensions;

p(xμΣ11.516...)68.3%,p(xμΣ12.490...)95.5%,p(xμΣ13.409...)99.7%.\begin{array}{rcl} p(\norm{x-\mu}_{\Sigma^{-1}} \leq 1.516...) \approx 68.3\%, \\ p(\norm{x-\mu}_{\Sigma^{-1}} \leq 2.490...) \approx 95.5\%, \\ p(\norm{x-\mu}_{\Sigma^{-1}} \leq 3.409...) \approx 99.7\%. \\ \end{array}

The general form is that p(xμΣ1R)=cp(\norm{x-\mu}_{\Sigma^{-1}} \leq R) = c for R=2log(1c)R = \sqrt{-2\log(1-c)}.

Confidence sets

The equivalent of the “confidence interval” in dd dimensions for a distribution N(μ,Σ)\Normal(\mu, \Sigma) would be a set S\mathcal{S} that is centered at μ\mu and has the same contours as the sublevel sets of the density

N(x;μ,Σ)=1(2π)ddet(Σ)exp(12xμΣ12).\Normal(x; \mu, \Sigma) = \sqrt{\frac{1}{(2\pi)^d \det(\Sigma)}} \exp\left(-\frac{1}{2}\Vert x-\mu\Vert_{\Sigma^{-1}}^2\right).

We want this set to contain a given proportion cc of the probability mass, p(xS)=cp(x \in \mathcal{S}) = c.

The easy case

Starting with the easy case, let’s look at a standard multivariate Gaussian xN(0,I)x \sim \Normal(0, I). Because symmetry, the confidence set will be a ball xR\Vert x\Vert \leq R. The question is how to set the radius RR as a function of cc. We are looking for the solution of

x<R1(2π)dexp(12x2)dx=c.\int_{\Vert x \Vert < R} \sqrt{\frac{1}{(2\pi)^d}} \exp\left(-\frac{1}{2}\Vert x\Vert^2\right) \mathrm{d}x = c.

To simplify the equation, we can integrate over the radius and the area separately,

c=r=0Rx=r1(2π)dexp(12r2)dxdr,=1(2π)dr=0Rexp(12r2)(x=r1dx)dr.\begin{aligned} c &= \int_{r=0}^R \int_{\Vert x \Vert = r} \sqrt{\frac{1}{(2\pi)^d}} \exp\left(-\frac{1}{2}r^2\right) \mathrm{d}x \mathrm{d}r, \\ &= \sqrt{\frac{1}{(2\pi)^d}} \int_{r=0}^R \exp\left(-\frac{1}{2} r^2\right) \left( \int_{\Vert x \Vert = r} 1 \mathrm{d}x \right) \mathrm{d}r. \end{aligned}

The integral in parentheses is the volume of the sphere of radius rr in dd dimensions,

x=r1dx=rd12πdΓ(d/2),\int_{\Vert x \Vert = r} 1 \mathrm{d}x = r^{d-1} \frac{2\sqrt{\pi^d}}{\Gamma(d/2)},

where Γ\Gamma is the Gamma function. Plugging this relationship in, we get

c=12d2Γ(d/2)r=0Rrd1exp(12r2)dr.\begin{aligned} c &= \sqrt{\frac{1}{2^d}} \frac{2}{\Gamma(d/2)} \int_{r=0}^R r^{d-1} \exp\left(-\frac{1}{2} r^2\right) \mathrm{d}r. \end{aligned}

This equation has a surprisingly simple solution in d=2d=2 dimensions, as

c=142Γ(1)=1r=0Rrexp(12r2)dr=1exp(12R2).c = \underbrace{\sqrt{\frac{1}{4}} \frac{2}{\Gamma(1)}}_{=1} \int_{r=0}^R r \exp\left(-\frac{1}{2} r^2\right) \mathrm{d}r = 1 - \exp\left(-\frac{1}{2}R^2\right).

This gives the inverse relation R=2log(1c)R = \sqrt{-2\log(1-c)}, and the thresholds

p(x<1.516)68.3%,p(x<2.490)95.5%,p(x<3.409)99.7%.\begin{aligned} p(\norm{x} < 1.516) \approx 68.3\%, \\ p(\norm{x} < 2.490) \approx 95.5\%, \\ p(\norm{x} < 3.409) \approx 99.7\%. \\ \end{aligned}

Arbitrary Gaussians

For the general case of xN(μ,Σ)x \sim \Normal(\mu, \Sigma), the sets will be of the form xμΣ1/2R\Vert x-\mu\Vert_{\Sigma^{-1/2}} \leq R. Perhaps surprisingly, the formula for the radius RR is the same. To see this, we can use the reparametrization trick; if xN(0,I)x \sim \Normal(0, I), then z=Ax+μz = Ax + \mu follows the distribution zN(μ,AA)z \sim \Normal(\mu, AA^\top), meaning that

EzN(μ,Σ)[f(z)]=ExN(0,I)[f(Σ1/2x+μ)].\mathbb{E}_{z \sim \Normal(\mu, \Sigma)}[f(z)] = \mathbb{E}_{x \sim \Normal(0, I)}[f(\Sigma^{-1/2}x + \mu)].

Writing the proportion in terms of the expectation of the indicator function,

c=12zμΣ12<12R2N(z;μ,Σ)dz=EzN(μ,Σ)[ind ⁣(12zμΣ12<12R2)],c = \int_{\frac{1}{2}\norm{z-\mu}^2_{\Sigma^{-1}} < \frac{1}{2}R^2} \Normal(z; \mu, \Sigma) \mathrm{d}z = \mathbb{E}_{z \sim \Normal(\mu, \Sigma)}\brackets{ \ind{ \frac{1}{2}\norm{z-\mu}^2_{\Sigma^{-1}} < \frac{1}{2}R^2 } },

we can see the equivalence with the standard Gaussian case as

c=EzN(μ,Σ)[ind ⁣(12zμΣ12<12R2)],=ExN(0,I)[ind ⁣(12Σ1/2x+μμΣ12<12R2)],=ExN(0,I)[ind ⁣(12x2<12R2)]=1exp(12R2).\begin{aligned} c &= \mathbb{E}_{z \sim \Normal(\mu, \Sigma)}\brackets{ \ind{ \frac{1}{2}\norm{z-\mu}^2_{\Sigma^{-1}} < \frac{1}{2}R^2 } }, \\ &= \mathbb{E}_{x \sim \Normal(0, I)}\brackets{ \ind{ \frac{1}{2}\norm{\Sigma^{1/2}x + \mu - \mu}^2_{\Sigma^{-1}} < \frac{1}{2}R^2 } }, \\ &= \mathbb{E}_{x \sim \Normal(0, I)}\brackets{ \ind{ \frac{1}{2}\norm{x}^2 < \frac{1}{2}R^2 } } = 1 - \exp\left(-\frac{1}{2}R^2\right). \end{aligned}

Higher dimensions

The simplicity of the formula for the relationship between cc and RR in d=2d=2 dimensions is special. With d=1d=1, the probability depends on the erf\mathrm{erf} function, and the situation gets worse with higher dd;

for d=1,c=erf(R2)=0R2exp(t2)dt,for d=2,c=1exp(12R2),for any d,c=1Γ(d2,R22)Γ(d2).\begin{aligned} \text{for } d = 1, & \quad c = \mathrm{erf}\paren{\frac{R}{\sqrt{2}}} = \int_{0}^{\frac{R}{\sqrt{2}}} \exp(-t^2) \mathrm{d}t, \\ \text{for } d = 2, & \quad c = 1 - \exp\paren{-\frac{1}{2}R^2}, \\ \text{for any } d, & \quad c = 1 - \frac{ \Gamma\paren{\frac{d}{2}, \frac{R^2}{2}} }{ \Gamma\paren{\frac{d}{2}} }. \end{aligned}

For the general case, we have to solve

c=12d2Γ(d/2)r=0Rrd1exp(12r2)dr.\begin{aligned} c &= \sqrt{\frac{1}{2^d}} \frac{2}{\Gamma(d/2)} \int_{r=0}^R r^{d-1} \exp\left(-\frac{1}{2} r^2\right) \mathrm{d}r. \end{aligned}

The integral part has the following expression

r=0Rrd1exp(12r2)dr=2d2(Γ(d2)Γ(d2,R22)),\int_{r=0}^R r^{d-1} \exp\left(-\frac{1}{2} r^2\right) \mathrm{d}r = \frac{\sqrt{2^{d}}}{2} \paren{ \Gamma\paren{\frac{d}{2}} - \Gamma\paren{\frac{d}{2}, \frac{R^2}{2}} },

where Γ(a,b)\Gamma(a, b) is the incomplete Gamma function. The constants simplify to give

c=12d2Γ(d/2)2d2(Γ(d2)Γ(d2,R22))=1Γ(d2,R22)Γ(d2)\begin{aligned} c &= \sqrt{\frac{1}{2^d}} \frac{2}{\Gamma(d/2)} \frac{\sqrt{2^{d}}}{2} \paren{ \Gamma\paren{\frac{d}{2}} - \Gamma\paren{\frac{d}{2}, \frac{R^2}{2}} } = 1 - \frac{ \Gamma\paren{\frac{d}{2}, \frac{R^2}{2}} }{ \Gamma\paren{\frac{d}{2}} } \end{aligned}

Plotting code

As we started with a plotting question,

To finish, suppose we want to plot this 2d confidence region. One option would be to generate points on the boundary xμΣ1=R\norm{x-\mu}_{\Sigma^{-1}} = R and ax.fill the resulting polygon. To generate those points, we can again use the reparametrization trick. For a given point z=Σ1/2x+μz = \Sigma^{1/2}x + \mu, zμΣ12=R2\norm{z - \mu}_{\Sigma^{-1}}^2 = R^2 and x2=R2\norm{x}^2 = R^2 are equivalent. We can thus generate points along x=R\norm{x} = R and by sampling an angle α\alpha between 00 and 2π2\pi, compute x=[sin(α),cos(α)]x = [\sin(\alpha),\cos(\alpha)] and rescale appropriately.

import numpy as np 
import scipy as sp

def contour(mu, Sigma, r=1.0, density=100):
    """
    Coordinates of the confidence region ‖x−𝜇‖_{𝛴⁻¹} = R
    """
    angles = np.linspace(0, 2 * np.pi, density)
    sin, cos = np.sin(angles), np.cos(angles)
    A = sp.linalg.sqrtm(Sigma) * r
    xs = A[0, 0] * sin + A[0, 1] * cos + mu[0]
    ys = A[1, 1] * cos + A[1, 0] * sin + mu[1]
    return xs, ys

We can now plot Gaussian densities

import matplotlib.pyplot as plt

fig = plt.figure()
fig.set_size_inches(6, 6)
ax = fig.add_subplot(111)


mu, Sigma = np.array([2, 2]), np.diag([1, 2])
for r in [1.515, 2.490, 3.409]:
    ax.fill(*contour(mu, Sigma, r=r), alpha=0.2, color="red")

mu, Sigma = np.array([-2, -2]), np.diag([2, 1])
for r in [1.515, 2.490, 3.409]:
    ax.fill(*contour(mu, Sigma, r=r), alpha=0.2, color="blue")

ax.set_xlim([-8, 8])
ax.set_ylim([-8, 8])

plt.show()

Script to reproduce Figure 1.


  1. An alternative definition for the 1σ1\sigma, 2σ2\sigma and 3σ3\sigma confidence sets could be the sets xμΣ11,2,3\norm{x-\mu}_{\Sigma^{-1}} \leq 1, 2, 3. Those have the nice property that any one-dimensional slice passing through the center of the set recovers the one-dimensional confidence intervals. But those sets are smaller, and their probability mass is only 39.3%, 86.5% and 98.8%.