What would those look like for a multivariate Gaussian, say to plot interpretable contours?
Figure 1.
A Gaussian mixture model fit to the Old faithful dataset.
The 3 contours indicate areas that are the 2d equivalent of the 1σ, 2σ and 3σ intervals
in that they contain 68.3%, 95.5% and 99.7% of the probability mass.1
We will build a similar result for multivariate Gaussians,
in d=2 dimensions;
The general form is that
p(∥x−μ∥Σ−1≤R)=c
for R=−2log(1−c).
Confidence sets
The equivalent of the “confidence interval” in d dimensions for a distribution N(μ,Σ)
would be a set S that
is centered at μ and has the same contours as the sublevel sets of the density
N(x;μ,Σ)=(2π)ddet(Σ)1exp(−21∥x−μ∥Σ−12).
We want this set to contain a given proportion c of the probability mass, p(x∈S)=c.
The easy case
Starting with the easy case,
let’s look at a standard multivariate Gaussian x∼N(0,I).
Because symmetry, the confidence set will be a ball ∥x∥≤R.
The question is how to set the radius R as a function of c.
We are looking for the solution of
∫∥x∥<R(2π)d1exp(−21∥x∥2)dx=c.
To simplify the equation,
we can integrate over the radius
and the area separately,
For the general case of x∼N(μ,Σ),
the sets will be of the form ∥x−μ∥Σ−1/2≤R.
Perhaps surprisingly, the formula for the radius R is the same.
To see this, we can use the reparametrization trick;
if x∼N(0,I), then z=Ax+μ follows
the distribution z∼N(μ,AA⊤), meaning that
Ez∼N(μ,Σ)[f(z)]=Ex∼N(0,I)[f(Σ−1/2x+μ)].
Writing the proportion in terms of the expectation of the indicator function,
The simplicity of the formula for
the relationship between c and R
in d=2 dimensions is special.
With d=1, the probability depends on the erf function,
and the situation gets worse with higher d;
for d=1,for d=2,for any d,c=erf(2R)=∫02Rexp(−t2)dt,c=1−exp(−21R2),c=1−Γ(2d)Γ(2d,2R2).
To finish, suppose we want to plot this 2d confidence region.
One option would be to generate points on the boundary ∥x−μ∥Σ−1=R
and ax.fill
the resulting polygon.
To generate those points, we can again use the reparametrization trick.
For a given point z=Σ1/2x+μ,
∥z−μ∥Σ−12=R2 and ∥x∥2=R2 are equivalent.
We can thus generate points along ∥x∥=R
and by sampling an angle α between 0 and 2π,
compute x=[sin(α),cos(α)] and rescale appropriately.
importnumpyasnpimportscipyasspdefcontour(mu,Sigma,r=1.0,density=100):"""
Coordinates of the confidence region ‖x−𝜇‖_{𝛴⁻¹} = R
"""angles=np.linspace(0,2*np.pi,density)sin,cos=np.sin(angles),np.cos(angles)A=sp.linalg.sqrtm(Sigma)*rxs=A[0,0]*sin+A[0,1]*cos+mu[0]ys=A[1,1]*cos+A[1,0]*sin+mu[1]returnxs,ys
An alternative definition for the 1σ, 2σ and 3σ confidence sets
could be the sets ∥x−μ∥Σ−1≤1,2,3.
Those have the nice property that any one-dimensional slice passing through the center of the set
recovers the one-dimensional confidence intervals.
But those sets are smaller, and their probability mass is only 39.3%, 86.5% and 98.8%. ↩