This is a post I’ve been wanting to write for a while – quadratic forms and definite matrices are everywhere in linear algebra, and they have nice properties which are useful in data analysis. As a result, it’s important to get a grounding in how these very basic building blocks work. It wasn’t until I saw these ideas geometrically that they clicked for me, so I include some diagrams in the post.

Quadratic Forms

A quadratic form on R^n is a real-valued function of the form
$$ Q(x_1,…,x_n) = \sum_{i \leq j} a_{ij}x_ix_j$$

Each quadratic form Q can be represented by a symmetric matrix A (A = A^T) so that 
$$ Q(\mathbf{x}) = \mathbf{x}^TA\mathbf{x}$$

For instance, the general 2 dimensional quadratic form
$$a_{11}x^2_1 + a_{12}x_1x_2 + a_{22}x^2_2$$

can be written as 
$$\begin{bmatrix} x_1 & x_2 \end{bmatrix}\begin{bmatrix} a_{11} & \frac{1}{2}a_{12} \\ \frac{1}{2}a_{12} & a_{22} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$$
The general 3 dimensional quadratic form
$$a_{11}x^2_1 + a_{22}x^2_2 + a_{33}x^2_3 + a_{12}x_1x_2 + a_{13}x_1x_3 + a_{23}x_2x_3$$
can be written as 
$$\begin{bmatrix} x_1 & x_2 & x_3 \end{bmatrix}\begin{bmatrix} a_{11} & \frac{1}{2}a_{12} & \frac{1}{2}a_{13}\\ \frac{1}{2}a_{12} & a_{22} & \frac{1}{2}a_{23} \\ \frac{1}{2}a_{13} & \frac{1}{2}a_{23} & a_{33} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$$
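
To make this concrete, here is a minimal numpy sketch (the coefficients are made up for illustration) that builds the symmetric matrix of a 3-variable quadratic form by halving the cross-term coefficients, then checks that \mathbf{x}^TA\mathbf{x} reproduces the polynomial:

```python
import numpy as np

# Hypothetical example: Q(x1,x2,x3) = 2*x1^2 + 3*x2^2 + x3^2 + 4*x1*x2 - 2*x2*x3.
# Cross-term coefficients are split in half so that A is symmetric.
A = np.array([
    [2.0,  2.0,  0.0],   # a11,   a12/2, a13/2
    [2.0,  3.0, -1.0],   # a12/2, a22,   a23/2
    [0.0, -1.0,  1.0],   # a13/2, a23/2, a33
])

x = np.array([1.0, -2.0, 3.0])

# Evaluate the form both ways and confirm they agree.
via_matrix = x @ A @ x
direct = 2*x[0]**2 + 3*x[1]**2 + x[2]**2 + 4*x[0]*x[1] - 2*x[1]*x[2]
assert np.isclose(via_matrix, direct)
print(via_matrix)  # 27.0
```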

Definite Quadratic Forms

A quadratic form always takes the value zero at \mathbf{x} = \mathbf{0}. The general quadratic form of one variable is y=ax^2. If a > 0, then ax^2 is always \geq 0 and equals 0 only when x = 0. Such a form is positive definite; x = 0 is its global minimiser. If a < 0, then ax^2 is always \leq 0 and equals 0 only when x = 0. Such a quadratic form is called negative definite; x = 0 is its global maximiser. 

In 2 dimensions, the quadratic form
$$Q_1(x_1, x_2) = x^2_1 + x^2_2$$
is strictly positive for (x_1, x_2) \neq (0,0) – so we call Q_1 positive definite.

Quadratic forms like 
$$Q_2(x_1, x_2) = -x^2_1 - x^2_2$$
are strictly negative except at the origin – we call these forms negative definite. Quadratic forms like 
$$Q_3(x_1,x_2) = x^2_1 - x^2_2$$ 
take on both positive and negative values: $$Q_3(1, 0) = 1$$ and $$Q_3(0,1) = -1$$

These forms are called indefinite. A quadratic form which is always greater than or equal to 0 but may equal zero at some nonzero \mathbf{x} (as opposed to being zero only at \mathbf{x} = \mathbf{0}) is called positive semidefinite. An example of this form is 
$$Q_4(x_1, x_2) = (x_1 + x_2)^2 = x^2_1 + 2x_1x_2 + x^2_2$$
which is never negative but equals 0 at nonzero points like (x_1, x_2) = (1,-1) or (x_1, x_2) = (-2,2). A quadratic form which is always \leq 0 but may equal 0 at some nonzero \mathbf{x} (as opposed to only at \mathbf{x} = \mathbf{0}) is called negative semidefinite. An example of this form is
$$Q_5(x_1, x_2) = -(x_1 + x_2)^2$$
Some (admittedly terrible) sketches of the types of surfaces are below:

[Figure: sketches of positive definite, negative definite, indefinite, and semidefinite quadratic surfaces]
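
As a quick numeric sanity check of the semidefinite case, here is a small numpy sketch that evaluates Q_4 through its symmetric matrix at many random points: it is never (meaningfully) negative, yet it vanishes on a nonzero vector.

```python
import numpy as np

# Q4's symmetric matrix: (x1 + x2)^2 = x^T [[1, 1], [1, 1]] x.
A4 = np.array([[1.0, 1.0], [1.0, 1.0]])

rng = np.random.default_rng(0)
xs = rng.normal(size=(10_000, 2))
values = np.einsum('ij,jk,ik->i', xs, A4, xs)  # x^T A4 x for each sample x
print(values.min())  # ~0 at worst: the form is never (meaningfully) negative

# ...but it vanishes on a nonzero vector, so Q4 is only positive SEMIdefinite.
x = np.array([1.0, -1.0])
print(x @ A4 @ x)  # 0.0
```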

Definite Symmetric Matrices

Let A be an n \times n symmetric matrix. Then A can be one of the following (a numerical sketch of these cases follows the list):

  1. Positive Definite: \mathbf{x}^TA\mathbf{x} > 0 for all \mathbf{x} \neq \mathbf{0} in R^n
  2. Positive Semidefinite: \mathbf{x}^TA\mathbf{x} \geq 0 for all \mathbf{x} in R^n
  3. Negative Definite: \mathbf{x}^TA\mathbf{x} < 0 for all \mathbf{x} \neq \mathbf{0} in R^n
  4. Negative Semidefinite: \mathbf{x}^TA\mathbf{x} \leq 0 for all \mathbf{x} in R^n
  5. Indefinite: \mathbf{x}^TA\mathbf{x} > 0 for some \mathbf{x} in R^n and \mathbf{x}^TA\mathbf{x} < 0 for some \mathbf{x} in R^n
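
One standard way to check these five cases numerically – not proved in this post, but consistent with the eigenvalue fact used later in the proof of Theorem 2 – is via the signs of the eigenvalues of the symmetric matrix. A minimal numpy sketch (the tolerance is a judgment call):

```python
import numpy as np

def classify(A, tol=1e-10):
    """Classify a symmetric matrix by the signs of its eigenvalues."""
    eigs = np.linalg.eigvalsh(A)  # eigvalsh: eigenvalues of a symmetric matrix
    if np.all(eigs > tol):
        return "positive definite"
    if np.all(eigs < -tol):
        return "negative definite"
    if np.all(eigs >= -tol):
        return "positive semidefinite"
    if np.all(eigs <= tol):
        return "negative semidefinite"
    return "indefinite"

print(classify(np.array([[1.0, 0.0], [0.0, 1.0]])))   # positive definite     (Q1)
print(classify(np.array([[1.0, 0.0], [0.0, -1.0]])))  # indefinite            (Q3)
print(classify(np.array([[1.0, 1.0], [1.0, 1.0]])))   # positive semidefinite (Q4)
```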

Definiteness of Diagonal Matrices

The simplest n \times n symmetric matrices are the diagonal matrices. They correspond to the simplest quadratic forms
$$\begin{bmatrix} x_1 & \dots & x_n \end{bmatrix} \begin{bmatrix} a_{11} & 0 & \dots & 0 \\ 0 & a_{22} & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & a_{nn} \end{bmatrix} \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} = a_{11}x^2_1 + a_{22}x^2_2 + \dots + a_{nn}x^2_n$$

The quadratic form will be positive definite iff all the a_{ii},\ i=1,…,n are positive and negative definite iff all the a_{ii} are negative. It will be positive semidefinite iff all the a_{ii} are greater than or equal to 0 and negative semidefinite iff all the a_{ii} are less than or equal to 0. If two of the a_{ii} have opposite signs, the quadratic form is indefinite.

The principal submatrices of a diagonal matrix are themselves diagonal matrices, and their determinants – the principal minors – are products of their diagonal entries. If all the a_{ii} are positive (so the form is positive definite), then all such products are positive, so all the leading principal minors are positive. If all the a_{ii} are negative (so the form is negative definite), then the product of an odd number of the a_{ii} is negative and the product of an even number of them is positive, so the leading principal minors alternate in sign. Note that if a_{11} = 0 in the above quadratic expression, every leading principal submatrix contains a_{11} and so every leading principal minor is 0; the form cannot be definite, but the leading minors alone cannot tell us whether it is semidefinite or indefinite.
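
A small numpy sketch illustrating both points – the alternating signs in the negative definite case, and the all-zero leading minors when a_{11} = 0 (matrices chosen arbitrarily for illustration):

```python
import numpy as np

def leading_principal_minors(A):
    """det(A_k) for k = 1..n, where A_k is the top-left k x k submatrix."""
    return [np.linalg.det(A[:k, :k]) for k in range(1, A.shape[0] + 1)]

# Negative definite diagonal matrix: minors alternate -, +, -, ...
print(leading_principal_minors(np.diag([-1.0, -2.0, -3.0])))  # [-1.0, 2.0, -6.0]

# a_11 = 0: every leading minor is 0, so the leading-minor test says nothing.
print(leading_principal_minors(np.diag([0.0, 1.0, -1.0])))    # [0.0, 0.0, 0.0]
```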

Definiteness of a 2 \times 2 Matrix

Consider the general quadratic form on R^2
$$Q(x_1, x_2) = \begin{bmatrix} x_1 & x_2 \end{bmatrix} \begin{bmatrix} a & b \\ b & c \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = ax^2_1 + 2bx_1x_2 + cx^2_2$$

If a = 0, then Q cannot be positive or negative definite because Q(1,0) = 0. So assume a \neq 0 and complete the square by adding and subtracting \frac{b^2}{a}x^2_2:
$$Q(x_1, x_2) = ax^2_1 + 2bx_1x_2 + cx^2_2 + \frac{b^2}{a}x^2_2 - \frac{b^2}{a}x^2_2$$
$$Q(x_1, x_2) = a(x^2_1 + \frac{2b}{a}x_1x_2 + \frac{b^2}{a^2}x^2_2) + cx^2_2 - \frac{b^2}{a}x^2_2$$
$$Q(x_1, x_2) = a(x_1 + \frac{b}{a}x_2)^2 + \frac{ac - b^2}{a} x^2_2$$

Both squares are nonnegative, so Q is positive definite iff a > 0 and ac - b^2 > 0, and negative definite iff a < 0 and ac - b^2 > 0. If ac - b^2 < 0, the two terms take opposite signs and Q is indefinite. Note that a and ac - b^2 are exactly the leading principal minors of the matrix above, which motivates the next section.
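
For readers who want to verify the algebra, here is a short sympy check (assuming a \neq 0) that the completed square really equals the original form:

```python
import sympy as sp

# Symbolic check of the completed square, valid whenever a != 0.
x1, x2, a, b, c = sp.symbols('x1 x2 a b c')
Q = a*x1**2 + 2*b*x1*x2 + c*x2**2
completed = a*(x1 + (b/a)*x2)**2 + ((a*c - b**2)/a)*x2**2
print(sp.simplify(Q - completed))  # 0, so the two expressions agree
```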

Principal Minors of a Matrix

  1. Principal Submatrix: Let A be an n \times n matrix. A k \times k submatrix of A formed by deleting n-k columns and the same n-k rows from A is called a k-th order principal submatrix of A.
  2. Principal Minor: The determinant of a k \times k principal submatrix is called a k-th order principal minor of A.
  3. Leading Principal Submatrix: Let A be an n \times n matrix. The k-th order principal submatrix of A obtained by deleting the last n-k rows and the last n-k columns from A is called the k-th order leading principal submatrix of A, denoted A_k.
  4. Leading Principal Minor: The determinant of the k-th order leading principal submatrix, denoted |A_k|.

An n \times n matrix has n leading principal submatrices – the top leftmost 1 \times 1 submatrix, the top leftmost 2 \times 2 submatrix, and so on. 

Example 1: Consider a general 3 \times 3 matrix
$$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}$$
k = 1, n-k = 3 - 1 = 2 → remove the last 2 rows and columns: A_1 = a_{11}, |A_1| = a_{11}
k = 2, n-k = 3 - 2 = 1 → remove the last row and column: A_2 = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}, |A_2| = \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}
k = 3, n-k = 3 - 3 = 0 → remove nothing: A_3 = A, |A_3| = \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}
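
Here is a numpy sketch that enumerates all principal minors (not just the leading ones) by deleting the same rows and columns, per definitions 1 and 2 above; the example matrix is an arbitrary choice:

```python
import itertools
import numpy as np

def principal_minors(A):
    """All k-th order principal minors: keep the same k rows and columns."""
    n = A.shape[0]
    minors = {}
    for k in range(1, n + 1):
        for rows in itertools.combinations(range(n), k):
            sub = A[np.ix_(rows, rows)]  # submatrix on rows/cols in `rows`
            minors[rows] = np.linalg.det(sub)
    return minors

A = np.array([[ 2.0, -1.0,  0.0],
              [-1.0,  2.0, -1.0],
              [ 0.0, -1.0,  2.0]])
for idx, m in principal_minors(A).items():
    print(idx, round(m, 6))  # all positive here: A is positive definite
```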

We can use leading principal minors to determine the definiteness of a given matrix.
Theorem 1: Let A be an n \times n symmetric matrix. Then, 
a. A is positive definite iff all its n leading principal minors are strictly positive
b. A is negative definite iff its n leading principal minors alternate in sign as follows:

  • |A_1| < 0,\ |A_2| > 0,\ |A_3| < 0,\ and so on
  • The k-th order leading principal minor should have the same sign as (-1)^k

c. If some k-th order leading principal minor of A (or some pair of them) is nonzero but does not fit either of the above 2 sign patterns, then A is indefinite. This case occurs when A has a negative k-th order leading principal minor for an even integer k or when A has a negative k-th order leading principal minor and a positive l-th order leading principal minor for 2 distinct odd integers k and l. 

One way that the leading minor test of theorem 1 can fail for a given symmetric matrix A is that some leading principal minor of A is 0 while the nonzero ones fit the sign pattern in a) or b). When this occurs, the matrix A is not definite and it may or may not be semidefinite. In this case, to check for semidefiniteness, we must check the sign of every principal minor of A.
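A minimal numpy sketch of the Theorem 1 leading-minor test, covering only the two definite sign patterns; as noted above, a zero leading minor makes the test inconclusive and every principal minor must then be checked:

```python
import numpy as np

def leading_minors(A):
    # As in the earlier sketch: det(A_k) for the top-left k x k blocks.
    return [np.linalg.det(A[:k, :k]) for k in range(1, A.shape[0] + 1)]

def test_definiteness(A, tol=1e-10):
    """Leading principal minor test of Theorem 1 (definite cases only)."""
    ms = leading_minors(A)
    if all(m > tol for m in ms):
        return "positive definite"
    if all((m < -tol) if k % 2 == 1 else (m > tol)
           for k, m in enumerate(ms, start=1)):
        return "negative definite"
    return "not definite (check all principal minors for semidefiniteness)"

print(test_definiteness(np.array([[ 2.0, -1.0], [-1.0,  2.0]])))  # positive definite
print(test_definiteness(np.array([[-2.0,  1.0], [ 1.0, -2.0]])))  # negative definite
print(test_definiteness(np.array([[ 1.0,  1.0], [ 1.0,  1.0]])))  # not definite (...)
```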

Relationship Between Definiteness and Principal Minors

There are some notable relationships between a definite matrix and its principal minors:

Lemma 1: If A is a positive or negative definite matrix, then A is nonsingular (invertible, det(A) \neq 0)

Proof: By Contradiction

Suppose A is singular (non-invertible, det(A) = 0). Then there exists a nonzero vector \mathbf{x} such that A\mathbf{x} = \mathbf{0}. But then \mathbf{x}^TA\mathbf{x} = \mathbf{x}^T(A\mathbf{x}) = \mathbf{x}^T \mathbf{0} = 0, which contradicts the fact that A is positive or negative definite (\mathbf{x}^TA\mathbf{x} > 0 or \mathbf{x}^TA\mathbf{x} < 0 for all nonzero \mathbf{x}). So A cannot be both singular and positive or negative definite. Since a matrix is either singular or nonsingular, a positive or negative definite A must be nonsingular. 
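
A tiny numeric illustration of the contradiction (the matrix is an arbitrary rank-one choice): a singular symmetric matrix has a nonzero null vector, and the quadratic form vanishes there, so it cannot be definite:

```python
import numpy as np

# Singular symmetric matrix: rows are linearly dependent, det = 0.
A = np.array([[1.0, 2.0], [2.0, 4.0]])
print(np.linalg.det(A))  # 0.0 (up to rounding)

# A nonzero null vector gives x^T A x = 0, so A cannot be definite.
x = np.array([2.0, -1.0])
print(A @ x)      # [0. 0.]
print(x @ A @ x)  # 0.0
```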

Lemma 2: Suppose A is a symmetric matrix (A = A^T) and Q is a nonsingular matrix (det(Q) \neq 0). Then:

i) Q^TAQ is a symmetric matrix

ii) A is positive/negative definite iff Q^TAQ is positive/negative definite.

Proof
i) (Q^TAQ)^T = Q^TA^T(Q^T)^T = Q^TA^TQ = Q^TAQ, using A = A^T
ii) Suppose Q^TAQ is positive definite. Let \mathbf{x} \neq \mathbf{0} be an arbitrary nonzero vector in R^n. Since Q is nonsingular, \mathbf{y} = Q^{-1}\mathbf{x} is well defined and nonzero, and it satisfies \mathbf{x} = Q\mathbf{y}.

Then 
$$\mathbf{x}^TA\mathbf{x} = (Q\mathbf{y})^T A Q\mathbf{y}$$
$$\mathbf{x}^TA\mathbf{x} = \mathbf{y}^T Q^T A Q\mathbf{y}$$
$$\mathbf{x}^TA\mathbf{x} = \mathbf{y}^T (Q^TAQ) \mathbf{y}$$

Since we supposed Q^TAQ is positive definite, \mathbf{y}^T (Q^TAQ) \mathbf{y} > 0 for every nonzero \mathbf{y}, so
$$\mathbf{x}^TA\mathbf{x} = \mathbf{y}^T (Q^TAQ) \mathbf{y} > 0$$

This shows that A is positive definite whenever Q^TAQ is. Conversely, suppose A is positive definite and let \mathbf{z} be an arbitrary nonzero vector. Since Q is nonsingular, \mathbf{x} = Q\mathbf{z} is nonzero, and since A is positive definite,

$$\mathbf{x}^TA\mathbf{x} = (Q \mathbf{z})^T A (Q\mathbf{z}) > 0$$ 
$$\mathbf{x}^TA\mathbf{x} = \mathbf{z}^T Q^T A Q\mathbf{z} > 0$$
$$\mathbf{x}^TA\mathbf{x} = \mathbf{z}^T (Q^T A Q) \mathbf{z} > 0$$

This shows that Q^TAQ is positive definite whenever A is. The negative definite case is identical with the inequalities reversed.  
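
A quick numeric illustration of Lemma 2: congruence by a random nonsingular Q changes the eigenvalues of a positive definite A, but not their signs (the matrix and seed are arbitrary choices):

```python
import numpy as np

A = np.array([[2.0, -1.0], [-1.0, 2.0]])  # positive definite (eigenvalues 1 and 3)

rng = np.random.default_rng(1)
Q = rng.normal(size=(2, 2))
assert abs(np.linalg.det(Q)) > 1e-8  # Q is (almost surely) nonsingular

# Congruence Q^T A Q changes the eigenvalues but not their signs.
print(np.linalg.eigvalsh(A))            # [1. 3.]
print(np.linalg.eigvalsh(Q.T @ A @ Q))  # two different values, both positive
```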

Theorem 2: Let A be a symmetric matrix (A=A^T). Then, A is positive definite (\mathbf{x}^TA\mathbf{x} > 0 for all \mathbf{x} \neq \mathbf{0}) iff all its leading principal minors are positive.

Proof map: This proof is quite long, so here is a map to help keep track. We have 2 objectives to establish with this theorem:

i) Objective 1: |A_j| > 0,\ j=1,…,n → A is positive definite

  • Suppose all leading principal minors are positive, so that (by induction on size) the leading principal submatrices are positive definite.
  • Since the leading principal submatrices are positive definite, they are nonsingular and invertible (by Lemma 1).
  • Partition the matrix A in a certain way. 
  • Express A = Q^TBQ, partitioned in a similar way to A.
  • Obtain an identity for the determinant of A via A = Q^TBQ and the partition. This gives us a set of facts f_1.
  • Form the quadratic form \mathbf{x}^TB\mathbf{x} for an arbitrary vector \mathbf{x}.
  • Use the set of facts f_1 to show B is positive definite.
  • By Lemma 2, if B in A = Q^TBQ is positive definite, then A is positive definite 

ii) Objective 2: If A is positive definite → all |A_j| are positive

  • Suppose A is positive definite, i.e. \mathbf{x}^TA\mathbf{x} > 0 for all nonzero \mathbf{x}.
  • Write \mathbf{x}^TA\mathbf{x} in terms of the principal submatrices A_j; this shows each A_j is positive definite.
  • Conclude that each |A_j| > 0, since the determinant of a positive definite matrix is the product of its (positive) eigenvalues.

Proof
To prove this theorem, we must establish 2 relationships:

i) If A is positive definite, then all its leading principal minors are positive

ii) If all the leading principal minors are positive, then A is positive definite

i): Let A be an (n+1) \times (n+1) positive definite matrix. To show that all A_j, j = 1,…,n+1 are positive definite, let \mathbf{x}_j be a nonzero j \times 1 vector and let \mathbf{0}^* be the zero ((n+1)-j) \times 1 vector. Then

(1) Since A is positive definite, the vector \mathbf{x} = \begin{bmatrix} \mathbf{x}_j \\ \mathbf{0}^* \end{bmatrix} satisfies
$$\mathbf{x}^TA\mathbf{x} = \begin{bmatrix} \mathbf{x}^T_j & \mathbf{0}^{*T} \end{bmatrix} A \begin{bmatrix} \mathbf{x}_j \\ \mathbf{0}^* \end{bmatrix} > 0$$
(1) can be written as

(2) \mathbf{x}^TA\mathbf{x} = \mathbf{x}^T_j A_j \mathbf{x}_j > 0

This is because the zero elements in \begin{bmatrix} \mathbf{x}^T_j & \mathbf{0}^{*T} \end{bmatrix} eliminate the last (n+1)-j rows and (n+1)-j columns of A, leaving exactly the j-th leading principal submatrix. Since \mathbf{x}_j was an arbitrary nonzero vector, (2) shows that each A_j is positive definite. 

Each A_j is symmetric and positive definite, so all of its eigenvalues are positive. Its determinant, the leading principal minor |A_j|, is the product of these eigenvalues and is therefore positive.

ii): Let A be an (n + 1) \times (n + 1) symmetric matrix. Let A_j be the j \times j leading principal submatrix of A for j = 1,…,n+1 (so A_{n+1} = A).

Suppose all the leading principal minors of A are positive. Arguing by induction on the size of the matrix, we may take the leading principal submatrices A_j, j = 1,…,n, to be positive definite. By Lemma 1, they are then nonsingular and invertible. 

Partition A as 
(3) A = \begin{bmatrix} a_{11} & a_{12} & \dots & a_{1n} & a_{1(n+1)} \\ a_{21} & a_{22} & \dots & a_{2n} & a_{2(n+1)} \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ a_{n1} & a_{n2} & \dots & a_{nn} & a_{n(n+1)} \\ a_{(n+1)1} & a_{(n+1)2} & \dots & a_{(n+1)n} & a_{(n+1)(n+1)} \end{bmatrix} = \begin{bmatrix} A_n & \mathbf{a} \\ \mathbf{a}^T & a_{(n+1)(n+1)} \end{bmatrix}

where \mathbf{a} = \begin{bmatrix} a_{1(n+1)} \\ a_{2(n+1)} \\ \vdots \\ a_{n(n+1)} \end{bmatrix} is an n \times 1 column vector. 

Let d = a_{(n+1)(n+1)} - \mathbf{a}^T(A_n)^{-1}\mathbf{a} and let \mathbf{0}_n be an n \times 1 column vector of all 0s. 

Identity (3) can now be written as 
(4) A = Q^TBQ = \begin{bmatrix} I_n & \mathbf{0}_n \\ (A_n^{-1}\mathbf{a})^T & 1 \end{bmatrix} \begin{bmatrix} A_n & \mathbf{0}_n \\ \mathbf{0}^T_n & d \end{bmatrix} \begin{bmatrix} I_n & A^{-1}_n \mathbf{a} \\ \mathbf{0}^T_n & 1 \end{bmatrix}

where the leftmost matrix is lower triangular and the rightmost matrix is upper triangular. 

Now express det(A) using properties of determinants:

  • det(Q^T) = det(Q) = 1, since Q is triangular with every diagonal entry equal to 1, and the determinant of a triangular matrix is the product of its diagonal entries.
  • det(B) = d \times det(A_n), obtained by cofactor expansion along the bottom row of B: the only nonzero entry in that row is d.

(5) det(A) = det(Q^T) det(B) det(Q) = d \times det(A_n)

Since the leading principal minors of A are positive by supposition, det(A) > 0 and det(A_n) > 0, and therefore d = det(A)/det(A_n) > 0.

Let \mathbf{x} be an arbitrary (n + 1) \times 1 vector. We can write \mathbf{x} as 
$$ \mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \\ x_{(n+1)} \end{bmatrix} = \begin{bmatrix} \mathbf{x}_n \\ x_{(n+1)} \end{bmatrix}$$

where \mathbf{x}_n is an n \times 1 vector.

Now compute \mathbf{x}^TB\mathbf{x} for the matrix B:

(6) \mathbf{x}^TB\mathbf{x} = \begin{bmatrix} \mathbf{x}^T_n & x_{(n+1)} \end{bmatrix} \begin{bmatrix} A_n & \mathbf{0}_n \\ \mathbf{0}^T_n & d \end{bmatrix} \begin{bmatrix} \mathbf{x}_n \\ x_{(n+1)} \end{bmatrix}

\mathbf{x}^T B \mathbf{x} = \begin{bmatrix} \mathbf{x}^T_n A_n & dx_{(n+1)} \end{bmatrix}\begin{bmatrix} \mathbf{x}_n \\ x_{(n+1)} \end{bmatrix}

\mathbf{x}^T B \mathbf{x} = \mathbf{x}^T_n A_n \mathbf{x}_n + dx^2_{(n+1)}

Suppose \mathbf{x} \neq \mathbf{0}. Then either \mathbf{x}_n \neq \mathbf{0}, in which case \mathbf{x}^T_n A_n \mathbf{x}_n > 0 because A_n is positive definite, or x_{(n+1)} \neq 0, in which case dx^2_{(n+1)} > 0 because we established d > 0. Since neither term is ever negative, it follows that \mathbf{x}^TB\mathbf{x} = \mathbf{x}^T_n A_n \mathbf{x}_n + dx^2_{(n+1)} > 0.

Therefore B is positive definite. By Lemma 2, since B is positive definite, Q^TBQ = A is also positive definite. 
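
To close, here is a numeric sketch of the key decomposition in the proof: for an arbitrarily chosen positive definite A, it builds the partition (3), the Schur-complement-style scalar d, and the triangular factors of identity (4), then verifies (4) and the determinant identity (5):

```python
import numpy as np

# Numeric check of the partition identity (4) for a small positive definite A.
A = np.array([[4.0, 2.0, 0.0],
              [2.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
n = 2
An = A[:n, :n]                             # leading n x n block
a = A[:n, n]                               # last column without the corner entry
d = A[n, n] - a @ np.linalg.solve(An, a)   # d = a_(n+1)(n+1) - a^T An^{-1} a

Q = np.block([[np.eye(n), np.linalg.solve(An, a).reshape(-1, 1)],
              [np.zeros((1, n)), np.ones((1, 1))]])
B = np.block([[An, np.zeros((n, 1))],
              [np.zeros((1, n)), d * np.ones((1, 1))]])

assert np.allclose(Q.T @ B @ Q, A)                          # identity (4)
assert np.isclose(np.linalg.det(A), d * np.linalg.det(An))  # identity (5)
print(d)  # 1.5: positive, as the proof requires
```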
