
Diagonalisation of symmetric and Hermitian matrices

In this last section we discuss properties of symmetric and Hermitian matrices which have applications in classical physics and quantum mechanics. The theory outlined for Hermitian matrices is closely related to Sturm-Liouville theory which defines properties of solution sets in boundary value problems. The latter arise in the separation of variables procedure for partial differential equations which are covered in the Year II Mathematics course in the context of heat and mass transfer.

Recall that a matrix is symmetric iff it is equal to its transpose, i.e.

A=A^{\top} .

Definition of Hermitian conjugate

If $A$ is a matrix with complex entries, the Hermitian conjugate of $A$, denoted by $A^{\dagger}$, is obtained by taking the complex conjugate of every entry and then transposing, i.e.

\left(A^{\dagger}\right)_{i j}=\bar{a}_{j i}

A complex matrix $A$ is called Hermitian if

A^{\dagger}=A

In particular, every real symmetric matrix is Hermitian. We also note that the Hermitian conjugate of the product of two matrices is the product of their conjugates taken in reverse order,

(A B)^{\dagger}=B^{\dagger} A^{\dagger}

which follows from the same identity for transpose matrices and the fact that the complex conjugate of a product is the product of complex conjugates, i.e.

\bar{\omega} \bar{z}=\overline{\omega z} .
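As a quick numerical illustration, here is a minimal NumPy sketch of the product rule for Hermitian conjugates; the matrices below are arbitrary illustrative values, not taken from the notes.

```python
import numpy as np

# Hermitian conjugate: complex conjugate followed by transpose
def dagger(M):
    return M.conj().T

# Two arbitrary complex matrices (illustrative values only)
A = np.array([[1 + 2j, 3], [0, 4 - 1j]])
B = np.array([[2, 1j], [-1j, 5]])

# (AB)^dagger should equal B^dagger A^dagger
print(np.allclose(dagger(A @ B), dagger(B) @ dagger(A)))  # True

# B is Hermitian: it equals its own Hermitian conjugate
print(np.allclose(dagger(B), B))  # True
```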

The diagonalisation of symmetric matrices is particularly easy as all eigenvalues are real and the eigenvectors can be chosen to form an orthonormal set (mutually orthogonal vectors of unit length). We recall here that the dot product of two real vectors can be written using column vector notation as,

\boldsymbol{e}_{1} \cdot \boldsymbol{e}_{2}=\boldsymbol{e}_{1}^{\top} \boldsymbol{e}_{2}

For complex-valued vectors the generalisation of the dot product involves the Hermitian conjugate. For two complex-valued vectors $\boldsymbol{u}$ and $\boldsymbol{v}$, the dot product is defined as,

\boldsymbol{u} \cdot \boldsymbol{v}=\boldsymbol{u}^{\dagger} \boldsymbol{v}
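In NumPy this complex dot product can be computed either explicitly or with np.vdot, which conjugates its first argument; a minimal sketch with illustrative vectors:

```python
import numpy as np

u = np.array([1 + 1j, 2 - 1j])
v = np.array([3j, 1 + 2j])

# u . v = u^dagger v : conjugate-transpose the first vector
explicit = u.conj().T @ v
print(explicit)

# np.vdot conjugates its first argument, giving the same result
print(np.vdot(u, v))
print(np.isclose(explicit, np.vdot(u, v)))  # True
```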

Hermitian matrices are orthogonally diagonalisable

While the full proofs are beyond the scope of the course, the results are important. The key message is that Hermitian, and in particular symmetric, matrices are orthogonally diagonalisable.

Theorem 10.1 For a symmetric matrix $A$, we may diagonalise $A$ with respect to an orthonormal set of eigenvectors.

The first step is to understand that the eigenvalues of a Hermitian matrix are real; for a real symmetric matrix this also means that the eigenvectors can be chosen to be real.

Theorem 10.2 For an $n \times n$ Hermitian matrix $A$ all eigenvalues are real.

Proof Consider an eigenvector $\boldsymbol{x}$ with a possibly complex eigenvalue $\lambda$. We want to show that $\lambda$ is real, which follows from $\lambda=\bar{\lambda}$. Starting from,

A \boldsymbol{x}=\lambda \boldsymbol{x},

and multiplying on the left by the Hermitian conjugate of $\boldsymbol{x}$,

\boldsymbol{x}^{\dagger} A \boldsymbol{x}=\lambda \boldsymbol{x}^{\dagger} \boldsymbol{x} .

Next, we take the Hermitian conjugate of both sides,

\left(\boldsymbol{x}^{\dagger} A \boldsymbol{x}\right)^{\dagger}=\bar{\lambda}\left(\boldsymbol{x}^{\dagger} \boldsymbol{x}\right)^{\dagger}

Using the product rule for Hermitian conjugates, $(A B)^{\dagger}=B^{\dagger} A^{\dagger}$, and noting that $\boldsymbol{x}^{\dagger} \boldsymbol{x}$ is a scalar so that $\left(\boldsymbol{x}^{\dagger} \boldsymbol{x}\right)^{\dagger}=\boldsymbol{x}^{\dagger} \boldsymbol{x}$, this becomes,

\boldsymbol{x}^{\dagger} A^{\dagger} \boldsymbol{x}=\bar{\lambda} \boldsymbol{x}^{\dagger} \boldsymbol{x}

Since $A$ is Hermitian, $A^{\dagger}=A$, and the previous equation reduces to,

\boldsymbol{x}^{\dagger} A \boldsymbol{x}=\bar{\lambda} \boldsymbol{x}^{\dagger} \boldsymbol{x}

Finally, subtracting this from the earlier relation $\boldsymbol{x}^{\dagger} A \boldsymbol{x}=\lambda \boldsymbol{x}^{\dagger} \boldsymbol{x}$ gives,

\begin{aligned} (\lambda-\bar{\lambda}) \boldsymbol{x}^{\dagger} \boldsymbol{x} & =\boldsymbol{x}^{\dagger} A \boldsymbol{x}-\boldsymbol{x}^{\dagger} A \boldsymbol{x} \\ & =0 . \end{aligned}

Since $\boldsymbol{x}$ is an eigenvector (and therefore nonzero), $\boldsymbol{x}^{\dagger} \boldsymbol{x} \neq 0$, which implies that $\lambda=\bar{\lambda}$ and the eigenvalues are real.
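A quick numerical sanity check of Theorem 10.2; this is a minimal sketch, and the random Hermitian matrix below is purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Build a random Hermitian matrix: M + M^dagger is always Hermitian
M = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
H = M + M.conj().T

# Eigenvalues of a Hermitian matrix have (numerically) zero imaginary part
eigvals = np.linalg.eigvals(H)
print(np.max(np.abs(eigvals.imag)) < 1e-10)  # True
```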

The second step is to show that eigenvectors corresponding to distinct eigenvalues are orthogonal.

Theorem 10.3 Eigenvectors of Hermitian matrices corresponding to different eigenvalues are orthogonal.

Proof Consider two eigenvectors $\boldsymbol{x}_{1}$ and $\boldsymbol{x}_{2}$ of a Hermitian matrix $A$, corresponding to eigenvalues $\lambda_{1}$ and $\lambda_{2}$ where $\lambda_{1} \neq \lambda_{2}$. We want to show that $\boldsymbol{x}_{1}$ and $\boldsymbol{x}_{2}$ are orthogonal. We start from,

A \boldsymbol{x}_{1}=\lambda_{1} \boldsymbol{x}_{1}, \quad A \boldsymbol{x}_{2}=\lambda_{2} \boldsymbol{x}_{2} .

We multiply the first equation on the left by $\boldsymbol{x}_{2}^{\dagger}$,

\boldsymbol{x}_{2}^{\dagger} A \boldsymbol{x}_{1}=\lambda_{1} \boldsymbol{x}_{2}^{\dagger} \boldsymbol{x}_{1}

and the second by $\boldsymbol{x}_{1}^{\dagger}$,

\boldsymbol{x}_{1}^{\dagger} A \boldsymbol{x}_{2}=\lambda_{2} \boldsymbol{x}_{1}^{\dagger} \boldsymbol{x}_{2}

We take the Hermitian conjugate of the last equation,

\left(\boldsymbol{x}_{1}^{\dagger} A \boldsymbol{x}_{2}\right)^{\dagger}=\overline{\lambda_{2}}\left(\boldsymbol{x}_{1}^{\dagger} \boldsymbol{x}_{2}\right)^{\dagger}

With the same arguments as in Theorem 10.2, this becomes,

\boldsymbol{x}_{2}^{\dagger} A \boldsymbol{x}_{1}=\overline{\lambda_{2}} \boldsymbol{x}_{2}^{\dagger} \boldsymbol{x}_{1}

Finally, subtracting this from the first of the two relations above, $\boldsymbol{x}_{2}^{\dagger} A \boldsymbol{x}_{1}=\lambda_{1} \boldsymbol{x}_{2}^{\dagger} \boldsymbol{x}_{1}$, gives,

\begin{aligned} \left(\lambda_{1}-\overline{\lambda_{2}}\right) \boldsymbol{x}_{2}^{\dagger} \boldsymbol{x}_{1} & =\boldsymbol{x}_{2}^{\dagger} A \boldsymbol{x}_{1}-\boldsymbol{x}_{2}^{\dagger} A \boldsymbol{x}_{1} \\ & =0 \end{aligned}

Note that from Theorem 10.2 we know that $\overline{\lambda_{2}}=\lambda_{2}$ since the eigenvalues of a Hermitian matrix are real. Further, since $\lambda_{1} \neq \lambda_{2}$, we must have $\boldsymbol{x}_{2}^{\dagger} \boldsymbol{x}_{1}=0$, which implies orthogonality.
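Theorem 10.3 can likewise be checked numerically; this minimal sketch uses np.linalg.eigh, which returns the eigenvectors of a Hermitian matrix as orthonormal columns.

```python
import numpy as np

rng = np.random.default_rng(1)

# Random Hermitian matrix (illustrative)
M = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
H = M + M.conj().T

# eigh returns the eigenvalues and a matrix of eigenvectors
eigvals, V = np.linalg.eigh(H)

# V^dagger V should be the identity: the eigenvectors are orthonormal
print(np.allclose(V.conj().T @ V, np.eye(4)))  # True
```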

The only question left to ask is: how do we obtain orthogonal eigenvectors when several eigenvectors share the same eigenvalue? Answering this fully requires deeper knowledge than is presented in this course, so we simply assume that every eigenvalue has a full set of eigenvectors. Once this is assumed, any set of eigenvectors belonging to a repeated eigenvalue can be turned into an orthogonal set through the Gram-Schmidt process (a minimal sketch is given after the worked example below).

Example 10.13 Let $A$ be

A=\left(\begin{array}{ccc} 3 / 2 & -1 / 2 & 0 \\ -1 / 2 & 3 / 2 & 0 \\ 0 & 0 & 3 \end{array}\right)

Diagonalise the symmetric matrix $A$ with respect to its orthonormal eigenvectors.

Solution We first compute the eigenvalues from,

\begin{aligned} \left|\begin{array}{ccc} 3 / 2-\lambda & -1 / 2 & 0 \\ -1 / 2 & 3 / 2-\lambda & 0 \\ 0 & 0 & 3-\lambda \end{array}\right| & =(3 / 2-\lambda)\left|\begin{array}{cc} 3 / 2-\lambda & 0 \\ 0 & 3-\lambda \end{array}\right|+\frac{1}{2}\left|\begin{array}{cc} -1 / 2 & 0 \\ 0 & 3-\lambda \end{array}\right| \\ & =(3-\lambda)(\lambda-2)(\lambda-1) ; \end{aligned}

hence the eigenvalues are 1, 2, and 3. Next, we compute the eigenvectors. Starting with $\lambda=1$, we need to solve

\left(\begin{array}{ccc} 1 / 2 & -1 / 2 & 0 \\ -1 / 2 & 1 / 2 & 0 \\ 0 & 0 & 2 \end{array}\right)\left(\begin{array}{l} x \\ y \\ z \end{array}\right)=\left(\begin{array}{l} 0 \\ 0 \\ 0 \end{array}\right)

which has solution $x=y$ and $z=0$, and so an eigenvector is $\boldsymbol{x}_{1}=(1,1,0)^{\top}$; the corresponding unit vector is

\boldsymbol{e}_{1}=\left(\begin{array}{c} 1 / \sqrt{2} \\ 1 / \sqrt{2} \\ 0 \end{array}\right)

Similarly, we find the other two unit vectors corresponding to $\lambda=2$ and $\lambda=3$, respectively, as,

\boldsymbol{e}_{2}=\left(\begin{array}{c} 1 / \sqrt{2} \\ -1 / \sqrt{2} \\ 0 \end{array}\right), \quad \boldsymbol{e}_{3}=\left(\begin{array}{l} 0 \\ 0 \\ 1 \end{array}\right)
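These values can be confirmed numerically; a minimal sketch using np.linalg.eigh, which returns the eigenvalues in ascending order (the eigenvector columns may differ by a sign).

```python
import numpy as np

# The matrix A from Example 10.13
A = np.array([[1.5, -0.5, 0.0],
              [-0.5, 1.5, 0.0],
              [0.0, 0.0, 3.0]])

eigvals, eigvecs = np.linalg.eigh(A)
print(eigvals)    # [1. 2. 3.]
print(eigvecs)    # columns are e1, e2, e3 (possibly up to sign)
```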

Recall that the diagonalisation is given by $A=P D P^{-1}$, where $P$ has the three eigenvectors as its columns, as follows,

P=\left(\begin{array}{ccc} 1 / \sqrt{2} & 1 / \sqrt{2} & 0 \\ 1 / \sqrt{2} & -1 / \sqrt{2} & 0 \\ 0 & 0 & 1 \end{array}\right)

Since $P$ is a matrix with orthonormal columns, it is orthogonal, which we know from Subsec. 10.1.4 means that $P^{-1}=P^{\top}$. Further, $D$ is the diagonal matrix whose diagonal entries are the eigenvalues of $A$. Hence, we have

A=\underbrace{\left(\begin{array}{ccc} 1 / \sqrt{2} & 1 / \sqrt{2} & 0 \\ 1 / \sqrt{2} & -1 / \sqrt{2} & 0 \\ 0 & 0 & 1 \end{array}\right)}_{P} \underbrace{\left(\begin{array}{ccc} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{array}\right)}_{D} \underbrace{\left(\begin{array}{ccc} 1 / \sqrt{2} & 1 / \sqrt{2} & 0 \\ 1 / \sqrt{2} & -1 / \sqrt{2} & 0 \\ 0 & 0 & 1 \end{array}\right)}_{P^{-1}}
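We can confirm this factorisation numerically; a minimal sketch in which, since $P$ is orthogonal, $P^{-1}$ is taken to be $P^{\top}$.

```python
import numpy as np

s = 1 / np.sqrt(2)
P = np.array([[s, s, 0.0],
              [s, -s, 0.0],
              [0.0, 0.0, 1.0]])
D = np.diag([1.0, 2.0, 3.0])

A = np.array([[1.5, -0.5, 0.0],
              [-0.5, 1.5, 0.0],
              [0.0, 0.0, 3.0]])

# P is orthogonal, so its inverse is its transpose
print(np.allclose(P @ P.T, np.eye(3)))   # True
print(np.allclose(P @ D @ P.T, A))       # True
```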

The biggest advantage of symmetric matrices is that we do not need to find the inverse of the matrix of eigenvectors using Gauss-Jordan, but we can just take the transpose.
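As mentioned before the example, when an eigenvalue is repeated, the eigenvectors belonging to that eigenvalue can be orthonormalised by the Gram-Schmidt process. Here is a minimal sketch of classical Gram-Schmidt; the two input vectors are illustrative values only.

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthonormalise a list of linearly independent (possibly complex) vectors."""
    basis = []
    for v in vectors:
        w = v.astype(complex)
        # Subtract the projection onto each vector already in the basis
        for e in basis:
            w = w - np.vdot(e, w) * e
        basis.append(w / np.linalg.norm(w))
    return basis

# Two non-orthogonal vectors spanning the same eigenspace (illustrative values)
v1 = np.array([1.0, 1.0, 0.0])
v2 = np.array([1.0, 0.0, 0.0])

e1, e2 = gram_schmidt([v1, v2])
print(np.isclose(np.vdot(e1, e2), 0))   # True: orthogonal
print(np.isclose(np.linalg.norm(e1), 1), np.isclose(np.linalg.norm(e2), 1))  # unit length
```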

Exercises

  1. Using the Cayley-Hamilton theorem, find the inverse of

    (a) A=\left(\begin{array}{ccc}2 & -1 & 1 \\ -3 / 2 & 5 / 2 & 3 / 2 \\ -1 / 2 & 1 / 2 & 7 / 2\end{array}\right);

    (b) A=\left(\begin{array}{ccc}1 / 3 & -8 / 3 & 4 / 3 \\ -1 & -2 & 2 \\ -4 / 3 & -10 / 3 & 11 / 3\end{array}\right).

  2. Orthogonally diagonalise the following matrices:

    (a) A=\left(\begin{array}{lll}0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 3\end{array}\right);

    (b) A=\left(\begin{array}{ccc}2 & 1 & -1 \\ 1 & 0 & 1 \\ -1 & 1 & 2\end{array}\right).