The goal of this post is to introduce the Cayley-Menger determinant, and show how one can use it to prove a higher dimensional version of the Pythagorean theorem.

## A motivating example

The classical Pythagorean theorem states that a right triangle with legs $a, b$ and hypotenuse $c$ satisfies \[ a^2 + b^2 = c^2. \]

This theorem was generalized by the French mathematician J. P. de Gua de Malves: if a tetrahedron has a right-angle corner (a vertex at which the three edges are pairwise orthogonal), then the square of the area of the face opposite that corner (the *hypotenuse face*) equals the sum of the squares of the areas of the other three faces.

We will use Heron's formula in this proof. Heron's formula gives the area of a triangle in terms of its side lengths: a triangle with sides $a, b, c$ and semiperimeter $s = \frac12 (a+b+c)$ turns out to have area \[ A = \sqrt{s(s-a)(s-b)(s-c)}. \] In the context of De Gua's theorem, label the six edges of the tetrahedron as $l_1, l_2, l_3$ and $h_1, h_2, h_3$: here the $l_i$ are the legs that emanate from the right-angle corner, and the $h_i$ are the three sides of the hypotenuse face.

The areas of the three faces touching the right-angle corner are respectively \[ A_3 = \frac12 l_1\cdot l_2, \quad A_1 = \frac12 l_2\cdot l_3, \quad A_2 = \frac12 l_3 \cdot l_1. \] The area of the hypotenuse face is, using Heron's formula, \[ H = \frac14 \sqrt{(h_1 + h_2 + h_3)(h_1 + h_2 - h_3)(h_2 + h_3 - h_1)(h_3 + h_1 - h_2)}. \] This we can expand via some algebra to get \[ H^2 = \frac1{16} \left( 2 h_1^2 h_3^2 + 2 h_2^2 h_3^2 + 2 h_1^2 h_2^2 - h_1^4 - h_2^4 - h_3^4 \right).\] The side lengths $h_i$ we can get from the planar Pythagorean theorem: \[ h_1^2 = l_2^2 + l_3^2, \quad h_2^2 = l_1^2 + l_3^2, \quad h_3^2 = l_1^2 + l_2^2.\] Plugging these in and simplifying, we get \[ H^2 = \frac1{4} \left( l_1^2 l_2^2 + l_3^2 l_2^2 + l_1^2 l_3^2\right) = A_1^2 + A_2^2 + A_3^2 \] and the theorem is proved.
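The computation above is easy to sanity-check numerically. Here is a quick pure-Python check; the particular leg lengths are my own arbitrary choice:

```python
import math

# Hypothetical leg lengths for the right-angle corner.
l1, l2, l3 = 2.0, 3.0, 6.0

# Areas of the three faces touching the right-angle corner.
A3 = 0.5 * l1 * l2
A1 = 0.5 * l2 * l3
A2 = 0.5 * l3 * l1

# Side lengths of the hypotenuse face, via the planar Pythagorean theorem.
h1 = math.hypot(l2, l3)
h2 = math.hypot(l1, l3)
h3 = math.hypot(l1, l2)

# Heron's formula for the area of the hypotenuse face.
s = 0.5 * (h1 + h2 + h3)
H = math.sqrt(s * (s - h1) * (s - h2) * (s - h3))

# De Gua: H^2 equals the sum of the squares of the other three face areas.
assert math.isclose(H ** 2, A1 ** 2 + A2 ** 2 + A3 ** 2)
```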

## N-simplices

An $N$-simplex is the generalization of a triangle (in a plane) and a tetrahedron (in space) to a figure in $N$-dimensional space. Start by considering $\mathbb{R}^N$, and choose within it $(N+1)$ points in general position. These points span an $N$-dimensional volume enclosed by $(N+1)$ faces, each face being an $(N-1)$-simplex itself.

A special type of $N$-simplex is the *orthoscheme*. These are the simplices for which one can find a distinguished vertex (which we can label $o$), such that the $N$ edges emanating from it are pairwise orthogonal. Right triangles are $2$-orthoschemes; the types of tetrahedra to which de Gua's theorem applies are $3$-orthoschemes. De Gua's theorem and the Pythagorean theorem turn out to be special cases of a theorem by Conant and Beyer:^{1} if $H$ denotes the $(N-1)$-dimensional volume of the face of an $N$-orthoscheme opposite $o$ (the *hypotenuse face*), and $A_1, \ldots, A_N$ denote the $(N-1)$-dimensional volumes of the $N$ faces containing $o$, then \[ H^2 = \sum_{j=1}^N (A_j)^2. \]

We will give a demonstration of this theorem after introducing the Cayley-Menger determinant.

## Cayley-Menger determinant

The Cayley-Menger determinant formula is the higher dimensional generalization of Heron's formula. Given an $N$-simplex, enumerate the vertices $v_0, \ldots, v_N\in \mathbb{R}^N$. Let $B$ denote the $(N+1)\times(N+1)$ matrix whose entries are $B_{ij} = \| v_i - v_j\|^2$, where $i,j$ run from $0$ to $N$ and $\|\cdot\|$ denotes the Euclidean norm.

Consider the matrix $\tilde{B} = \begin{pmatrix} 0 & 1 \newline 1 & B \end{pmatrix}$. The Cayley-Menger determinant formula states \begin{equation}\label{eq:CM}\tag{CM} \bigl(\text{$N$-dimensional volume of simplex}\bigr)^2 = \frac{1}{2^N(N!)^2} \bigl|\det(\tilde{B})\bigr|. \end{equation}
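Before proving \eqref{eq:CM}, here is a quick numerical sanity check in pure Python. The `det` helper and the test simplices are my own illustrative choices, not part of the formula itself:

```python
import math
from itertools import permutations

def det(M):
    """Determinant via the Leibniz (sum over permutations) formula.
    Inefficient in general, but fine for the small matrices here."""
    n = len(M)
    total = 0.0
    for p in permutations(range(n)):
        sign = 1
        for i in range(n):
            for j in range(i + 1, n):
                if p[i] > p[j]:
                    sign = -sign
        prod = 1.0
        for i in range(n):
            prod *= M[i][p[i]]
        total += sign * prod
    return total

def cayley_menger_volume(points):
    """N-dimensional volume of the simplex on N+1 points, via (CM)."""
    n = len(points) - 1  # the dimension N of the simplex
    m = len(points)
    # Bordered squared-distance matrix B-tilde.
    Bt = [[0.0] + [1.0] * m]
    for i in range(m):
        row = [1.0]
        for j in range(m):
            row.append(sum((a - b) ** 2 for a, b in zip(points[i], points[j])))
        Bt.append(row)
    return math.sqrt(abs(det(Bt)) / (2 ** n * math.factorial(n) ** 2))

# A 3-4-5 right triangle is a 2-simplex of area 6.
assert math.isclose(cayley_menger_volume([(0, 0), (3, 0), (0, 4)]), 6.0)
# The standard corner tetrahedron has volume 1/6.
assert math.isclose(cayley_menger_volume([(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]), 1 / 6)
```

Note that only the pairwise distances between the `points` enter the computation, which is the whole point of the formula.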

The power of the Cayley-Menger formula is that it only requires as input the relative distances between the vertices, and not the angles. That this is possible should not be too surprising: just as in plane geometry, where the three side lengths uniquely determine a triangle, the lengths of the edges of a simplex also uniquely determine the simplex (up to ambient isometry).

The remainder of this section is devoted to the proof of \eqref{eq:CM}.

First, notice that if we know more than just the relative distances, there is a simpler way of finding the volume of a simplex: for $i$ running from $1$ to $N$, define $u_i = v_i - v_0$. In other words, we can without loss of generality assume that the vertex $v_0$ sits at the origin. Let $A$ be the $N\times N$ matrix whose column vectors are $u_1, \ldots, u_N$; then it is well known that our simplex has $N$-dimensional volume \[ \text{Volume} = \frac{1}{N!} |\det A|.\]
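For example, here is this volume formula applied to a tetrahedron with hypothetical coordinates:

```python
import math

def det3(M):
    """Determinant of a 3x3 matrix, by cofactor expansion along the first row."""
    a, b, c = M[0]
    d, e, f = M[1]
    g, h, i = M[2]
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

# Hypothetical tetrahedron vertices v_0, ..., v_3.
v0, v1, v2, v3 = (0, 0, 0), (1, 0, 0), (1, 2, 0), (1, 2, 3)

# Edge vectors u_i = v_i - v_0, placed as the COLUMNS of A.
u = [tuple(a - b for a, b in zip(v, v0)) for v in (v1, v2, v3)]
A = [[u[j][i] for j in range(3)] for i in range(3)]

volume = abs(det3(A)) / math.factorial(3)
print(volume)  # → 1.0
```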

Returning to the matrix $B$, we see that its entries can be written as \[ B_{ij} = (u_i - u_j) \cdot (u_i - u_j) = \|u_i\|^2 + \|u_j\|^2 - 2 u_i \cdot u_j \] (note that $v_i - v_j = u_i - u_j$). We can alternatively express this in the form of a dot product: constructing the $(N+2)$-dimensional vectors \begin{equation} \tilde{u}_i = \begin{pmatrix} 1 \newline u_i \newline \|u_i\|^2 \end{pmatrix}, \qquad \hat{u}_i = \begin{pmatrix} \| u_i\|^2 \newline -2 u_i \newline 1 \end{pmatrix}, \end{equation} we see that \begin{equation} B_{ij} = \tilde{u}_i \cdot \hat{u}_{j}. \end{equation} Note that here $i$ and $j$ run from $0$ to $N$.

The next inspired step comes from observing that the families $\{ \tilde{u}_i \}$ and $\{\hat{u}_i\}$ each consist of $N+1$ vectors in an $(N+2)$-dimensional space. We would like to use these two families to get at the volume of our simplex, which would involve taking some sort of determinant. But to do so we would need a full basis, so what should we choose as the final vector?

Suppose we want to construct a square matrix $\tilde{A}$ by adding one extra column to the $(N+2) \times (N+1)$ matrix whose columns are the $\tilde{u}_i$. Using our definitions, this matrix would look something like \[ \tilde{A} = \begin{pmatrix} ? & 1 & 1 & \cdots & 1 \newline ? & u_0 & u_1 & \cdots & u_N \newline ? & \|u_0\|^2 & \|u_1\|^2 & \cdots & \|u_N\|^2 \end{pmatrix}. \] Note that $u_0$ is the $0$ vector by construction, and $\|u_0\| = 0$. This means we can also write \[ \tilde{A} = \begin{pmatrix} ? & 1 & 1 & \cdots & 1 \newline ? & 0 \newline ? & \vdots & & A \newline ? & 0 \newline ? & 0 & \|u_1\|^2 & \cdots & \|u_N\|^2 \end{pmatrix}. \] So if we fill the first column with the vector $(0,0, \ldots,0, 1)$, the resulting square matrix has (expanding the determinant along the first column) \[ |\det(\tilde{A})| = |\det(A)|.\]

Following this line of argument, let \begin{equation} \tilde{u}_{-1} = \begin{pmatrix} 0 \newline \vdots \newline 0 \newline 1 \end{pmatrix} ,\qquad \hat{u}_{-1} = \begin{pmatrix} 1 \newline 0 \newline \vdots \newline 0 \end{pmatrix}, \end{equation} and define \begin{equation} \tilde{A} = \begin{pmatrix} \tilde{u}_{-1} & \tilde{u}_{0} & \cdots & \tilde{u}_N \end{pmatrix}, \qquad \hat{A} = \begin{pmatrix} \hat{u}_{-1} & \hat{u}_{0} & \cdots & \hat{u}_N \end{pmatrix}. \end{equation} These two matrices are both square (of dimension $(N+2)\times (N+2)$), with the property \[ \frac{1}{2^N}| \det(\hat{A}) | = |\det(A)| = |\det(\tilde{A})| \] (for the first equality, expand $\det(\hat{A})$ along its first column and pull a factor of $-2$ out of each of the $N$ middle rows).

Now, using the properties of matrix multiplication, we see that the square matrix $\tilde{A}^T \hat{A}$ has its $(i,j)$th entry (with $i, j$ running from $-1$ to $N$) given by exactly $\tilde{u}_i \cdot \hat{u}_j$. A direct computation then shows that \begin{equation} \tilde{A}^T \hat{A} = \tilde{B} = \hat{A}^T \tilde{A}. \end{equation} Therefore we must have \begin{equation} |\det \tilde{B}| = 2^N |\det A|^2, \end{equation} and \eqref{eq:CM} follows immediately.
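One can spot-check the identity $\tilde{A}^T \hat{A} = \tilde{B}$ entrywise in pure Python; the random triangle below is of course just an illustration:

```python
import math
import random

random.seed(0)
N = 2  # dimension; a triangle, for concreteness
# u_0 = 0 by construction; u_1, ..., u_N are random.
u = [[0.0] * N] + [[random.uniform(-1, 1) for _ in range(N)] for _ in range(N)]

def tilde(ui):
    """The (N+2)-dimensional vector (1, u_i, |u_i|^2)."""
    return [1.0] + ui + [sum(x * x for x in ui)]

def hat(ui):
    """The (N+2)-dimensional vector (|u_i|^2, -2 u_i, 1)."""
    return [sum(x * x for x in ui)] + [-2.0 * x for x in ui] + [1.0]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Columns of A-tilde and A-hat, including the border vectors indexed by -1.
tilde_cols = [[0.0] * (N + 1) + [1.0]] + [tilde(v) for v in u]
hat_cols = [[1.0] + [0.0] * (N + 1)] + [hat(v) for v in u]

# (i, j) entry of A-tilde^T A-hat is the dot product of the columns.
Bt = [[dot(t, h) for h in hat_cols] for t in tilde_cols]

# Check: the top-left entry is 0, the borders are all 1, and the
# remaining block is the squared-distance matrix B.
assert Bt[0][0] == 0.0
assert all(Bt[0][j] == 1.0 and Bt[j][0] == 1.0 for j in range(1, N + 2))
for i in range(N + 1):
    for j in range(N + 1):
        d2 = sum((a - b) ** 2 for a, b in zip(u[i], u[j]))
        assert math.isclose(Bt[i + 1][j + 1], d2, abs_tol=1e-12)
```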

## Proof of the Conant-Beyer Theorem

Without loss of generality, we can assume that the distinguished vertex $o$ is the origin in $\mathbb{R}^N$. Furthermore, we can rotate the orthoscheme so that the remaining $N$ vertices sit on the coordinate axes, at distances $\ell_1, \ldots, \ell_N$ from the origin.

This means that the $(N-1)$-dimensional volumes $A_1, \ldots, A_N$ of the faces containing $o$ can be expressed as \begin{equation} A_i = \frac{1}{(N-1)!} \cdot \frac{\prod_{j = 1}^N \ell_j}{\ell_i}. \end{equation}

How about $H$? For this we will use \eqref{eq:CM}. The hypotenuse face has $N$ vertices, so its matrix $B$ of pairwise squared distances is now the $N\times N$ matrix with entries
\[ B_{ij} = \begin{cases}
0 & i = j\newline
\ell_i^2 + \ell_j^2 & i \neq j. \end{cases} \]
Let $\mu$ denote the $(N+1)$-dimensional column vector $(0,1, \ldots, 1)$. It transpires that we can write the matrix $\tilde{B}$ in \eqref{eq:CM} as
\begin{equation}
\tilde{B} = \underbrace{\begin{pmatrix} 0 & 1 & \cdots & 1 \newline
1 & -2 \ell_1^2 & & 0 \newline
\vdots & & \ddots \newline
1 & 0 & & -2 \ell_N^2 \end{pmatrix}}_{K} + \underbrace{\begin{pmatrix} 0 \newline \ell_1^2 \mu^T \newline \vdots \newline \ell_N^2 \mu^T \end{pmatrix}}_{J} + \underbrace{\begin{pmatrix} 0 & \ell_1^2 \mu & \cdots & \ell_N^2 \mu \end{pmatrix}}_{J^T}.
\end{equation}
Now, each row of $J$ is a scalar multiple of $\mu^T$, which happens to be the *first row of $K$*; similarly, each column of $J^T$ is a scalar multiple of $\mu$, which happens to be the *first column of $K + J$*. Subtracting the appropriate multiples of the first row and first column therefore eliminates $J$ and $J^T$ without changing the determinant, so
\[ \det(\tilde{B}) = \det(K).\]
This determinant is straightforward to compute: expanding it (for instance using the bordered-diagonal structure of $K$), one finds
\[ \det(K) = (-1)^N \, 2^{N-1} \sum_{j = 1}^N \prod_{i \neq j} \ell_i^2. \]
Using
\[ H^2 = \frac{1}{2^{N-1} ((N-1)!)^2} |\det(\tilde{B})|, \]
we can check that indeed
\[ H^2 = \sum_{j = 1}^N (A_j)^2 \]
as desired.
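As a concrete check, here is the whole computation carried out numerically for a hypothetical $4$-orthoscheme (the leg lengths are arbitrary, and the `det` helper is my own):

```python
import math
from itertools import permutations

def det(M):
    """Determinant via the Leibniz formula; fine for small matrices."""
    n = len(M)
    total = 0.0
    for p in permutations(range(n)):
        sign = 1
        for i in range(n):
            for j in range(i + 1, n):
                if p[i] > p[j]:
                    sign = -sign
        prod = 1.0
        for i in range(n):
            prod *= M[i][p[i]]
        total += sign * prod
    return total

# Hypothetical leg lengths of a 4-orthoscheme.
legs = [1.0, 2.0, 3.0, 4.0]
N = len(legs)

# (N-1)-volumes of the N faces containing the right-angle corner.
full = math.prod(legs)
A = [full / l / math.factorial(N - 1) for l in legs]

# The hypotenuse face has vertices l_i * e_i; build its bordered
# squared-distance matrix B-tilde, as in (CM).
Bt = [[0.0] + [1.0] * N]
for i in range(N):
    row = [1.0]
    for j in range(N):
        row.append(0.0 if i == j else legs[i] ** 2 + legs[j] ** 2)
    Bt.append(row)

# (CM) applied to an (N-1)-simplex.
H2 = abs(det(Bt)) / (2 ** (N - 1) * math.factorial(N - 1) ** 2)

# Conant-Beyer: H^2 is the sum of the squared face volumes.
assert math.isclose(H2, sum(a * a for a in A))
```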

## Postscript

I have some sentimental feelings about this result: this is the first theorem that I discovered under my own steam (as in: not a problem set to me by someone else, but something that I conjectured and proved from the ground up).
This happened in 2002, after my freshman year at Princeton. Spring semester that year I took MAT217 Linear Algebra; the instructor was Ed Nelson.^{2}
I no longer remember what motivated me to even conjecture this theorem: likely I was reading some popular science or math book and came across De Gua's theorem and then asked myself whether it can be further generalized.^{3}
This happened during the summer, when I was working at an REU at Argonne National Laboratory^{4}, which has a fairly decent library. I worked this out during the downtime in the evenings^{5}.
I remember being pretty excited to get everything to work out, and so I typed up my findings^{6} on July 16, 2002 and emailed a copy to Ed Nelson.
Nelson sent me a PDF download from MathSciNet, **not** of the AMM article of Conant and Beyer (from 1974), but of another article from the mid 1990s by two Chinese fellas who rediscovered this result. Interestingly, the MathSciNet review of the later article did not acknowledge the prior result of Conant and Beyer^{7}; I think this oversight may have had some positive impact on my eventually deciding to become a mathematician.
While it is somewhat disappointing that the result I worked hard on was not *new*, I somehow found solace in knowing (incorrectly, as I now realize looking back with the benefit of the modern internet) that I was *only too late by about a decade*.
This gave me hope that the frontiers of mathematics are actually not that far away, and that if I just knew where to look I could potentially make a splash.

1. Donald R. Conant & William A. Beyer (Mar 1974). "Generalized Pythagorean Theorem". *The American Mathematical Monthly.* Link
2. I don't think I ever truly appreciated, when I was an undergrad, how at Princeton so many of the lower level undergraduate courses were taught by world-class researchers. Of course, considering what it takes to become a faculty member at the Princeton math department, this shouldn't be that much of a surprise. (As a related aside: the first time I met Tim Gowers was the first day of classes, freshman year, when he showed up to cover MAT215 Analysis as a substitute teacher for the instructor of record, who got stuck in Florida due to a hurricane. I was very amused to find out from classmates (who did the IMO and were more in tune with the who's who of mathematics) who this guy was, and even more amused to find out two weeks later, at the first practice of the Jazz Ensemble, that he had auditioned for and was selected to play piano in PUJE2.)
3. One should remember this was around May / June of 2002. Wikipedia had existed for less than 18 months; it was not yet the indispensable resource for quick literature searches in elementary (and not-so-elementary) mathematics. Nowadays checking the Wikipedia entry for De Gua's theorem immediately leads you to the statement of the Conant-Beyer theorem.
4. Since I am doing footnote shout-outs: this was at Linda Young's lab, through the US Department of Energy's "Energy Research Undergraduate Laboratory Fellow" program.
5. In 2002 I was still very introverted, and didn't really "hang out" with the other students. In 2003 I worked again at Argonne; this second time the evenings were much less "productive" academically, as one of my suite-mates brought an Xbox with a mod-chip, and many evenings were wasted on watching DVDs or playing Soul Calibur.
6. I still have a PDF copy. The presentation above is loosely based on what I wrote 18 years ago. No, I will not share that PDF: there are some rather cringe-worthy inefficiencies that came from a much less mathematically mature mind.
7. I don't think, to this day, that the Conant-Beyer theorem is common knowledge. Certainly Ed Nelson wasn't aware of it, else he would have pointed me to the earlier antecedent. Likely the reviewer for MathSciNet also wasn't.