In preparing some lecture notes for the implicit function theorem, I took a look at Schechter's delightfully comprehensive *Handbook of Analysis and its Foundations*^{1}, and I learned something new about the Banach fixed point theorem. To quote Schechter:

... although Banach's theorem is quite easy to prove, a longer proof cannot yield stronger results.

I will write a little bit here about a "converse" to the Banach theorem due to Bessaga, which uses a little bit of help from the Axiom of Choice.

## Banach's Theorem

Let us start by recalling the statement and proof of Banach's fixed point theorem.

*two*distinct fixed points $\xi, \zeta$ would give $d(f(\xi), f(\zeta)) = d(\xi, \zeta)$ contradicting the strict contraction property.

Note that one immediate corollary is that: if $f$ is a contraction mapping, then each iterate $f^{(k)} = \underbrace{f\circ f\circ \cdots \circ f}_{k \text{ times}}$ possesses exactly one (the same) fixed point $\xi$.

## Bessaga's converse

First we define an equivalence relation on $X$. We say $x\sim y$ if there exists positive integers $p,q$ such that $ f^{(p)}(x) = f^{(q)}(y)$. If $x\sim y$ we define \[ \rho(x,y) = \min \{p+q\mid f^{(p)}(x) = f^{(q)}(y)\} \] and \[ x\wedge y = f^{(p)}(x) \] where $p$ is the value that attains $\rho(x,y)$.

We also define \[ \sigma(x,y) = \rho(x,x\wedge y) - \rho(y,x\wedge y).\] Observe that $\rho$ is symmetric in its arguments and $\sigma$ antisymmetric.

(The idea so far: we can make $X$ a directed graph by making edges go from $x$ to $f(x)$. That $\xi$ is the only fixed point means that $X$ has no cycles, so $X$ now becomes a direct forest (like a tree, but not necessarily connected). The equivalence relation $\sim$ expresses whether two nodes are on the same connected component. When they are, $\rho$ measures the distance between them on the underlying tree, and $\sigma$ measures the difference between their distances to their most recent common ancestor.)

Now, by the axiom of choice, there exists a choice function that chooses for each equivalence class of $X/\sim$ a representative, this extends to a function $c:X\to X$. From this we can define a function $\lambda: X \to \mathbb{Z}$ by \[ \lambda(x) = \sigma(c(x),x). \]

(Life would be extremely simple if each of the connected component in our forest is rooted; unfortunately this is generally not the case. So we have to appeal to the axiom of choice to select a node. What we do below is compute the distance between two nodes on the same tree by using the selected node as a reference.)

Now we are in a position to define our distance function $d$. We do it in several steps.

- If $x\sim y$, we define \[ d(x,y) = 2^{-\lambda(x)} + 2^{-\lambda(y)} - 2\cdot 2^{-\lambda(x\wedge y)}\] where the empty sum yields 0.
- If $x\not\sim \xi$, we define \[ d(x,\xi) = 2^{- \lambda(x)}. \]
- If $x\not\sim y$ and neither $x,y$ is $\xi$, we define \[ d(x,y) = d(x,\xi) + d(y,\xi). \]

This definition is obviously symmetric and non-negative. And it is easy to check that $d(x,y) = 0 \implies x\sim y$ and $x = y = x\wedge y$. Triangle inequality involves a little bit more work, but most cases are immediately obvious except when $x\sim y\sim z$. Here we need to check that \[ 2\cdot 2^{-\lambda(y)} - 2 \cdot 2^{-\lambda(x\wedge y)} - 2\cdot 2^{-\lambda(y\wedge z)} \geq - 2 \cdot 2^{-\lambda(x\wedge z)}\] which we do right now. Suppose $f^{(p)}(x) = f^{(q_1)}(y)$ and $f^{(q_2)}(y) = f^{(r)}(z)$. WLOG we can take $q_1 \geq q_2$. Then we have that $f^{(p)}(x) = f^{(r + q_1 - q_2)}(z)$. This shows that $\lambda(x\wedge z) \leq \min( \lambda(x\wedge y), \lambda(y\wedge z))$. And this proves the inequality above.

It is easy to see that any Cauchy sequence either is eventually constant, or must converge to $\xi$: if $x\neq y$ we have that \[ d(x,y) \geq 2^{- \min(\lambda(x),\lambda(y)) - 1} \geq \frac14 (d(x,\xi) + d(y,\xi))\] and this shows that $d$ is a complete metric. Now, it remains to verify that $f$ is a contraction. Noting that $ \lambda(f(x)) = \lambda(x) + 1$ and $ f(x) \wedge f(y) = x\wedge y$ we see easily that $ f$ is a contraction with Lipschitz constant $\frac12$.

- A copy of Schechter's book can be found on his website
^{^}