Bessaga's converse to the contraction mapping theorem

In preparing some lecture notes for the implicit function theorem, I took a look at Schechter's delightfully comprehensive Handbook of Analysis and its Foundations1, and I learned something new about the Banach fixed point theorem. To quote Schechter:

... although Banach's theorem is quite easy to prove, a longer proof cannot yield stronger results.

I will write a little bit here about a "converse" to the Banach theorem due to Bessaga, which uses a little bit of help from the Axiom of Choice.

Banach's Theorem

Let us start by recalling the statement and proof of Banach's fixed point theorem.

Theorem    [Banach fixed point]
Let $(X,d)$ be a complete non-empty metric space. Let $f:X \to X$ be a strict contraction: that is to say, suppose there exists some $\alpha \in [0,1)$ such that $ d(f(x_1),f(x_2)) \leq \alpha d(x_1,x_2)$ always hold. Then $ f$ has a unique fixed point $\xi$.
Proof:
Let $x_0\in X$ be arbitrary and fixed. Consider the sequence generated by $x_n = f(x_{n-1})$ for $n \geq 1$. Define $d_0 = d(x_0, x_1)$. The sequence $x_0, x_1, \ldots$ is Cauchy, as, by triangle inequality, for any $m, n > 0$ we have \[ d(x_m, x_{m+n}) \leq d(x_m, x_{m+1}) + d(x_{m+1}, x_{m+2}) + \cdots + d(x_{m+n-1}, x_{m+n}) \] which by the contraction mapping assumption can be bounded by \[ \leq d_0 ( \alpha^m + \alpha^{m+1} + \cdots + \alpha^{m+n-1} \leq d_0 \cdot \frac{\alpha^m}{1 - \alpha}.\] The completeness of $X$ implies then the sequence converges to some element $\xi$. As \[ d(\xi, f(\xi)) \leq d(\xi, x_m) + d(x_m, f(\xi)) \leq d(\xi, x_m) + \alpha d(x_{m-1}, \xi) \] the convergence of the sequence implies $\xi$ is a fixed point. This fixed point is unique as the presence of two distinct fixed points $\xi, \zeta$ would give $d(f(\xi), f(\zeta)) = d(\xi, \zeta)$ contradicting the strict contraction property.

Note that one immediate corollary is that: if $f$ is a contraction mapping, then each iterate $f^{(k)} = \underbrace{f\circ f\circ \cdots \circ f}_{k \text{ times}}$ possesses exactly one (the same) fixed point $\xi$.

Bessaga's converse

Theorem    [Bessaga]
Let $X$ be a set and $\xi\in X$. Suppose $f:X\to X$ is a function such that for any $k$ we have \[ f^{(k)}(x) = x \iff x = \xi \] then there exists a complete metric $d$ on $X$ such that $f$ is a strict contraction.
Proof [sketch]:

First we define an equivalence relation on $X$. We say $x\sim y$ if there exists positive integers $p,q$ such that $ f^{(p)}(x) = f^{(q)}(y)$. If $x\sim y$ we define \[ \rho(x,y) = \min \{p+q\mid f^{(p)}(x) = f^{(q)}(y)\} \] and \[ x\wedge y = f^{(p)}(x) \] where $p$ is the value that attains $\rho(x,y)$.

We also define \[ \sigma(x,y) = \rho(x,x\wedge y) - \rho(y,x\wedge y).\] Observe that $\rho$ is symmetric in its arguments and $\sigma$ antisymmetric.

(The idea so far: we can make $X$ a directed graph by making edges go from $x$ to $f(x)$. That $\xi$ is the only fixed point means that $X$ has no cycles, so $X$ now becomes a direct forest (like a tree, but not necessarily connected). The equivalence relation $\sim$ expresses whether two nodes are on the same connected component. When they are, $\rho$ measures the distance between them on the underlying tree, and $\sigma$ measures the difference between their distances to their most recent common ancestor.)

Now, by the axiom of choice, there exists a choice function that chooses for each equivalence class of $X/\sim$ a representative, this extends to a function $c:X\to X$. From this we can define a function $\lambda: X \to \mathbb{Z}$ by \[ \lambda(x) = \sigma(c(x),x). \]

(Life would be extremely simple if each of the connected component in our forest is rooted; unfortunately this is generally not the case. So we have to appeal to the axiom of choice to select a node. What we do below is compute the distance between two nodes on the same tree by using the selected node as a reference.)

Now we are in a position to define our distance function $d$. We do it in several steps.

  1. If $x\sim y$, we define \[ d(x,y) = 2^{-\lambda(x)} + 2^{-\lambda(y)} - 2\cdot 2^{-\lambda(x\wedge y)}\] where the empty sum yields 0.
  2. If $x\not\sim \xi$, we define \[ d(x,\xi) = 2^{- \lambda(x)}. \]
  3. If $x\not\sim y$ and neither $x,y$ is $\xi$, we define \[ d(x,y) = d(x,\xi) + d(y,\xi). \]

This definition is obviously symmetric and non-negative. And it is easy to check that $d(x,y) = 0 \implies x\sim y$ and $x = y = x\wedge y$. Triangle inequality involves a little bit more work, but most cases are immediately obvious except when $x\sim y\sim z$. Here we need to check that \[ 2\cdot 2^{-\lambda(y)} - 2 \cdot 2^{-\lambda(x\wedge y)} - 2\cdot 2^{-\lambda(y\wedge z)} \geq - 2 \cdot 2^{-\lambda(x\wedge z)}\] which we do right now. Suppose $f^{(p)}(x) = f^{(q_1)}(y)$ and $f^{(q_2)}(y) = f^{(r)}(z)$. WLOG we can take $q_1 \geq q_2$. Then we have that $f^{(p)}(x) = f^{(r + q_1 - q_2)}(z)$. This shows that $\lambda(x\wedge z) \leq \min( \lambda(x\wedge y), \lambda(y\wedge z))$. And this proves the inequality above.

It is easy to see that any Cauchy sequence either is eventually constant, or must converge to $\xi$: if $x\neq y$ we have that \[ d(x,y) \geq 2^{- \min(\lambda(x),\lambda(y)) - 1} \geq \frac14 (d(x,\xi) + d(y,\xi))\] and this shows that $d$ is a complete metric. Now, it remains to verify that $f$ is a contraction. Noting that $ \lambda(f(x)) = \lambda(x) + 1$ and $ f(x) \wedge f(y) = x\wedge y$ we see easily that $ f$ is a contraction with Lipschitz constant $\frac12$.


  1. A copy of Schechter's book can be found on his website ^
Avatar
Willie WY Wong
Assistant Professor

My research interests include partial differential equations, geometric analysis, fluid dynamics, and general relativity.

Related