Typically Young's inequality is stated for convolutions. We can however regard it as a special case of the following inequality concerning integral kernels.
Setting $k(x,y) = g(x-y)$ immediately yields the classical convolution inequality:
Proof by interpolation
The way that I usually teach Young's inequality is via complex interpolation. First note that for \eqref{eq:3} to hold with $p,q,r\in [1,\infty]$, necessarily we have $p\in [1, r']$ and $q\in [r,\infty]$, where $r'$ is the Hölder conjugate \[ \frac{1}{r'} = 1 - \frac{1}{r} .\] For convenience, denote by $T$ the operator $f \mapsto \int_{\mathbb{R}^n} k(\cdot,y) f(y) ~\mathrm{d}y$. The endpoints of the estimates are easy to obtain:
- When $p = r'$, then we must have $q = \infty$. In this case we want to estimate $\sup |Tf|$. This follows directly from the triangle inequality and Hölder's inequality: \[ \begin{aligned} |Tf(x)| & \leq \int_{\mathbb{R}^n} |k(x,y)| |f(y)| ~\mathrm{d}y \newline & \leq \| k(x,\cdot) \|_{L^{r}(\mathbb{R}^n)} \|f\|_{L^{r'}(\mathbb{R}^n)}. \end{aligned} \] Taking the supremum over $x$ on both sides we have the operator norm bound \[ \|T \|_{L^{r'} \to L^\infty} \leq C_x.\]
- When $p = 1$, then we must have $q = r$. By Minkowski's integral inequality we have \[ \begin{aligned} \| Tf \|_{L^r(\mathbb{R}^m)} & \leq \int_{\mathbb{R}^n} \|k(\cdot,y) \|_{L^r(\mathbb{R}^m)} |f(y)| ~\mathrm{d}y \newline & \leq C_y \|f\|_{L^1(\mathbb{R}^n)}. \end{aligned} \] And so we have the operator norm bound \[ \|T \|_{L^{1} \to L^r} \leq C_y.\]
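To spell out the step used in the $p = 1$ case: Minkowski's integral inequality, in the form I am assuming here, says that for measurable $F$ on $\mathbb{R}^m \times \mathbb{R}^n$ and $1 \leq r \leq \infty$, \[ \Bigl\| \int_{\mathbb{R}^n} F(\cdot,y) ~\mathrm{d}y \Bigr\|_{L^r(\mathbb{R}^m)} \leq \int_{\mathbb{R}^n} \| F(\cdot,y) \|_{L^r(\mathbb{R}^m)} ~\mathrm{d}y. \] Applying this with $F(x,y) = k(x,y) f(y)$ (the name $F$ is introduced only for this remark) and pulling the scalar $|f(y)|$ out of the inner norm gives the first line of the estimate. The second line then bounds $\|k(\cdot,y)\|_{L^r(\mathbb{R}^m)}$ by its supremum over $y$, which is how $C_y$ is being used here.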
So by the Riesz-Thorin(-Stein) theorem (which can be proven using complex interpolation), for any $p,q,r$ satisfying \eqref{eq:3}, we have
\[ \| T\|_{L^p \to L^q} \leq C_x^{1 - \frac{r}{q}} C_y^{\frac{r}{q}} .\]
(Here we used $\theta \in [0,1]$ with $p^{-1} = 1 - \theta r^{-1}$, and $q^{-1} = (1-\theta) r^{-1}$.)
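As a check of the bookkeeping (a sketch; the labels $(p_0,q_0)$ and $(p_1,q_1)$ are introduced only for this computation): we interpolate between the endpoint $(p_0,q_0) = (1,r)$, with norm at most $C_y$, and the endpoint $(p_1,q_1) = (r',\infty)$, with norm at most $C_x$. For $\theta \in [0,1]$ the interpolated exponents are \[ \begin{aligned} \frac{1}{p} & = (1-\theta) \cdot 1 + \theta \Bigl( 1 - \frac{1}{r} \Bigr) = 1 - \frac{\theta}{r}, \newline \frac{1}{q} & = (1-\theta) \cdot \frac{1}{r} + \theta \cdot 0 = \frac{1-\theta}{r}. \end{aligned} \] The second line gives $1 - \theta = r/q$, so the interpolated bound $C_y^{1-\theta} C_x^{\theta}$ is exactly $C_x^{1 - \frac{r}{q}} C_y^{\frac{r}{q}}$. Combining the two lines also recovers the relation $\frac{1}{p} + \frac{1}{r} = 1 + \frac{1}{q}$ among the exponents.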
Elementary proof
I learned about the following proof through a post by Daniele Tampieri on MathOverflow; the original proof concerns the convolution operator and is due to Besov, Il’in, and Nikol’skiĭ. The proof, however, generalizes easily (as shown below) to the integral kernel case.
Replacing $f$ and $k$ by their absolute values, we may assume that both are non-negative. The proof is based on the following decomposition: \begin{equation} f(y) k(x,y) = \left[ f(y)^p k(x,y)^r\right]^{1/q} f(y)^{1 - p/q} k(x,y)^{1 - r/q}. \end{equation} We integrate both sides in $y$, and apply Hölder's inequality to the right. Here we use that as a consequence of \eqref{eq:3}, the following identity holds for the exponents: \begin{equation} q^{-1} + \left( \frac{p}{1 - p/q} \right)^{-1} + \left( \frac{r}{1 - r/q} \right)^{-1} = \frac{1}{q} + \frac{1}{p} - \frac{1}{q} + \frac{1}{r} - \frac{1}{q} = 1. \end{equation} Therefore we obtain \begin{equation} \int_{\mathbb{R}^n} k(x,y) f(y) ~\mathrm{d}y \leq \Bigl( \int_{\mathbb{R}^n} k(x,y)^r f(y)^p ~\mathrm{d}y \Bigr)^{\frac{1}{q}} \cdot \| f\|_{L^p(\mathbb{R}^n)}^{1 - \frac{p}{q}} \| k(x, \cdot) \|_{L^r(\mathbb{R}^n)}^{1 - \frac{r}{q}}. \end{equation} Now take the $L^q(\mathbb{R}^m)$ norm in $x$ of both sides. On the right, the middle factor is independent of $x$ and comes straight out. We put the final factor in $L^\infty(\mathbb{R}^m)$ and the first factor in $L^q(\mathbb{R}^m)$. This gives \begin{multline}\label{eq:almost} \Bigl \| \int_{\mathbb{R}^n} k(x,y) f(y) ~\mathrm{d}y \Bigr\|_{L^q(\mathbb{R}^m)} \leq \newline \Bigl( \int_{\mathbb{R}^m} \int_{\mathbb{R}^n} k(x,y)^r f(y)^p ~\mathrm{d}y ~\mathrm{d}x \Bigr)^{\frac{1}{q}} \cdot \| f\|_{L^p(\mathbb{R}^n)}^{1 - \frac{p}{q}} \sup_{x\in \mathbb{R}^m} \| k(x, \cdot) \|_{L^r(\mathbb{R}^n)}^{1 - \frac{r}{q}}. \qquad \end{multline} For the double integral on the right we apply Fubini and swap the order of integration. Bounding the inner $x$-integral by its supremum over $y$ (Hölder's inequality pairing $L^\infty$ with $L^1$ in $y$) then gives \[ \int_{\mathbb{R}^n} \int_{\mathbb{R}^m} k(x,y)^r f(y)^p ~\mathrm{d}x ~\mathrm{d}y \leq \sup_{y\in \mathbb{R}^n} \| k(\cdot,y)\|_{L^r(\mathbb{R}^m)}^r \cdot \|f\|_{L^p(\mathbb{R}^n)}^p. \] Plugging this into \eqref{eq:almost} we conclude \[ \Bigl \| \int_{\mathbb{R}^n} k(x,y) f(y) ~\mathrm{d}y \Bigr\|_{L^q(\mathbb{R}^m)} \leq \| f\|_{L^p(\mathbb{R}^n)} C_x^{1-r/q} C_y^{r/q} \] exactly as claimed.
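To see the exponents at work in one concrete case (chosen by me purely for illustration), take $p = r = \frac{4}{3}$ and $q = 2$, which satisfy $\frac{1}{p} + \frac{1}{r} = \frac{3}{2} = 1 + \frac{1}{q}$. For non-negative $f$ and $k$ the decomposition reads \[ f(y) k(x,y) = \left[ f(y)^{4/3} k(x,y)^{4/3} \right]^{1/2} f(y)^{1/3} k(x,y)^{1/3}, \] and the three Hölder exponents are \[ q = 2, \qquad \frac{p}{1 - p/q} = \frac{4/3}{1/3} = 4, \qquad \frac{r}{1 - r/q} = \frac{4/3}{1/3} = 4, \] whose reciprocals sum to $\frac{1}{2} + \frac{1}{4} + \frac{1}{4} = 1$. Since $r/q = \frac{2}{3}$, the final estimate in this case is \[ \Bigl \| \int_{\mathbb{R}^n} k(x,y) f(y) ~\mathrm{d}y \Bigr\|_{L^2(\mathbb{R}^m)} \leq \| f\|_{L^{4/3}(\mathbb{R}^n)} C_x^{1/3} C_y^{2/3}. \]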
Some final remarks
Theorem 1 can be generalized to
Both the interpolation and elementary proofs can be modified easily to cover these cases.
Now, the inequality \eqref{eq:operator} and the condition \eqref{eq:cond} can be written in dual/bilinear formulation:
In this formulation, the result can be proven even more simply than the arguments given before.
First, by doing a rescaling in the $y$ variables, we can assume that \[ \|k\|_{L^s_x L^r_y} = \|k\|_{L^s_y L^r_x} = \|k \|_{\cap} \] where the $\cap$ norm denotes the intersection norm (max) of the two mixed norms. Without loss of generality we can further assume that $\|f\|_{L^p_y} = 1 = \|g\|_{L^q_x}$.
Therefore it suffices to show that in this setting, the product $f(y) g(x)$ belongs to the dual space of the intersection space, or (sufficiently) that relative to the sum space norm \begin{equation}\label{eq:desiredsum} \| f(y) g(x) \|_{L^{s'}_x L^{r'}_y + L^{s'}_y L^{r'}_x} \leq 1. \end{equation}
Now observe that by assumption \begin{equation} 1 = (1-\theta) \frac{p}{s'} + \theta \frac{p}{r'} = (1-\theta) \frac{q}{r'} + \theta \frac{q}{s'}. \end{equation} So, taking $f, g \geq 0$ as we may, we can write \[ f(y) g(x) = \left[ f(y)^{\frac{p}{s'}} g(x)^{\frac{q}{r'}} \right]^{1-\theta} \left[ f(y)^{\frac{p}{r'}} g(x)^{\frac{q}{s'}} \right]^{\theta}. \] By Young's inequality for products, this means \begin{equation} f(y) g(x) \leq (1-\theta) \underbrace{\left[ f(y)^{\frac{p}{s'}} g(x)^{\frac{q}{r'}} \right]}_{\in L^{s'}_y L^{r'}_x} + \theta \underbrace{\left[ f(y)^{\frac{p}{r'}} g(x)^{\frac{q}{s'}} \right]}_{\in L^{s'}_x L^{r'}_{y}}. \end{equation} By the normalization $\|f\|_{L^p_y} = 1 = \|g\|_{L^q_x}$, each bracketed term has norm $1$ in the indicated space, so the sum space norm is at most $(1-\theta) + \theta = 1$ and the desired bound \eqref{eq:desiredsum} follows.
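For reference, the product inequality invoked above is, in the form I am assuming, the weighted arithmetic-geometric mean inequality: for $a, b \geq 0$ and $\theta \in [0,1]$, \[ a^{1-\theta} b^{\theta} \leq (1-\theta) a + \theta b. \] For $a, b > 0$ this follows from the concavity of the logarithm, \[ \log \bigl( (1-\theta) a + \theta b \bigr) \geq (1-\theta) \log a + \theta \log b, \] and the cases where $a$ or $b$ vanishes are immediate. It is applied pointwise, with $a$ and $b$ (names used only in this remark) standing for the two bracketed factors.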
In some ways, this last proof is achieved by "doing real interpolation (as opposed to the complex interpolation of Riesz-Thorin) by hand."
The form of \eqref{eq:bilinear} leads to the natural question: does $k$ in fact live within the space $L^{p'}_y L^{q'}_x$? The answer is easily seen to be no.