Let us consider first the linear wave equation
\[ \Box u = 0\]
on (1+d)-dimensional Minkowski space. A well known property of the wave equation is the *conservation of energy*, which can be written as
\[\frac{d}{dt}E_0(t) = 0 \]
where
\[ E(s) = \int_{\mathbb{R}^d} |\partial_t u(s,x)|^2 + |\nabla u(s,x)|^2 dx \]
and proven by simply multiplying the wave equation by $\partial_t u$ and integrating-by-parts along a spatial slice.

Now, using the fact that partial derivatives commute, we also have that all derivatives of $u$ solve wave equations \ \Box \partial^\alpha u = 0 \ and hence has an associated energy conservation $\frac{d}{dt}E[\partial^\alpha u] = 0$. This, in particular, implies that for any $k\geq 1$, the quantity \begin{equation} F_k[u] := \int_{\mathbb{R}^d} \sum_{1\leq |\alpha|\leq k} |\partial^\alpha u|^2 dx \end{equation} evaluated at some time $t > 0$ is bounded by the same quantity evaluated at time $t = 0$.

Now let us take a slightly more geometric point of view: the fact that "partial derivatives commute against the d'Alembertian operator" is, in fact, a manifestation of the fact that translations are symmetries of Minkowski space. More precisely, consider an arbitrary pseudo-Riemannian manifold $(M,g)$, and let $\Box_g$ denote its associated Laplace-Beltrami operator, which in coordinates can be written as $\frac{1}{\sqrt{|g|}} \partial_i g^{ij}\sqrt{|g|} \partial_j$, we see that if $X$ is a Killing vector field, that is, the Lie derivative $L_X g = 0$, then $[\Box_g,X] = 0$. In other words, we have that for an arbitrary thrice differentiable function $\phi$ on $M$, \begin{equation} X(\Box_g \phi) = \Box_g(X\phi) \qquad \text{whenever } X \text{ is Killing.} \end{equation} Going back to Minkowski space, we see that if $X$ is a Killing vector field of the Minkowski metric (for example, translation, rotation, Lorentzian boosts), then for $u$ a solution to the wave equation, so will $X(u)$ be a solution to the wave equation. And in particular, the computation for energies gives that $E[X(u)]$ is conserved.

What if we are in a variable coefficient case where the spatial translations are not Killing vector fields? Going back to the general pseudo-Riemannian case, we have that, after a computation \[ X(\Box_g \psi) = \Box_g(X\psi) - Ric(X,\nabla \psi) - L_Xg \cdot \nabla^2 \psi - (\Box_gX) \cdot \nabla \psi\] which means that $Xu$, for $u$ a solution to the wave equation, now solves an inhomogeneous wave equation, and we no longer have a conservation law. In fact, the best estimate we can get in this situation is from Gronwall's inequality applied to the energy identity, that the energy $E[X(u)]$ grows possibly exponentially.

Now suppose we are dealing with the equation in divergence form \[ \partial^2_{tt} u - \partial_i (a^{ij} \partial_j u) = 0 \] with $a^{ij} = a^{ij}(x)$ independent of $t$. Since $\partial_i$ are no longer Killing with respect to the implicit metric, at first sight, it seems that we lose control over the higher order energies. (The lowest order energy can still be controlled in the same way: with the assumption that the coefficients $a^{ij}$ is time-independent, the integration by parts carry through and we have that \[ E^{(a)}[u] := \int_{\mathbb{R}^d} |\partial_t u(t,x)|^2 + a^{ij}(x)\partial_i u(t,x) \partial_ju(t,x) dx\] is conserved in time.) But in this particular case we can actually take advantage of elliptic regularity and get controls for all higher derivatives. This trick is used a few times in the work of Dafermos-Rodnianski on decay of waves on black hole backgrounds.

What we do is first, notice that the equation is invariant under $\partial_t$, so we have \[ \partial^2_{tt}( (\partial_t)^ku) = \partial_i(a^{ij}\partial_j (\partial_t)^k u) \] which implies that $E^{(a)}[(\partial_t)^k u]$ is a conserved energy. The elliptic regularity we will exploit is the assumption that $a^{ij}$ is sufficiently smooth, and uniformly elliptic, that is

Now consider the following $L^2$ integral: \[\int_{\mathbb{R}^d} |\partial_i\partial_jv|^2 dx \] for some arbitrary function $v$ (which doesn't have to be a solution to the wave equation). Using elliptic regularity, the integrand is controlled by \[ |\partial^2v|^2 \leq \lambda^{-2} a^{ik}a^{jl}\partial^2_{ij}v \partial^2_{kl}v \] which we "anti-derive by parts" \[ |\partial^2v|^2 \leq \lambda^{-2} \left[ \partial_j(a^{ik}\partial_iv) \partial_k(a^{jl}\partial_lv) - (\partial_ja^{ik})(\partial_i v( a^{jl}\partial^2_{kl}v - (\partial_ka^{jl})(\partial_lv) a^{ik}\partial^2_{ij}v \right]. \] We integrate both sides. The first term on the right hand side we perform a double integration by parts, the second and third terms on the right hand side we use the elliptic regularity on $a^{ij}$, and just cast the derivative of the coefficient matrix in sup norm. So we have \begin{equation}\label{eq:inter1} \int_{\mathbb{R}^d} |\partial_i\partial_jv|^2 dx \leq \frac{1}{\lambda^2} \int_{\mathbb{R}^d} \left[ \partial_j(a^{ij}\partial_i v)\right]^2 dx + \frac{C_1\Lambda}{\lambda^2} \int_{\mathbb{R}^d} |\partial v| |\partial^2 v| dx. \end{equation} Applying Cauchy-Schwarz (Young's weighted inequality) to the last term gives \begin{equation} \frac{C_1\Lambda}{\lambda^2} \int_{\mathbb{R}^d} |\partial v| |\partial^2 v| dx \leq \frac{C_1^2\Lambda^2}{2\lambda^4} \int_{\mathbb{R}^d} |\partial v|^2 dx + \frac12 \int_{\mathbb{R}^d} |\partial^2 v| dx \end{equation} from which we can conclude, by absorbing the last term to the left hand side of $\eqref{eq:inter1}$, that \begin{equation} \int_{\mathbb{R}^d} |\partial_i\partial_jv|^2 dx \leq \frac{2}{\lambda^2} \int_{\mathbb{R}^d} \left[ \partial_j(a^{ij}\partial_i v)\right]^2 dx + \frac{C_1^2\Lambda^2}{\lambda^4} \int_{\mathbb{R}^d} |\partial v|^2 dx. \end{equation}

Now examine the first term in the integrand on the right hand side. Suppose $v = \partial^\gamma (\partial_t)^m u$ for some solution of the wave equation. Commuting all the way in, we have \[ |\partial_j(a^{ij}\partial_i v) | \leq |\partial^\gamma (\partial_t)^m\partial_j(a^{ij}\partial_i u)| + \sum_{\alpha + \beta = \gamma, |\alpha| > 0} |\partial_j(\partial^\alpha a^{ij} \partial_i\partial^\beta (\partial_t)^mu)| .\] In the first term on the right hand side, we can apply the equation. So combining everything we have that \begin{equation} \sum_{|\gamma| = k}\int_{\mathbb{R}^d} |\partial^\gamma (\partial_t)^m u|^2 dx \leq \frac{2}{\lambda^2} \sum_{|\gamma| = k-2} \int_{\mathbb{R}^d}|\partial^\gamma (\partial_t)^{m+2}u|^2 dx + \frac{4\Lambda^2}{\lambda^4}\sum_{0<|\gamma| < k} \int_{\mathbb{R}^d}(C_{k - |\gamma|})^2 |\partial^\gamma (\partial_t)^m u|^2 dx. \end{equation} Now defining $H_{k,m}(t) = \sum_{|\gamma| = k}\int_{\mathbb{R}^d}|\partial^\gamma (\partial_t)^m u|^2 dx$, the above is a recursion relation \begin{equation} H_{k,m}(t) \leq \frac{2}{\lambda^2} H_{k-2,m+2}(t) + \frac{4\Lambda^2}{\lambda^4} \sum_{\ell = 1}^{k-1} (C_{k-\ell})^2 H_{\ell,m}(t). \end{equation} By iterating the recursion relation, we have that for some large constant $\tilde{C}$ depending on $k,m, C_\star, \Lambda, \lambda$ such that \begin{equation} H_{k,m}(t) \leq \tilde{C} \sum_{\ell = m}^{m+k-1} (H_{0,\ell+1}(t) + H_{1,\ell}(t)). \end{equation}

The significance of the above expression is that, by definition, \[ H_{0,\ell+1} + \lambda H_{1,\ell} \leq E^{(a)}[(\partial_t)^\ell u] \leq H_{0,\ell+1}(t) + \Lambda H_{1,\ell}(t) \] and so, if we know that our data has initial data $u|_{t = 0} = u_0, \partial_t u|_{t=0} = u_1$ with $u_0 \in H^k$ and $u_1 \in H^{k-1}$, the above derivation shows that the quantity \[ \sum_{1 < \ell + m < k} H_{\ell,m}(t) \] will remain bounded for all times by a constant depending on $\tilde{C}$ above and the Sobolev norms of the initial data. And in particular, the "bad scenario" in which the higher order energy grows exponentially cannot happen.