
### A generalized Cauchy-Schwarz inequality by way of the Gibbs variational method

Let ${S}$ be a non-empty finite set. If ${X}$ is a random variable taking values in ${S}$, the Shannon entropy ${{\bf H}[X]}$ of ${X}$ is defined as

$\displaystyle {\bf H}[X] = -\sum_{s \in S} {\bf P}[X = s] \log {\bf P}[X = s].$

There is a nice variational formula that lets one compute logarithms of sums of exponentials in terms of this entropy:

Lemma 1 (Gibbs variational formula) Let ${f: S \rightarrow {\bf R}}$ be a function. Then

$\displaystyle \log \sum_{s \in S} \exp(f(s)) = \sup_X {\bf E} f(X) + {\bf H}[X]. \ \ \ \ \ (1)$

Proof: Note that shifting ${f}$ by a constant affects both sides of (1) the same way, so we may normalize ${\sum_{s \in S} \exp(f(s)) = 1}$. Then ${\exp(f(s))}$ is now the probability distribution of some random variable ${Y}$, and the inequality can be rewritten as

$\displaystyle 0 = \sup_X \sum_{s \in S} {\bf P}[X = s] \log {\bf P}[Y = s] - \sum_{s \in S} {\bf P}[X = s] \log {\bf P}[X = s].$

But this is precisely the Gibbs inequality. (The expression inside the supremum can also be written as ${-D_{KL}(X||Y)}$, where ${D_{KL}}$ denotes the Kullback–Leibler divergence. One can also interpret this inequality as a special case of the Fenchel–Young inequality relating the conjugate convex functions ${x \mapsto e^x}$ and ${y \mapsto y \log y - y}$.) $\Box$
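As a quick numerical sanity check (not part of the argument), one can verify Lemma 1 in a few lines of Python: the supremum in (1) is attained by the Gibbs distribution ${{\bf P}[X=s] \propto \exp(f(s))}$, and no other distribution exceeds it. The particular set ${S}$, the function ${f}$, and the tolerances below are arbitrary choices.

```python
import math
import random

def entropy(p):
    """Shannon entropy (natural log) of a probability vector."""
    return -sum(q * math.log(q) for q in p if q > 0)

# An arbitrary function f on a 5-element set S.
f = [0.3, -1.2, 2.0, 0.5, -0.7]
lhs = math.log(sum(math.exp(v) for v in f))

# The supremum is attained by the Gibbs distribution P[X = s] = exp(f(s))/Z.
Z = sum(math.exp(v) for v in f)
gibbs = [math.exp(v) / Z for v in f]
attained = sum(p * v for p, v in zip(gibbs, f)) + entropy(gibbs)
assert abs(lhs - attained) < 1e-12

# Any other distribution gives a value at most the left-hand side.
random.seed(0)
for _ in range(1000):
    w = [random.random() for _ in f]
    p = [x / sum(w) for x in w]
    value = sum(q * v for q, v in zip(p, f)) + entropy(p)
    assert value <= lhs + 1e-12
```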

In this note I would like to use this variational formula (which is also known as the Donsker–Varadhan variational formula) to give another proof of the following inequality of Carbery.

Theorem 2 (Generalized Cauchy–Schwarz inequality) Let ${n \geq 0}$, let ${S, T_1,\dots,T_n}$ be finite non-empty sets, and let ${\pi_i: S \rightarrow T_i}$ be functions for each ${i=1,\dots,n}$. Let ${K: S \rightarrow {\bf R}^+}$ and ${f_i: T_i \rightarrow {\bf R}^+}$ be positive functions for each ${i=1,\dots,n}$. Then

$\displaystyle \sum_{s \in S} K(s) \prod_{i=1}^n f_i(\pi_i(s)) \leq Q \prod_{i=1}^n (\sum_{t_i \in T_i} f_i(t_i)^{n+1})^{1/(n+1)}$

where ${Q}$ is the quantity

$\displaystyle Q := (\sum_{(s_0,\dots,s_n) \in \Omega_n} K(s_0) \dots K(s_n))^{1/(n+1)}$

where ${\Omega_n}$ is the set of all tuples ${(s_0,\dots,s_n) \in S^{n+1}}$ such that ${\pi_i(s_{i-1}) = \pi_i(s_i)}$ for ${i=1,\dots,n}$.

Thus for instance, the identity is trivial for ${n=0}$. When ${n=1}$, the inequality reads

$\displaystyle \sum_{s \in S} K(s) f_1(\pi_1(s)) \leq (\sum_{s_0,s_1 \in S: \pi_1(s_0)=\pi_1(s_1)} K(s_0) K(s_1))^{1/2}$

$\displaystyle (\sum_{t_1 \in T_1} f_1(t_1)^2)^{1/2},$

which is easily proven by Cauchy–Schwarz, while for ${n=2}$ the inequality reads

$\displaystyle \sum_{s \in S} K(s) f_1(\pi_1(s)) f_2(\pi_2(s))$

$\displaystyle \leq (\sum_{s_0,s_1,s_2 \in S: \pi_1(s_0)=\pi_1(s_1); \pi_2(s_1)=\pi_2(s_2)} K(s_0) K(s_1) K(s_2))^{1/3}$

$\displaystyle (\sum_{t_1 \in T_1} f_1(t_1)^3)^{1/3} (\sum_{t_2 \in T_2} f_2(t_2)^3)^{1/3}$

which can also be proven by elementary means. However, even for ${n=3}$, the existing proofs require the “tensor power trick” in order to reduce to the case when the ${f_i}$ are step functions (in which case the inequality can be proven elementarily, as discussed in the above paper of Carbery).
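Before turning to the proof, one can check Theorem 2 by brute force on a small random instance. The sketch below does this for ${n=2}$: the sizes of ${S, T_1, T_2}$, the maps ${\pi_i}$, and the weights ${K, f_i}$ are all arbitrary choices, and ${Q}$ is computed by enumerating ${\Omega_2}$ directly.

```python
import itertools
import random

random.seed(1)

# Random instance with n = 2: |S| = 6, |T_1| = |T_2| = 3.
n = 2
S = range(6)
T = [range(3), range(3)]
pi = [[random.randrange(3) for _ in S] for _ in range(n)]  # pi_i : S -> T_i
K = [random.uniform(0.1, 2.0) for _ in S]                  # positive weights
f = [[random.uniform(0.1, 2.0) for _ in T[i]] for i in range(n)]

# Left-hand side: sum over S of K(s) * f_1(pi_1(s)) * f_2(pi_2(s)).
lhs = sum(K[s] * f[0][pi[0][s]] * f[1][pi[1][s]] for s in S)

# Q: enumerate Omega_n, the tuples with pi_i(s_{i-1}) = pi_i(s_i) for each i.
Q = sum(
    K[s[0]] * K[s[1]] * K[s[2]]
    for s in itertools.product(S, repeat=n + 1)
    if all(pi[i][s[i]] == pi[i][s[i + 1]] for i in range(n))
) ** (1 / (n + 1))

# Right-hand side: Q times the product of the l^{n+1} norms of the f_i.
rhs = Q
for i in range(n):
    rhs *= sum(f[i][t] ** (n + 1) for t in T[i]) ** (1 / (n + 1))

assert lhs <= rhs + 1e-9
```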

We now prove this inequality. We write ${K(s) = \exp(k(s))}$ and ${f_i(t_i) = \exp(g_i(t_i))}$ for some functions ${k: S \rightarrow {\bf R}}$ and ${g_i: T_i \rightarrow {\bf R}}$. If we take logarithms of the inequality to be proven and apply Lemma 1, the inequality becomes

$\displaystyle \sup_X {\bf E} k(X) + \sum_{i=1}^n g_i(\pi_i(X)) + {\bf H}[X]$

$\displaystyle \leq \frac{1}{n+1} \sup_{(X_0,\dots,X_n)} {\bf E} k(X_0)+\dots+k(X_n) + {\bf H}[X_0,\dots,X_n]$

$\displaystyle + \frac{1}{n+1} \sum_{i=1}^n \sup_{Y_i} (n+1) {\bf E} g_i(Y_i) + {\bf H}[Y_i]$

where ${X}$ ranges over random variables taking values in ${S}$, ${X_0,\dots,X_n}$ range over tuples of random variables taking values in ${\Omega_n}$, and ${Y_i}$ range over random variables taking values in ${T_i}$. Comparing the suprema, the claim now reduces to

Lemma 3 (Conditional expectation computation) Let ${X}$ be an ${S}$-valued random variable. Then there exists an ${\Omega_n}$-valued random variable ${(X_0,\dots,X_n)}$, where each ${X_i}$ has the same distribution as ${X}$, and

$\displaystyle {\bf H}[X_0,\dots,X_n] = (n+1) {\bf H}[X]$

$\displaystyle - {\bf H}[\pi_1(X)] - \dots - {\bf H}[\pi_n(X)].$

Proof: We induct on ${n}$. When ${n=0}$ we just take ${X_0 = X}$. Now suppose that ${n \geq 1}$, and that the claim has already been proven for ${n-1}$, thus one has already obtained a tuple ${(X_0,\dots,X_{n-1}) \in \Omega_{n-1}}$ with each of ${X_0,\dots,X_{n-1}}$ having the same distribution as ${X}$, and

$\displaystyle {\bf H}[X_0,\dots,X_{n-1}] = n {\bf H}[X] - {\bf H}[\pi_1(X)] - \dots - {\bf H}[\pi_{n-1}(X)].$

By hypothesis, ${\pi_n(X_{n-1})}$ has the same distribution as ${\pi_n(X)}$. For each value ${t_n}$ attained by ${\pi_n(X)}$, we can take conditionally independent copies of ${(X_0,\dots,X_{n-1})}$ and ${X}$ conditioned to the events ${\pi_n(X_{n-1}) = t_n}$ and ${\pi_n(X) = t_n}$ respectively, and then concatenate them to form a tuple ${(X_0,\dots,X_n)}$ in ${\Omega_n}$, with ${X_n}$ a further copy of ${X}$ that is conditionally independent of ${(X_0,\dots,X_{n-1})}$ relative to ${\pi_n(X_{n-1}) = \pi_n(X_n)}$. One can then use the entropy chain rule to compute

$\displaystyle {\bf H}[X_0,\dots,X_n] = {\bf H}[\pi_n(X_n)] + {\bf H}[X_0,\dots,X_n| \pi_n(X_n)]$

$\displaystyle = {\bf H}[\pi_n(X_n)] + {\bf H}[X_0,\dots,X_{n-1}| \pi_n(X_n)] + {\bf H}[X_n| \pi_n(X_n)]$

$\displaystyle = {\bf H}[\pi_n(X)] + {\bf H}[X_0,\dots,X_{n-1}| \pi_n(X_{n-1})] + {\bf H}[X_n| \pi_n(X_n)]$

$\displaystyle = {\bf H}[\pi_n(X)] + ({\bf H}[X_0,\dots,X_{n-1}] - {\bf H}[\pi_n(X_{n-1})])$

$\displaystyle + ({\bf H}[X_n] - {\bf H}[\pi_n(X_n)])$

$\displaystyle = {\bf H}[X_0,\dots,X_{n-1}] + {\bf H}[X_n] - {\bf H}[\pi_n(X_n)]$

and the claim now follows from the induction hypothesis. $\Box$
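The ${n=1}$ case of this construction is concrete enough to verify directly: the joint law of ${(X_0, X_1)}$ is ${{\bf P}[X_0 = s_0, X_1 = s_1] = {\bf P}[X = s_0]\,{\bf P}[X = s_1] / {\bf P}[\pi_1(X) = \pi_1(s_0)]}$ supported on ${\pi_1(s_0) = \pi_1(s_1)}$, and the identity ${{\bf H}[X_0,X_1] = 2{\bf H}[X] - {\bf H}[\pi_1(X)]}$ can be checked numerically. The distribution and the map ${\pi_1}$ below are arbitrary choices.

```python
import math
from collections import defaultdict

def H(dist):
    """Shannon entropy (natural log) of a dict of probabilities."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)

# An arbitrary distribution for X on S = {0,...,4}, and a map pi: S -> {0,1}.
pX = {0: 0.1, 1: 0.25, 2: 0.3, 3: 0.15, 4: 0.2}
pi = {0: 0, 1: 0, 2: 1, 3: 1, 4: 1}

# Distribution of pi(X).
pPi = defaultdict(float)
for s, p in pX.items():
    pPi[pi[s]] += p

# n = 1 construction: X_0, X_1 conditionally independent copies of X
# given pi(X_0) = pi(X_1).
joint = {}
for s0 in pX:
    for s1 in pX:
        if pi[s0] == pi[s1]:
            joint[(s0, s1)] = pX[s0] * pX[s1] / pPi[pi[s0]]

# The marginal of X_0 agrees with X, and the entropy identity holds.
marg0 = defaultdict(float)
for (s0, s1), p in joint.items():
    marg0[s0] += p
assert all(abs(marg0[s] - pX[s]) < 1e-12 for s in pX)
assert abs(H(joint) - (2 * H(pX) - H(pPi))) < 1e-12
```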

With a little more effort, one can replace ${S}$ by a more general measure space (and use differential entropy in place of Shannon entropy) to recover Carbery's inequality in full generality; we leave the details to the interested reader.