1. Introduction
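Let $(X, \mathcal{A})$ be a measurable space satisfying $|\mathcal{A}| > 2$ and let $\mu$ be a $\sigma$-finite measure on $(X, \mathcal{A})$. Let $\mathcal{P}$ be the set of all probability measures on $(X, \mathcal{A})$ which are absolutely continuous with respect to $\mu$. For $P, Q \in \mathcal{P}$, let $p = \frac{dP}{d\mu}$ and $q = \frac{dQ}{d\mu}$ denote the Radon–Nikodym derivatives of $P$ and $Q$ with respect to $\mu$.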
Two probability measures $P, Q \in \mathcal{P}$ are said to be orthogonal, and we denote this by $Q \perp P$, if:
$$P(\{q = 0\}) = Q(\{p = 0\}) = 1.$$
Let $f: [0, \infty) \to (-\infty, \infty]$ be a convex function that is continuous at zero, i.e.,
$$f(0) = \lim_{u \downarrow 0} f(u).$$
In 1963, I. Csiszár [1] introduced the concept of $f$-divergence as follows.
Definition 1. Let $P, Q \in \mathcal{P}$. Then:
$$I_f(Q, P) = \int_X p(x) f\left[\frac{q(x)}{p(x)}\right] d\mu(x) \tag{1}$$
is called the $f$-divergence of the probability distributions $Q$ and $P$.
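Remark 1. Observe that the integrand in formula (1) is undefined when $p(x) = 0$. The way to overcome this problem is to postulate, for $f$ as above, that:
$$0 f\left[\frac{q(x)}{0}\right] = q(x) \lim_{u \downarrow 0}\left[u f\left(\frac{1}{u}\right)\right], \quad x \in X. \tag{2}$$
For $f$ continuous convex on $[0, \infty)$ we obtain the *-conjugate function of $f$ by:
$$f^*(u) = u f\left(\frac{1}{u}\right), \quad u \in (0, \infty),$$
and:
$$f^*(0) = \lim_{u \downarrow 0} f^*(u).$$
It is also known that if $f$ is continuous convex on $[0, \infty)$, then so is $f^*$.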
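As an illustration only, the sketch below evaluates $I_f(Q, P)$ for discrete distributions, i.e. assuming $X$ is finite and $\mu$ is the counting measure, and applies the convention of Remark 1 at points where $p(x) = 0$ through the value $f^*(0)$, which the caller supplies. The function names `f_divergence` and `kl_generator` are hypothetical.

```python
import math
from typing import Callable, Sequence

def f_divergence(q: Sequence[float], p: Sequence[float],
                 f: Callable[[float], float], f_star_at_0: float) -> float:
    """I_f(Q, P) = sum_x p(x) f(q(x)/p(x)) for discrete distributions.

    Points with p(x) = 0 follow Remark 1: 0 * f(q/0) = q * lim_{u->0+} u f(1/u),
    and that limit is f*(0), passed in explicitly as f_star_at_0.
    """
    total = 0.0
    for qx, px in zip(q, p):
        if px > 0:
            total += px * f(qx / px)
        elif qx > 0:                  # p(x) = 0 < q(x): contributes q(x) * f*(0)
            total += qx * f_star_at_0
        # p(x) = q(x) = 0 contributes nothing
    return total

def kl_generator(u: float) -> float:
    # f(u) = u ln u with f(0) := 0; here f*(u) = -ln u, so f*(0) = +inf
    return u * math.log(u) if u > 0 else 0.0

p = [0.5, 0.5, 0.0]
q = [0.25, 0.5, 0.25]

tv = f_divergence(q, p, lambda u: abs(u - 1.0), 1.0)   # f(u) = |u - 1|, f*(0) = 1
kl_qp = f_divergence(q, p, kl_generator, math.inf)     # Q puts mass where P does not
kl_pq = f_divergence(p, q, kl_generator, math.inf)
```

Here `tv` equals $\sum_x |q(x) - p(x)| = 0.5$, `kl_qp` is infinite because of the point where $p(x) = 0 < q(x)$, and `kl_pq` equals $\frac{1}{2}\ln 2$.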
The following two theorems contain the most basic properties of $f$-divergences. For their proofs we refer the reader to Chapter 1 of [2] (see also [3]).
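Theorem 1 (Uniqueness and symmetry theorem). Let $f, f_1$ be continuous convex on $[0, \infty)$. We have:
$$I_{f_1}(Q, P) = I_f(Q, P)$$
for all $P, Q \in \mathcal{P}$ if and only if there exists a constant $c \in \mathbb{R}$ such that:
$$f_1(u) = f(u) + c(u - 1)$$
for any $u \in [0, \infty)$.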
Theorem 2 (Range of values theorem). Let $f: [0, \infty) \to \mathbb{R}$ be a continuous convex function on $[0, \infty)$. For any $P, Q \in \mathcal{P}$, we have the double inequality:
$$f(1) \le I_f(Q, P) \le f(0) + f^*(0). \tag{3}$$
If $P = Q$, then equality holds in the first part of (3). If $f$ is strictly convex at one, then equality holds in the first part of (3) if and only if $P = Q$. If $Q \perp P$, then equality holds in the second part of (3). If $f(0) + f^*(0) < \infty$, then equality holds in the second part of (3) if and only if $Q \perp P$.
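The following result is a refinement of the second inequality in Theorem 2 (see Theorem 3 in [3]). A function $f$ defined on $[0, \infty)$ is called normalised if $f(1) = 0$.
Theorem 3. Let $f$ be a continuous convex and normalised function on $[0, \infty)$ with $f(0) + f^*(0) < \infty$. Then:
$$0 \le I_f(Q, P) \le \frac{1}{2}\left[f(0) + f^*(0)\right] V(Q, P)$$
for any $Q, P \in \mathcal{P}$, where $V(Q, P) = \int_X |q(x) - p(x)| \, d\mu(x)$ is the total variation distance (see example (1) below).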
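Both theorems can be sanity-checked numerically on a finite sample space (counting measure, an assumption made only for this illustration). The sketch below uses the normalised generator $f(u) = (\sqrt{u} - 1)^2$, for which $f(0) = f^*(0) = 1$, so (3) reads $0 \le I_f \le 2$ and the refinement in Theorem 3 gives $I_f \le V$; it also checks an orthogonal pair $Q \perp P$, where equality holds in the second part of (3). The helper names are hypothetical.

```python
import math
import random

def f_divergence(q, p, f, f_star_at_0):
    # I_f(Q, P) on a finite set; points with p(x) = 0 are handled as in Remark 1
    total = 0.0
    for qx, px in zip(q, p):
        if px > 0:
            total += px * f(qx / px)
        elif qx > 0:
            total += qx * f_star_at_0
    return total

def hellinger_generator(u):
    # normalised (f(1) = 0), with f(0) = 1 and f*(0) = lim u f(1/u) = 1
    return (math.sqrt(u) - 1.0) ** 2

def random_distribution(n, rng):
    w = [rng.random() for _ in range(n)]
    s = sum(w)
    return [x / s for x in w]

rng = random.Random(0)
all_ok = True
for _ in range(1000):
    p = random_distribution(5, rng)
    q = random_distribution(5, rng)
    i_f = f_divergence(q, p, hellinger_generator, 1.0)
    v = sum(abs(a - b) for a, b in zip(q, p))        # total variation V(Q, P)
    # f(1) = 0 <= I_f <= (1/2)(f(0) + f*(0)) V = V <= f(0) + f*(0) = 2
    all_ok = all_ok and (0.0 <= i_f <= v + 1e-12) and v <= 2.0 + 1e-12

# orthogonal measures: equality I_f = f(0) + f*(0) = 2
orthogonal = f_divergence([0.0, 1.0], [1.0, 0.0], hellinger_generator, 1.0)
```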
For other inequalities for $f$-divergences see [4–15].
We now give some examples of $f$-divergences that are well known and often used in the literature (see also [3]).
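(1) The class of $\chi^\alpha$-divergences. The $f$-divergences of this class, which is generated by the function $\chi^\alpha$, $\alpha \in [1, \infty)$, defined by:
$$\chi^\alpha(u) = |u - 1|^\alpha, \quad u \in [0, \infty),$$
have the form:
$$I_{\chi^\alpha}(Q, P) = \int_X p \left|\frac{q}{p} - 1\right|^\alpha d\mu = \int_X p^{1-\alpha} |q - p|^\alpha \, d\mu.$$
From this class only the parameter $\alpha = 1$ provides a distance in the topological sense, namely the total variation distance $V(Q, P) = \int_X |q - p| \, d\mu$. The most prominent special case of this class is, however, Karl Pearson's $\chi^2$-divergence:
$$\chi^2(Q, P) = \int_X \frac{q^2}{p} \, d\mu - 1,$$
that is obtained for $\alpha = 2$.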
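(2) Dichotomy class. From this class, generated by the function $f_\alpha: [0, \infty) \to \mathbb{R}$:
$$f_\alpha(u) = \begin{cases} u - 1 - \ln u & \text{for } \alpha = 0; \\ \dfrac{1}{\alpha(1 - \alpha)}\left[\alpha u + 1 - \alpha - u^\alpha\right] & \text{for } \alpha \in \mathbb{R} \setminus \{0, 1\}; \\ 1 - u + u \ln u & \text{for } \alpha = 1, \end{cases}$$
only the parameter $\alpha = \frac{1}{2}$, for which $f_{1/2}(u) = 2(\sqrt{u} - 1)^2$, provides a distance, namely the Hellinger distance:
$$H(Q, P) = \left[\int_X \left(\sqrt{q} - \sqrt{p}\right)^2 d\mu\right]^{1/2}.$$
Another important divergence is the Kullback–Leibler divergence, obtained for $\alpha = 1$:
$$KL(Q, P) = \int_X q \ln\left(\frac{q}{p}\right) d\mu.$$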
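(3) Matsushita's divergences. The elements of this class, which is generated by the function $\varphi_\alpha$, $\alpha \in (0, 1]$, given by:
$$\varphi_\alpha(u) = |1 - u^\alpha|^{1/\alpha}, \quad u \in [0, \infty),$$
are prototypes of metric divergences, providing the distances $[I_{\varphi_\alpha}(Q, P)]^\alpha$.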
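(4) Puri–Vincze divergences. This class is generated by the functions $\Phi_\alpha$, $\alpha \in [1, \infty)$, given by:
$$\Phi_\alpha(u) = \frac{|1 - u|^\alpha}{(u + 1)^{\alpha - 1}}, \quad u \in [0, \infty).$$
It has been shown in [16] that this class provides the distances $[I_{\Phi_\alpha}(Q, P)]^{1/\alpha}$.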
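(5) Divergences of Arimoto type. This class is generated by the functions:
$$\psi_\alpha(u) = \begin{cases} \dfrac{\alpha}{\alpha - 1}\left[(1 + u^\alpha)^{1/\alpha} - 2^{1/\alpha - 1}(1 + u)\right] & \text{for } \alpha \in (0, \infty) \setminus \{1\}; \\ (1 + u)\ln 2 + u \ln u - (1 + u)\ln(1 + u) & \text{for } \alpha = 1; \\ \frac{1}{2}|1 - u| & \text{for } \alpha = \infty. \end{cases}$$
It has been shown in [17] that this class provides the distances $[I_{\psi_\alpha}(Q, P)]^{\min(\alpha, 1/2)}$ for $\alpha \in (0, \infty)$ and $\frac{1}{2}V(Q, P)$ for $\alpha = \infty$.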
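For a quick numerical feel for these classes, the sketch below evaluates several of the generators on a three-point space with the counting measure (an assumption; all probabilities are strictly positive, so no boundary convention is needed) and compares the results with the closed forms stated above. The helper names are hypothetical.

```python
import math

def f_div(q, p, f):
    # I_f(Q, P) = sum_x p(x) f(q(x) / p(x)); assumes p(x) > 0 everywhere
    return sum(px * f(qx / px) for qx, px in zip(q, p))

def chi_alpha(alpha):                       # class (1)
    return lambda u: abs(u - 1.0) ** alpha

def dichotomy(alpha):                       # class (2)
    if alpha == 0:
        return lambda u: u - 1.0 - math.log(u)
    if alpha == 1:
        return lambda u: 1.0 - u + u * math.log(u)
    return lambda u: (alpha * u + 1.0 - alpha - u ** alpha) / (alpha * (1.0 - alpha))

def matsushita(alpha):                      # class (3), alpha in (0, 1]
    return lambda u: abs(1.0 - u ** alpha) ** (1.0 / alpha)

def puri_vincze(alpha):                     # class (4), alpha >= 1
    return lambda u: abs(1.0 - u) ** alpha / (u + 1.0) ** (alpha - 1.0)

p = [0.2, 0.3, 0.5]
q = [0.4, 0.4, 0.2]
r = [0.1, 0.6, 0.3]

tv = f_div(q, p, chi_alpha(1))
pearson = f_div(q, p, chi_alpha(2))
kl = f_div(q, p, dichotomy(1))
hellinger = math.sqrt(f_div(q, p, dichotomy(0.5)) / 2.0)  # f_{1/2}(u) = 2 (sqrt(u) - 1)^2

def matsushita_distance(a, b, alpha=0.5):
    # [I_{phi_alpha}(A, B)]^alpha; for alpha = 1/2 this is the Hellinger distance
    return f_div(a, b, matsushita(alpha)) ** alpha
```

For $\alpha = 1$ both the Matsushita and the Puri–Vincze generators reduce to $|1 - u|$, i.e. to the total variation distance.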
In order to introduce a quantum f-divergence for trace class operators in Hilbert spaces and study its properties we need some preliminary facts as follows.
2. Trace of Operators
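Let $(H, \langle\cdot, \cdot\rangle)$ be a complex Hilbert space and $\{e_i\}_{i \in I}$ an orthonormal basis of $H$. We say that $A \in \mathcal{B}(H)$ is a Hilbert–Schmidt operator if:
$$\sum_{i \in I} \|A e_i\|^2 < \infty. \tag{6}$$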
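It is well known that, if $\{e_i\}_{i \in I}$ and $\{f_j\}_{j \in J}$ are orthonormal bases for $H$ and $A \in \mathcal{B}(H)$, then, by Parseval's identity:
$$\sum_{i \in I} \|A e_i\|^2 = \sum_{i \in I} \sum_{j \in J} |\langle A e_i, f_j\rangle|^2 = \sum_{j \in J} \|A^* f_j\|^2, \tag{7}$$
showing that the definition (6) is independent of the orthonormal basis and that $A$ is a Hilbert–Schmidt operator iff $A^*$ is a Hilbert–Schmidt operator.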
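Let $\mathcal{B}_2(H)$ be the set of Hilbert–Schmidt operators in $\mathcal{B}(H)$. For $A \in \mathcal{B}_2(H)$ we define:
$$\|A\|_2 := \left(\sum_{i \in I} \|A e_i\|^2\right)^{1/2}$$
for $\{e_i\}_{i \in I}$ an orthonormal basis of $H$. This definition does not depend on the choice of the orthonormal basis.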
Using the triangle inequality in $\ell^2(I)$, one checks that $\mathcal{B}_2(H)$ is a vector space and that $\|\cdot\|_2$ is a norm on $\mathcal{B}_2(H)$, which is usually called in the literature the Hilbert–Schmidt norm.
Denote the modulus of an operator $A \in \mathcal{B}(H)$ by $|A| := (A^*A)^{1/2}$.
Because $\||A| x\| = \|A x\|$ for all $x \in H$, $A$ is Hilbert–Schmidt iff $|A|$ is Hilbert–Schmidt and $\|A\|_2 = \||A|\|_2$. From (7) we have that if $A \in \mathcal{B}_2(H)$, then $A^* \in \mathcal{B}_2(H)$ and $\|A\|_2 = \|A^*\|_2$.
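In finite dimensions every operator is Hilbert–Schmidt, so the basis independence expressed by (7) and the identities $\|A\|_2 = \|A^*\|_2 = \||A|\|_2$ can be checked with NumPy. The sketch below (an illustration under the assumption $H = \mathbb{C}^4$, with hypothetical helper names) computes $\left(\sum_i \|Ae_i\|^2\right)^{1/2}$ in the standard basis and in a random orthonormal basis and compares the results with the Frobenius norm.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

def hs_norm_in_basis(A, E):
    """(sum_i ||A e_i||^2)^(1/2), where the columns of E form an orthonormal basis."""
    return np.sqrt(sum(np.linalg.norm(A @ E[:, i]) ** 2 for i in range(E.shape[1])))

def modulus(A):
    """|A| = (A^* A)^(1/2) via the spectral decomposition of A^* A."""
    w, V = np.linalg.eigh(A.conj().T @ A)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.conj().T

# a second orthonormal basis {f_j}: the unitary factor of a random complex matrix
F, _ = np.linalg.qr(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
E = np.eye(n)

norms = {
    "A, standard basis": hs_norm_in_basis(A, E),
    "A, random basis": hs_norm_in_basis(A, F),
    "A*, random basis": hs_norm_in_basis(A.conj().T, F),
    "|A|, standard basis": hs_norm_in_basis(modulus(A), E),
}
frobenius = np.linalg.norm(A, "fro")
operator_norm = np.linalg.norm(A, 2)        # ||A|| <= ||A||_2, cf. Theorem 4
```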
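The following theorem collects some of the most important properties of Hilbert–Schmidt operators:
Theorem 4. We have:
(i) $(\mathcal{B}_2(H), \|\cdot\|_2)$ is a Hilbert space with inner product:
$$\langle A, B\rangle_2 := \sum_{i \in I} \langle A e_i, B e_i\rangle = \sum_{i \in I} \langle B^*A e_i, e_i\rangle,$$
and the definition does not depend on the choice of the orthonormal basis $\{e_i\}_{i \in I}$;
(ii) We have the inequalities:
$$\|A\| \le \|A\|_2$$
for any $A \in \mathcal{B}_2(H)$ and:
$$\|AT\|_2, \ \|TA\|_2 \le \|T\| \|A\|_2$$
for any $A \in \mathcal{B}_2(H)$ and $T \in \mathcal{B}(H)$;
(iii) $\mathcal{B}_{fin}(H)$, the space of operators of finite rank, is a dense subspace of $\mathcal{B}_2(H)$;
(iv) $\mathcal{B}_2(H) \subseteq \mathcal{K}(H)$, where $\mathcal{K}(H)$ denotes the algebra of compact operators on $H$.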
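If $\{e_i\}_{i \in I}$ is an orthonormal basis of $H$, we say that $A \in \mathcal{B}(H)$ is trace class if:
$$\|A\|_1 := \sum_{i \in I} \langle |A| e_i, e_i\rangle < \infty.$$
The definition of $\|A\|_1$ does not depend on the choice of the orthonormal basis $\{e_i\}_{i \in I}$. We denote by $\mathcal{B}_1(H)$ the set of trace class operators in $\mathcal{B}(H)$.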
The following proposition holds:
Proposition 1. If $A \in \mathcal{B}(H)$, then the following are equivalent:
(i) $A \in \mathcal{B}_1(H)$;
(ii) $|A|^{1/2} \in \mathcal{B}_2(H)$.
The following properties are also well known:
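Theorem 5. With the above notations:
(i) We have $\|A\| \le \|A\|_1$ for any $A \in \mathcal{B}_1(H)$;
(ii) We have the isometric isomorphisms $\mathcal{B}_1(H) \cong \mathcal{K}(H)^*$ and $\mathcal{B}(H) \cong \mathcal{B}_1(H)^*$, where $\mathcal{K}(H)^*$ is the dual space of $\mathcal{K}(H)$ and $\mathcal{B}_1(H)^*$ is the dual space of $\mathcal{B}_1(H)$.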
We define the trace of a trace class operator $A \in \mathcal{B}_1(H)$ to be:
$$\operatorname{tr}(A) := \sum_{i \in I} \langle A e_i, e_i\rangle, \tag{14}$$
where $\{e_i\}_{i \in I}$ is an orthonormal basis of $H$. Note that this coincides with the usual definition of the trace if $H$ is finite-dimensional. We observe that the series (14) converges absolutely and is independent of the choice of basis.
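A finite-dimensional sketch (assuming $H = \mathbb{C}^5$, where every operator is trace class) of the trace norm and of (14): $\|A\|_1$ computed as $\sum_i \langle |A| e_i, e_i\rangle$ in two orthonormal bases agrees with the sum of the singular values of $A$, and $\operatorname{tr}(A)$ does not depend on the basis. The bound $|\operatorname{tr}(A)| \le \|A\|_1$ and the cyclicity $\operatorname{tr}(AB) = \operatorname{tr}(BA)$ from Theorem 6 below are checked as well. Helper names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5

def rand_complex(n):
    return rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

def modulus(A):
    # |A| = (A^* A)^(1/2)
    w, V = np.linalg.eigh(A.conj().T @ A)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.conj().T

def diagonal_sum(T, E):
    # sum_i <T e_i, e_i>, inner product linear in the first slot: <x, y> = y^* x
    return sum(np.vdot(E[:, i], T @ E[:, i]) for i in range(E.shape[1]))

A, B = rand_complex(n), rand_complex(n)
U, _ = np.linalg.qr(rand_complex(n))          # columns: another orthonormal basis
E = np.eye(n)

trace_norm_std = diagonal_sum(modulus(A), E).real
trace_norm_rot = diagonal_sum(modulus(A), U).real
singular_sum = np.linalg.svd(A, compute_uv=False).sum()
trace_std = diagonal_sum(A, E)
trace_rot = diagonal_sum(A, U)
```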
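The following result collects some properties of the trace:
Theorem 6. We have:
(i) $\operatorname{tr}(\cdot)$ is a bounded linear functional on $\mathcal{B}_1(H)$ with $\|\operatorname{tr}\| = 1$;
(ii) If $A, B \in \mathcal{B}_2(H)$, then $AB, BA \in \mathcal{B}_1(H)$ and $\operatorname{tr}(AB) = \operatorname{tr}(BA)$;
(iii) $\mathcal{B}_{fin}(H)$ is a dense subspace of $\mathcal{B}_1(H)$.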
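Utilising the trace notation, we obviously have that:
$$\langle A, B\rangle_2 = \operatorname{tr}(B^*A) = \operatorname{tr}(AB^*) \quad \text{and} \quad \|A\|_2^2 = \operatorname{tr}(A^*A) = \operatorname{tr}(|A|^2)$$
for any $A, B \in \mathcal{B}_2(H)$.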
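The following Hölder-type inequality has been obtained by Ruskai in [18]:
$$|\operatorname{tr}(AB)| \le \|AB\|_1 \le \left[\operatorname{tr}\left(|A|^{1/\alpha}\right)\right]^\alpha \left[\operatorname{tr}\left(|B|^{1/(1-\alpha)}\right)\right]^{1-\alpha},$$
where $\alpha \in (0, 1)$ and $A, B \in \mathcal{B}(H)$ with $|A|^{1/\alpha}, |B|^{1/(1-\alpha)} \in \mathcal{B}_1(H)$.
In particular, for $\alpha = \frac{1}{2}$ we get the Schwarz inequality:
$$|\operatorname{tr}(AB)| \le \|AB\|_1 \le \left[\operatorname{tr}\left(|A|^2\right)\right]^{1/2} \left[\operatorname{tr}\left(|B|^2\right)\right]^{1/2},$$
with $A, B \in \mathcal{B}_2(H)$.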
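Numerically, Ruskai's inequality is the Hölder inequality for Schatten norms with exponents $1/\alpha$ and $1/(1 - \alpha)$. The sketch below (finite-dimensional, random complex $4 \times 4$ matrices; helper names are hypothetical) evaluates the three members of the chain through singular values, using $\operatorname{tr}(|A|^s) = \sum_k \sigma_k(A)^s$.

```python
import numpy as np

def trace_abs_power(A, s):
    # tr(|A|^s) = sum of the s-th powers of the singular values of A
    return np.sum(np.linalg.svd(A, compute_uv=False) ** s)

def ruskai_chain(A, B, alpha):
    """Return (|tr(AB)|, ||AB||_1, [tr|A|^{1/a}]^a [tr|B|^{1/(1-a)}]^{1-a})."""
    lhs = abs(np.trace(A @ B))
    middle = np.linalg.svd(A @ B, compute_uv=False).sum()
    rhs = (trace_abs_power(A, 1.0 / alpha) ** alpha
           * trace_abs_power(B, 1.0 / (1.0 - alpha)) ** (1.0 - alpha))
    return lhs, middle, rhs

rng = np.random.default_rng(3)
violations = 0
for _ in range(500):
    A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
    B = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
    for alpha in (0.2, 0.5, 0.8):            # alpha = 0.5 is the Schwarz case
        lhs, middle, rhs = ruskai_chain(A, B, alpha)
        if not (lhs <= middle * (1 + 1e-10) and middle <= rhs * (1 + 1e-10)):
            violations += 1
```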
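If $A \ge 0$ and $P \in \mathcal{B}_1(H)$ with $P \ge 0$, then:
$$0 \le \operatorname{tr}(PA) \le \|A\| \operatorname{tr}(P). \tag{19}$$
Indeed, since $A \ge 0$, then $\langle Ax, x\rangle \ge 0$ for any $x \in H$. If $\{e_i\}_{i \in I}$ is an orthonormal basis of $H$, then:
$$0 \le \left\langle A P^{1/2} e_i, P^{1/2} e_i\right\rangle \le \|A\| \left\|P^{1/2} e_i\right\|^2 = \|A\| \langle P e_i, e_i\rangle$$
for any $i \in I$. Summing over $i \in I$, we get:
$$0 \le \sum_{i \in I} \left\langle A P^{1/2} e_i, P^{1/2} e_i\right\rangle \le \|A\| \sum_{i \in I} \langle P e_i, e_i\rangle = \|A\| \operatorname{tr}(P),$$
and since:
$$\sum_{i \in I} \left\langle A P^{1/2} e_i, P^{1/2} e_i\right\rangle = \sum_{i \in I} \left\langle P^{1/2} A P^{1/2} e_i, e_i\right\rangle = \operatorname{tr}\left(P^{1/2} A P^{1/2}\right) = \operatorname{tr}(PA),$$
we obtain the desired result (19).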
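This obviously implies (apply (19) to $B - A \ge 0$) that, if $A$ and $B$ are self-adjoint operators with $A \le B$ and $P \in \mathcal{B}_1(H)$ with $P \ge 0$, then:
$$\operatorname{tr}(PA) \le \operatorname{tr}(PB). \tag{20}$$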
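Now, if $A$ is a self-adjoint operator, then we know that:
$$|\langle Ax, x\rangle| \le \langle |A| x, x\rangle \quad \text{for any } x \in H.$$
This inequality follows by Jensen's inequality for the convex function $f(t) = |t|$ defined on a closed interval containing the spectrum of $A$.
If $\{e_i\}_{i \in I}$ is an orthonormal basis of $H$, then:
$$|\operatorname{tr}(PA)| \le \sum_{i \in I} \left|\left\langle A P^{1/2} e_i, P^{1/2} e_i\right\rangle\right| \le \sum_{i \in I} \left\langle |A| P^{1/2} e_i, P^{1/2} e_i\right\rangle = \operatorname{tr}(P|A|),$$
for any self-adjoint operator $A$ and $P \in \mathcal{B}_1^+(H) := \{P \in \mathcal{B}_1(H) \text{ with } P \ge 0\}$.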
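The three trace inequalities above, namely (19), (20) and $|\operatorname{tr}(PA)| \le \operatorname{tr}(P|A|)$, can be spot-checked on random matrices. In the sketch below (a finite-dimensional illustration; helper names are hypothetical), $A$ is positive semidefinite, $S$ is self-adjoint, $B = S + (\text{positive semidefinite})$ so that $S \le B$, and $P \ge 0$.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 4

def rand_complex(n):
    return rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

def rand_psd(n):
    G = rand_complex(n)
    return G @ G.conj().T                      # G G^* >= 0

def rand_selfadjoint(n):
    G = rand_complex(n)
    return (G + G.conj().T) / 2

def modulus(S):
    # |S| for self-adjoint S: apply t -> |t| to the eigenvalues
    w, V = np.linalg.eigh(S)
    return (V * np.abs(w)) @ V.conj().T

checks = []
for _ in range(300):
    P, A = rand_psd(n), rand_psd(n)
    S = rand_selfadjoint(n)
    B = S + rand_psd(n)                        # S <= B in the operator order
    tr_PA = np.trace(P @ A).real
    ineq19 = -1e-10 <= tr_PA <= np.linalg.norm(A, 2) * np.trace(P).real + 1e-10
    ineq20 = np.trace(P @ S).real <= np.trace(P @ B).real + 1e-10
    ineq21 = abs(np.trace(P @ S)) <= np.trace(P @ modulus(S)).real + 1e-10
    checks.append(bool(ineq19 and ineq20 and ineq21))
```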
For the theory of trace functionals and their applications the reader is referred to [19].
For some classical trace inequalities see [20–24], which are continuations of the work of Bellman [25]. For related works the reader can refer to [20,24,26–32].
3. Classical Quantum f-Divergence
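On the complex Hilbert space $(\mathcal{B}_2(H), \langle\cdot, \cdot\rangle_2)$, where the Hilbert–Schmidt inner product is defined by:
$$\langle U, V\rangle_2 := \operatorname{tr}(V^*U), \quad U, V \in \mathcal{B}_2(H),$$
for positive operators $A, B \in \mathcal{B}(H)$ consider the operators $\mathfrak{L}_A: \mathcal{B}_2(H) \to \mathcal{B}_2(H)$ and $\mathfrak{R}_B: \mathcal{B}_2(H) \to \mathcal{B}_2(H)$ defined by:
$$\mathfrak{L}_A T := AT, \qquad \mathfrak{R}_B T := TB.$$
We observe that they are well defined and, since:
$$\langle \mathfrak{L}_A T, T\rangle_2 = \operatorname{tr}(T^*AT) = \operatorname{tr}(|T^*|^2 A) \ge 0$$
and:
$$\langle \mathfrak{R}_B T, T\rangle_2 = \operatorname{tr}(T^*TB) = \operatorname{tr}(|T|^2 B) \ge 0$$
for any $T \in \mathcal{B}_2(H)$, they are also positive in the operator order of $\mathcal{B}(\mathcal{B}_2(H))$, the Banach algebra of all bounded operators on $\mathcal{B}_2(H)$ with the norm $\|\cdot\|_2$, where $\|T\|_2 = [\operatorname{tr}(|T|^2)]^{1/2}$, $T \in \mathcal{B}_2(H)$.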
Since tr (|
X*|
2) = tr (|
X|
2) for any
, then also:
for
A ≥ 0 and
.
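We observe that $\mathfrak{L}_A$ and $\mathfrak{R}_B$ commute; therefore the product $\mathfrak{L}_A \mathfrak{R}_B$ is a self-adjoint positive operator on $\mathcal{B}_2(H)$ for any positive operators $A, B \in \mathcal{B}(H)$.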
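For positive operators $A, B \in \mathcal{B}(H)$ with $B$ invertible, we define the Araki transform $\mathcal{A}_{A,B}: \mathcal{B}_2(H) \to \mathcal{B}_2(H)$ by $\mathcal{A}_{A,B} := \mathfrak{L}_A \mathfrak{R}_{B^{-1}}$. We observe that, for $T \in \mathcal{B}_2(H)$, we have $\mathcal{A}_{A,B} T = A T B^{-1}$ and:
$$\langle \mathcal{A}_{A,B} T, T\rangle_2 = \operatorname{tr}(T^* A T B^{-1}).$$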
Observe also, by the properties of trace, that:
giving that:
for any
.
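When $H = \mathbb{C}^n$ (an assumption made only for illustration), $\mathcal{B}_2(H)$ can be identified with $\mathbb{C}^{n^2}$ by row-major vectorisation, which turns $\langle U, V\rangle_2 = \operatorname{tr}(V^*U)$ into the standard inner product; the Araki transform $\mathcal{A}_{A,B} = \mathfrak{L}_A \mathfrak{R}_{B^{-1}}$, acting as $T \mapsto ATB^{-1}$, is then represented by the Kronecker product $A \otimes (B^{-1})^{T}$. The sketch below builds that matrix and checks that it acts as $T \mapsto ATB^{-1}$, is self-adjoint and positive for positive invertible $A, B$, and satisfies $\langle \mathcal{A}_{A,B} B^{1/2}, B^{1/2}\rangle_2 = \operatorname{tr}(B^{1/2} A B^{-1/2}) = \operatorname{tr}(A)$. The function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 3

def rand_pd(n):
    G = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return G @ G.conj().T + 0.1 * np.eye(n)     # positive definite, hence invertible

def msqrt(A):
    w, V = np.linalg.eigh(A)
    return (V * np.sqrt(w)) @ V.conj().T

def vec(T):
    # row-major vectorisation; <U, V>_2 = tr(V^* U) becomes np.vdot(vec(V), vec(U))
    return T.reshape(-1)

def araki_matrix(A, B):
    # A T B^{-1} in row-major form: vec(A T C) = kron(A, C.T) vec(T) with C = B^{-1}
    return np.kron(A, np.linalg.inv(B).T)

A, B = rand_pd(n), rand_pd(n)
K = araki_matrix(A, B)
T = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
B_half = msqrt(B)

acts_correctly = np.allclose(K @ vec(T), vec(A @ T @ np.linalg.inv(B)))
self_adjoint = np.allclose(K, K.conj().T)
min_eigenvalue = np.linalg.eigvalsh(K).min()            # eigenvalues a_i / b_j > 0
quadratic_form = np.vdot(vec(T), K @ vec(T))            # <A_{A,B} T, T>_2
expected_form = np.trace(T.conj().T @ A @ T @ np.linalg.inv(B))
trace_identity = np.vdot(vec(B_half), K @ vec(B_half))  # should equal tr(A)
```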
Let $U$ be a self-adjoint linear operator on a complex Hilbert space $(K, \langle\cdot, \cdot\rangle)$. The Gelfand map establishes a $*$-isometric isomorphism $\Phi$ between the set $C(\operatorname{Sp}(U))$ of all continuous functions defined on the spectrum of $U$, denoted $\operatorname{Sp}(U)$, and the $C^*$-algebra $C^*(U)$ generated by $U$ and the identity operator $1_K$ on $K$ as follows:
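For any $f, g \in C(\operatorname{Sp}(U))$ and any $\alpha, \beta \in \mathbb{C}$ we have:
(i) $\Phi(\alpha f + \beta g) = \alpha \Phi(f) + \beta \Phi(g)$;
(ii) $\Phi(fg) = \Phi(f)\Phi(g)$ and $\Phi(\bar{f}) = \Phi(f)^*$;
(iii) $\|\Phi(f)\| = \|f\| := \sup_{t \in \operatorname{Sp}(U)} |f(t)|$;
(iv) $\Phi(f_0) = 1_K$ and $\Phi(f_1) = U$, where $f_0(t) = 1$ and $f_1(t) = t$, for $t \in \operatorname{Sp}(U)$.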
With this notation we define:
$$f(U) := \Phi(f) \quad \text{for all } f \in C(\operatorname{Sp}(U)),$$
and we call it the continuous functional calculus for a self-adjoint operator $U$.
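If $U$ is a self-adjoint operator and $f$ is a real valued continuous function on $\operatorname{Sp}(U)$, then $f(t) \ge 0$ for any $t \in \operatorname{Sp}(U)$ implies that $f(U) \ge 0$, i.e., $f(U)$ is a positive operator on $K$. Moreover, if both $f$ and $g$ are real valued functions on $\operatorname{Sp}(U)$, then the following important property holds:
$$f(t) \ge g(t) \text{ for any } t \in \operatorname{Sp}(U) \quad \text{implies that} \quad f(U) \ge g(U) \tag{P}$$
in the operator order of $\mathcal{B}(K)$.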
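In finite dimensions the continuous functional calculus is computed from a spectral decomposition $U = V \operatorname{diag}(\lambda) V^*$ as $f(U) = V \operatorname{diag}(f(\lambda)) V^*$. The sketch below (illustrative; `fcalc` is a hypothetical helper) checks the multiplicativity and norm properties listed above and the order property stated last, for $f(t) = e^t \ge g(t) = 1 + t$.

```python
import numpy as np

def fcalc(U, f):
    """f(U) = V diag(f(lambda)) V^* for a self-adjoint matrix U = V diag(lambda) V^*."""
    w, V = np.linalg.eigh(U)
    return (V * f(w)) @ V.conj().T

rng = np.random.default_rng(6)
G = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
U = (G + G.conj().T) / 2                      # self-adjoint
spectrum = np.linalg.eigvalsh(U)

f = np.exp
g = lambda t: 1.0 + t                          # e^t >= 1 + t for every real t
fU, gU = fcalc(U, f), fcalc(U, g)

multiplicative = np.allclose(fcalc(U, lambda t: f(t) * g(t)), fU @ gU)   # Phi(fg) = Phi(f)Phi(g)
isometric = np.isclose(np.linalg.norm(fU, 2), np.max(np.abs(f(spectrum))))
identity_map = np.allclose(gU, np.eye(4) + U)   # Phi(f_0 + f_1) = 1 + U
order_gap = np.linalg.eigvalsh(fU - gU).min()   # >= 0 means f(U) >= g(U)
```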
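Let $f: [0, \infty) \to \mathbb{R}$ be a continuous function. Utilising the continuous functional calculus for the Araki self-adjoint operator $\mathcal{A}_{Q,P} \in \mathcal{B}(\mathcal{B}_2(H))$, we can define the quantum $f$-divergence for $Q, P \in S(H) := \{P \in \mathcal{B}_1(H),\ P \ge 0 \text{ with } \operatorname{tr}(P) = 1\}$ and $P$ invertible, by:
$$\mathcal{S}_f(Q, P) := \left\langle f(\mathcal{A}_{Q,P}) P^{1/2}, P^{1/2}\right\rangle_2.$$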
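As a finite-dimensional sketch (assuming $H = \mathbb{C}^n$ and the row-major identification of $\mathcal{B}_2(H)$ with $\mathbb{C}^{n^2}$), $f(\mathcal{A}_{Q,P})$ can be formed by applying $f$ to the eigenvalues of the Hermitian matrix $Q \otimes (P^{-1})^{T}$, with the divergence written as the quadratic form $\langle f(\mathcal{A}_{Q,P}) P^{1/2}, P^{1/2}\rangle_2$. Two sanity checks: $f(t) = t$ gives $\operatorname{tr}(Q) = 1$ and $f \equiv 1$ gives $\operatorname{tr}(P) = 1$. The function names are hypothetical.

```python
import numpy as np

def msqrt(A):
    w, V = np.linalg.eigh(A)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.conj().T

def random_density(n, rng):
    G = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    rho = G @ G.conj().T                      # positive definite almost surely
    return rho / np.trace(rho).real

def araki_f_divergence(Q, P, f):
    """<f(A_{Q,P}) P^{1/2}, P^{1/2}>_2 with A_{Q,P} T = Q T P^{-1} (row-major vec)."""
    K = np.kron(Q, np.linalg.inv(P).T)
    K = (K + K.conj().T) / 2                  # symmetrise away rounding errors
    w, V = np.linalg.eigh(K)
    fK = (V * f(w)) @ V.conj().T
    x = msqrt(P).reshape(-1)
    return np.vdot(x, fK @ x).real

rng = np.random.default_rng(7)
Q, P = random_density(3, rng), random_density(3, rng)
value_identity = araki_f_divergence(Q, P, lambda t: t)                # tr(Q) = 1
value_constant = araki_f_divergence(Q, P, lambda t: np.ones_like(t))  # tr(P) = 1
```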
If we consider the continuous convex function $f: [0, \infty) \to \mathbb{R}$, with $f(0) := 0$ and $f(t) = t \ln t$ for $t > 0$, then for $Q, P \in S(H)$ and $Q, P$ invertible, we have:
$$\mathcal{S}_f(Q, P) = \operatorname{tr}[Q(\ln Q - \ln P)] =: U(Q, P),$$
which is the Umegaki relative entropy.
If we take the continuous convex function $f: [0, \infty) \to \mathbb{R}$, $f(t) = |t - 1|$ for $t \ge 0$, then for $Q, P \in S(H)$ with $P$ invertible, we have:
$$\mathcal{S}_f(Q, P) =: V(Q, P),$$
where $V(Q, P)$ is the variational distance.
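If we take $f: [0, \infty) \to \mathbb{R}$, $f(t) = t^2 - 1$ for $t \ge 0$, then for $Q, P \in S(H)$ with $P$ invertible, we have:
$$\mathcal{S}_f(Q, P) = \operatorname{tr}(Q^2 P^{-1}) - 1 =: \chi^2(Q, P), \tag{23}$$
which is called the $\chi^2$-distance.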
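Let $q \in (0, 1)$ and define the convex function $f_q: [0, \infty) \to \mathbb{R}$ by $f_q(t) = \frac{1 - t^q}{1 - q}$. Then:
$$\mathcal{S}_{f_q}(Q, P) = \frac{1 - \operatorname{tr}(Q^q P^{1-q})}{1 - q} =: T_q(Q, P),$$
which is the Tsallis relative entropy.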
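If we consider the convex function $f: [0, \infty) \to \mathbb{R}$ given by $f(t) = \frac{1}{2}(1 - \sqrt{t})^2$, then:
$$\mathcal{S}_f(Q, P) = 1 - \operatorname{tr}(Q^{1/2} P^{1/2}) =: h^2(Q, P),$$
which is known as the Hellinger discrimination.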
If we take $f: (0, \infty) \to \mathbb{R}$, $f(t) = -\ln t$, then for $Q, P \in S(H)$ and $Q, P$ invertible, we have:
$$\mathcal{S}_f(Q, P) = \operatorname{tr}[P(\ln P - \ln Q)] = U(P, Q).$$
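The closed forms above can be compared numerically with the quadratic-form definition $\langle f(\mathcal{A}_{Q,P}) P^{1/2}, P^{1/2}\rangle_2$ in finite dimensions. The sketch below (assuming $H = \mathbb{C}^3$ and random invertible density matrices; helper names are hypothetical) evaluates the divergence through the Kronecker matrix $Q \otimes (P^{-1})^{T}$ representing $T \mapsto QTP^{-1}$ and compares it with the Umegaki, $\chi^2$, Tsallis ($q = 0.3$), Hellinger and reversed Umegaki expressions.

```python
import numpy as np

def mfunc(A, f):
    # f(A) for Hermitian A via its spectral decomposition
    w, V = np.linalg.eigh(A)
    return (V * f(w)) @ V.conj().T

def random_density(n, rng):
    G = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    rho = G @ G.conj().T
    return rho / np.trace(rho).real

def araki_f_divergence(Q, P, f):
    # <f(A_{Q,P}) P^{1/2}, P^{1/2}>_2, with A_{Q,P} T = Q T P^{-1} as a Kronecker matrix
    K = np.kron(Q, np.linalg.inv(P).T)
    x = mfunc(P, np.sqrt).reshape(-1)
    return np.vdot(x, mfunc((K + K.conj().T) / 2, f) @ x).real

def tr(A):
    return np.trace(A).real

rng = np.random.default_rng(8)
Q, P = random_density(3, rng), random_density(3, rng)
logQ, logP = mfunc(Q, np.log), mfunc(P, np.log)
Pinv = np.linalg.inv(P)
q = 0.3

pairs = {
    "Umegaki": (araki_f_divergence(Q, P, lambda t: t * np.log(t)), tr(Q @ (logQ - logP))),
    "chi^2": (araki_f_divergence(Q, P, lambda t: t ** 2 - 1), tr(Q @ Q @ Pinv) - 1),
    "Tsallis": (araki_f_divergence(Q, P, lambda t: (1 - t ** q) / (1 - q)),
                (1 - tr(mfunc(Q, lambda s: s ** q) @ mfunc(P, lambda s: s ** (1 - q)))) / (1 - q)),
    "Hellinger": (araki_f_divergence(Q, P, lambda t: 0.5 * (1 - np.sqrt(t)) ** 2),
                  1 - tr(mfunc(Q, np.sqrt) @ mfunc(P, np.sqrt))),
    "-ln": (araki_f_divergence(Q, P, lambda t: -np.log(t)), tr(P @ (logP - logQ))),
}
```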
In the important case of a finite-dimensional space $H$ and the generalized inverse $P^{-1}$, numerous properties of the quantum $f$-divergence have been obtained in the recent papers [33–36] and the references therein. We omit the details.
4. A New Quantum f-Divergence
In order to simplify the writing, we denote by $S_1(H)$ the set of all density operators, i.e., the positive elements of $\mathcal{B}_1(H)$ having unit trace.
We observe that, if $P, Q$ are self-adjoint with $P, Q \ge 0$ and $P$ is invertible, then $P^{-1/2}QP^{-1/2} \ge 0$.
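Let $f: [0, \infty) \to \mathbb{R}$ be a continuous convex function on $[0, \infty)$. We can define the following new quantum $f$-divergence functional:
$$D_f(Q, P) := \operatorname{tr}\left[P f\left(P^{-1/2}QP^{-1/2}\right)\right]$$
for $Q, P \in S_1(H)$ with $P$ invertible. The definition can be extended to any continuous function $f$.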
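A finite-dimensional sketch of the new functional, assuming $H = \mathbb{C}^n$ and writing it as $D_f(Q, P) = \operatorname{tr}[P f(P^{-1/2}QP^{-1/2})]$, the form consistent with the observation above that $P^{-1/2}QP^{-1/2} \ge 0$. It checks two elementary consequences: $D_f(P, P) = f(1)$, and $f(t) = t$ gives $\operatorname{tr}(Q) = 1$. The random density matrices are shifted by a multiple of the identity to keep them well conditioned (an arbitrary choice), and the helper names are hypothetical.

```python
import numpy as np

def mfunc(A, f):
    # f(A) for Hermitian A; A is symmetrised first to remove rounding asymmetry
    A = (A + A.conj().T) / 2
    w, V = np.linalg.eigh(A)
    return (V * f(w)) @ V.conj().T

def random_density(n, rng):
    G = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    rho = G @ G.conj().T + n * np.eye(n)
    return rho / np.trace(rho).real

def new_f_divergence(Q, P, f):
    """tr[P f(P^{-1/2} Q P^{-1/2})] for density matrices Q, P with P invertible."""
    P_inv_half = mfunc(P, lambda w: w ** -0.5)
    X = P_inv_half @ Q @ P_inv_half          # positive semidefinite
    return np.trace(P @ mfunc(X, f)).real

rng = np.random.default_rng(9)
Q, P = random_density(3, rng), random_density(3, rng)
same_state = new_f_divergence(P, P, lambda t: (t - 1.0) ** 2)   # f(1) = 0
linear = new_f_divergence(Q, P, lambda t: t)                     # tr(Q) = 1
```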
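If we take the convex function $f(t) = t^2 - 1$, $t \ge 0$, then we get:
$$D_f(Q, P) = \operatorname{tr}\left[P\left(P^{-1/2}QP^{-1/2}\right)^2\right] - 1 = \operatorname{tr}(Q^2 P^{-1}) - 1$$
for $Q, P \in S_1(H)$ with $P$ invertible, which is Karl Pearson's $\chi^2$-divergence version for trace class operators. This divergence is the same as the one generated by the classical quantum $f$-divergence, see (23).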
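More generally, if we take the convex function $f(t) = t^n - 1$, $t \ge 0$, with $n$ a natural number, $n \ge 2$, then we get:
$$D_f(Q, P) = \operatorname{tr}\left[P\left(P^{-1/2}QP^{-1/2}\right)^n\right] - 1 = \operatorname{tr}\left[\left(QP^{-1}\right)^{n-1} Q\right] - 1$$
for $Q, P \in S_1(H)$ with $P$ invertible.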
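If we take the convex function $f(t) = t \ln t$ for $t > 0$ and $f(0) := 0$, then we get:
$$D_f(Q, P) = \operatorname{tr}\left[P^{1/2}QP^{-1/2} \ln\left(P^{-1/2}QP^{-1/2}\right)\right]$$
for $Q, P \in S_1(H)$ with $P$ and $Q$ invertible. We observe that this is not, in general, the same as the Umegaki relative entropy introduced above.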
If we take the convex function $f(t) = -\ln t$ for $t > 0$, then we get:
$$D_f(Q, P) = -\operatorname{tr}\left[P \ln\left(P^{-1/2}QP^{-1/2}\right)\right]$$
for $Q, P \in S_1(H)$ with $P$ and $Q$ invertible.
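If we take the convex function $f(t) = |t - 1|$, $t \ge 0$, then we get:
$$D_f(Q, P) = \operatorname{tr}\left[P\left|P^{-1/2}QP^{-1/2} - 1_H\right|\right] = \operatorname{tr}\left[P\left|P^{-1/2}(Q - P)P^{-1/2}\right|\right]$$
for $Q, P \in S_1(H)$ with $P$ invertible.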
If we consider the convex function
, then:
for
Q, P ∈ S1 (
H) with
P and
Q invertible.
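If we take the convex function $f_q(t) = \frac{1 - t^q}{1 - q}$, $q \in (0, 1)$, then we get:
$$D_{f_q}(Q, P) = \frac{1 - \operatorname{tr}\left[P\left(P^{-1/2}QP^{-1/2}\right)^q\right]}{1 - q},$$
which is different, in general, from the Tsallis relative entropy introduced above.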
Other examples may be considered by taking the convex functions from the introduction. The details are omitted.
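Several of the examples above can be cross-checked numerically, again only as a finite-dimensional sketch with $D_f(Q, P) = \operatorname{tr}[P f(P^{-1/2}QP^{-1/2})]$ and hypothetical helper names: the $\chi^2$ case agrees with $\operatorname{tr}(Q^2P^{-1}) - 1$, the case $f(t) = t^3 - 1$ agrees with $\operatorname{tr}[(QP^{-1})^2 Q] - 1$, and the $t \ln t$ case, which equals the Belavkin–Staszewski relative entropy $\operatorname{tr}[Q \ln(Q^{1/2}P^{-1}Q^{1/2})]$, is compared with the Umegaki relative entropy: by a known inequality it is never smaller, and the two coincide when $Q$ and $P$ commute.

```python
import numpy as np

def mfunc(A, f):
    A = (A + A.conj().T) / 2                  # symmetrise away rounding errors
    w, V = np.linalg.eigh(A)
    return (V * f(w)) @ V.conj().T

def random_density(n, rng):
    G = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    rho = G @ G.conj().T + n * np.eye(n)
    return rho / np.trace(rho).real

def new_f_divergence(Q, P, f):
    # tr[P f(P^{-1/2} Q P^{-1/2})]
    Ph = mfunc(P, lambda w: w ** -0.5)
    return np.trace(P @ mfunc(Ph @ Q @ Ph, f)).real

def umegaki(Q, P):
    return np.trace(Q @ (mfunc(Q, np.log) - mfunc(P, np.log))).real

rng = np.random.default_rng(10)
Q, P = random_density(3, rng), random_density(3, rng)
Pinv = np.linalg.inv(P)

chi2_new = new_f_divergence(Q, P, lambda t: t ** 2 - 1)
chi2_classical = np.trace(Q @ Q @ Pinv).real - 1
cubic_new = new_f_divergence(Q, P, lambda t: t ** 3 - 1)
cubic_closed = np.trace(Q @ Pinv @ Q @ Pinv @ Q).real - 1
bs = new_f_divergence(Q, P, lambda t: t * np.log(t))
Qh = mfunc(Q, np.sqrt)
bs_closed = np.trace(Q @ mfunc(Qh @ Pinv @ Qh, np.log)).real

# commuting (diagonal) density matrices: both relative entropies coincide
Qd, Pd = np.diag([0.2, 0.3, 0.5]), np.diag([0.5, 0.25, 0.25])
```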
Suppose that $I$ is an interval of real numbers with interior $\mathring{I}$ and $f: I \to \mathbb{R}$ is a convex function on $I$. Then $f$ is continuous on $\mathring{I}$ and has finite left and right derivatives at each point of $\mathring{I}$. Moreover, if $x, y \in \mathring{I}$ and $x < y$, then:
$$f'_-(x) \le f'_+(x) \le f'_-(y) \le f'_+(y),$$
which shows that both $f'_-$ and $f'_+$ are nondecreasing functions on $\mathring{I}$. It is also known that a convex function must be differentiable except for at most countably many points.
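For a convex function $f: I \to \mathbb{R}$, the subdifferential of $f$, denoted by $\partial f$, is the set of all functions $\varphi: I \to [-\infty, \infty]$ such that $\varphi(\mathring{I}) \subset \mathbb{R}$ and:
$$f(x) \ge f(a) + (x - a)\varphi(a) \quad \text{for any } x, a \in I. \tag{24}$$
It is also well known that if $f$ is convex on $I$, then $\partial f$ is nonempty, $f'_-, f'_+ \in \partial f$, and if $\varphi \in \partial f$, then:
$$f'_-(x) \le \varphi(x) \le f'_+(x) \quad \text{for any } x \in \mathring{I}.$$
In particular, $\varphi$ is a nondecreasing function.
If $f$ is differentiable and convex on $\mathring{I}$, then $\partial f = \{f'\}$.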
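Theorem 7. Let $f$ be a continuous convex function on $[0, \infty)$ with $f(1) = 0$. Then, we have:
$$D_f(Q, P) \ge 0 \tag{25}$$
for any $Q, P \in S_1(H)$ with $P$ invertible.
If $f$ is continuously differentiable on $(0, \infty)$, then we also have:
$$D_f(Q, P) \le \operatorname{tr}\left[P\left(P^{-1/2}QP^{-1/2} - 1_H\right) f'\left(P^{-1/2}QP^{-1/2}\right)\right]. \tag{26}$$
Proof. For any $x \ge 0$, we have from the gradient inequality (24) that:
$$f(x) - f(1) \ge f'_+(1)(x - 1),$$
and since $f$ is normalised, then:
$$f(x) \ge f'_+(1)(x - 1). \tag{27}$$
Utilising the property (P) for the positive operator $P^{-1/2}QP^{-1/2}$, where $Q, P \in S_1(H)$ with $P$ invertible, we have the inequality in the operator order:
$$f\left(P^{-1/2}QP^{-1/2}\right) \ge f'_+(1)\left(P^{-1/2}QP^{-1/2} - 1_H\right). \tag{28}$$
Utilising the property (20) for the inequality (28), we have:
$$D_f(Q, P) = \operatorname{tr}\left[P f\left(P^{-1/2}QP^{-1/2}\right)\right] \ge f'_+(1) \operatorname{tr}\left[P\left(P^{-1/2}QP^{-1/2} - 1_H\right)\right] = f'_+(1)\left[\operatorname{tr}(Q) - \operatorname{tr}(P)\right] = 0,$$
since $\operatorname{tr}(P P^{-1/2}QP^{-1/2}) = \operatorname{tr}(P^{1/2}QP^{-1/2}) = \operatorname{tr}(Q) = 1 = \operatorname{tr}(P)$, and the inequality (25) is proven.
From the gradient inequality, we also have for any $x \ge 0$:
$$f(1) - f(x) \ge f'(x)(1 - x),$$
and since $f$ is normalised, then:
$$f(x) \le (x - 1) f'(x),$$
which, as above, implies that:
$$f\left(P^{-1/2}QP^{-1/2}\right) \le \left(P^{-1/2}QP^{-1/2} - 1_H\right) f'\left(P^{-1/2}QP^{-1/2}\right). \tag{29}$$
Making use of the property (20) for the inequality (29), we get:
$$D_f(Q, P) \le \operatorname{tr}\left[P\left(P^{-1/2}QP^{-1/2} - 1_H\right) f'\left(P^{-1/2}QP^{-1/2}\right)\right], \tag{30}$$
which is the required inequality (26).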
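Remark 2. If we take $f(t) = -\ln t$, $t > 0$, in Theorem 7, then we get:
$$0 \le -\operatorname{tr}\left[P \ln\left(P^{-1/2}QP^{-1/2}\right)\right] \le \operatorname{tr}(P^2 Q^{-1}) - 1$$
for any $Q, P \in S_1(H)$ with $P$ and $Q$ invertible.
If we take the convex function $\varepsilon(t) = e^{t-1} - 1$, then:
$$D_\varepsilon(Q, P) = \operatorname{tr}\left[P \exp\left(P^{-1/2}QP^{-1/2} - 1_H\right)\right] - 1,$$
where $Q, P \in S_1(H)$ with $P$ invertible.
By Theorem 7, we get:
$$0 \le \operatorname{tr}\left[P \exp\left(P^{-1/2}QP^{-1/2} - 1_H\right)\right] - 1 \le \operatorname{tr}\left[P\left(P^{-1/2}QP^{-1/2} - 1_H\right) \exp\left(P^{-1/2}QP^{-1/2} - 1_H\right)\right], \tag{32}$$
where $Q, P \in S_1(H)$ with $P$ invertible.
Since $\exp(P^{-1/2}QP^{-1/2} - 1_H) = e^{-1}\exp(P^{-1/2}QP^{-1/2})$, the inequality in (32) is equivalent to:
$$e \le \operatorname{tr}\left[P \exp\left(P^{-1/2}QP^{-1/2}\right)\right] \le e + \operatorname{tr}\left[P\left(P^{-1/2}QP^{-1/2} - 1_H\right) \exp\left(P^{-1/2}QP^{-1/2}\right)\right],$$
where $Q, P \in S_1(H)$ with $P$ invertible.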
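The two-sided estimate of Theorem 7 and the $-\ln t$ case of Remark 2 can be spot-checked in finite dimensions, taking $D_f(Q, P) = \operatorname{tr}[P f(P^{-1/2}QP^{-1/2})]$ (a sketch with hypothetical helper names and well-conditioned random density matrices). For each normalised generator it compares $0$, $D_f(Q, P)$ and the upper bound $\operatorname{tr}[P(X - 1_H) f'(X)]$ with $X = P^{-1/2}QP^{-1/2}$, and it checks $-\operatorname{tr}[P \ln X] \le \operatorname{tr}(P^2 Q^{-1}) - 1$.

```python
import numpy as np

def mfunc(A, f):
    A = (A + A.conj().T) / 2
    w, V = np.linalg.eigh(A)
    return (V * f(w)) @ V.conj().T

def random_density(n, rng):
    G = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    rho = G @ G.conj().T + n * np.eye(n)
    return rho / np.trace(rho).real

def theorem7_sides(Q, P, f, df):
    """Return (D_f(Q,P), tr[P (X - 1) f'(X)]) with X = P^{-1/2} Q P^{-1/2}."""
    Ph = mfunc(P, lambda w: w ** -0.5)
    X = Ph @ Q @ Ph
    I = np.eye(len(P))
    d_f = np.trace(P @ mfunc(X, f)).real
    upper = np.trace(P @ (X - I) @ mfunc(X, df)).real
    return d_f, upper

# normalised generators f (f(1) = 0) together with their derivatives f'
generators = {
    "t ln t": (lambda t: t * np.log(t), lambda t: np.log(t) + 1),
    "-ln t": (lambda t: -np.log(t), lambda t: -1 / t),
    "(t-1)^2": (lambda t: (t - 1) ** 2, lambda t: 2 * (t - 1)),
    "exp(t-1)-1": (lambda t: np.exp(t - 1) - 1, lambda t: np.exp(t - 1)),
}

rng = np.random.default_rng(11)
ok = True
for _ in range(100):
    Q, P = random_density(3, rng), random_density(3, rng)
    for f, df in generators.values():
        d_f, upper = theorem7_sides(Q, P, f, df)
        ok = ok and (-1e-9 <= d_f <= upper + 1e-9)
    # Remark 2 with f = -ln t:  D_{-ln}(Q, P) <= tr(P^2 Q^{-1}) - 1
    d_log, _ = theorem7_sides(Q, P, *generators["-ln t"])
    ok = ok and d_log <= np.trace(P @ P @ np.linalg.inv(Q)).real - 1 + 1e-9
```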
The following lemma is of interest in itself:
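Lemma 1. Let $S$ be a self-adjoint operator such that $\gamma 1_H \le S \le \Gamma 1_H$ for some real constants $\Gamma \ge \gamma$. Then, for any $P \in S_1(H)$, we have:
$$0 \le \operatorname{tr}(PS^2) - [\operatorname{tr}(PS)]^2 \le \frac{1}{2}(\Gamma - \gamma) \operatorname{tr}\left(P\left|S - \operatorname{tr}(PS) 1_H\right|\right) \le \frac{1}{4}(\Gamma - \gamma)^2. \tag{33}$$
Proof. Observe that:
$$\operatorname{tr}(PS^2) - [\operatorname{tr}(PS)]^2 = \operatorname{tr}\left[P\left(S - \frac{\gamma + \Gamma}{2} 1_H\right)\left(S - \operatorname{tr}(PS) 1_H\right)\right], \tag{34}$$
since, obviously:
$$\operatorname{tr}\left[P\left(S - \operatorname{tr}(PS) 1_H\right)\right] = 0.$$
Now, since $\gamma 1_H \le S \le \Gamma 1_H$, then:
$$\left|S - \frac{\gamma + \Gamma}{2} 1_H\right| \le \frac{1}{2}(\Gamma - \gamma) 1_H.$$
Taking the modulus in (34) and using the properties of trace, we have:
$$\operatorname{tr}(PS^2) - [\operatorname{tr}(PS)]^2 \le \operatorname{tr}\left(P\left|S - \frac{\gamma + \Gamma}{2} 1_H\right|\left|S - \operatorname{tr}(PS) 1_H\right|\right) \le \frac{1}{2}(\Gamma - \gamma) \operatorname{tr}\left(P\left|S - \operatorname{tr}(PS) 1_H\right|\right), \tag{35}$$
which proves the first part of (33).
By the Schwarz inequality for the trace, we also have:
$$\operatorname{tr}\left(P\left|S - \operatorname{tr}(PS) 1_H\right|\right) \le \left[\operatorname{tr}\left(P\left(S - \operatorname{tr}(PS) 1_H\right)^2\right)\right]^{1/2} = \left(\operatorname{tr}(PS^2) - [\operatorname{tr}(PS)]^2\right)^{1/2}. \tag{36}$$
From (35) and (36), we get:
$$\operatorname{tr}(PS^2) - [\operatorname{tr}(PS)]^2 \le \frac{1}{2}(\Gamma - \gamma)\left(\operatorname{tr}(PS^2) - [\operatorname{tr}(PS)]^2\right)^{1/2},$$
which implies that:
$$\left(\operatorname{tr}(PS^2) - [\operatorname{tr}(PS)]^2\right)^{1/2} \le \frac{1}{2}(\Gamma - \gamma).$$
By (36), we then obtain:
$$\operatorname{tr}\left(P\left|S - \operatorname{tr}(PS) 1_H\right|\right) \le \frac{1}{2}(\Gamma - \gamma),$$
that proves the last part of (33). ❚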
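The lemma can be illustrated numerically by taking $\gamma$ and $\Gamma$ to be the extreme eigenvalues of a random self-adjoint matrix $S$, together with a random density matrix $P$ (finite-dimensional sketch; helper names are hypothetical). The code evaluates the variance-type quantity $\operatorname{tr}(PS^2) - [\operatorname{tr}(PS)]^2$, the middle term $\frac{1}{2}(\Gamma - \gamma)\operatorname{tr}(P|S - \operatorname{tr}(PS)1_H|)$ and the bound $\frac{1}{4}(\Gamma - \gamma)^2$, and checks that they increase in this order.

```python
import numpy as np

def mfunc(A, f):
    A = (A + A.conj().T) / 2
    w, V = np.linalg.eigh(A)
    return (V * f(w)) @ V.conj().T

def random_density(n, rng):
    G = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    rho = G @ G.conj().T + n * np.eye(n)
    return rho / np.trace(rho).real

def lemma1_terms(S, P):
    eig = np.linalg.eigvalsh(S)
    gamma, Gamma = eig.min(), eig.max()       # gamma 1_H <= S <= Gamma 1_H
    m = np.trace(P @ S).real                  # tr(PS)
    I = np.eye(len(S))
    variance = np.trace(P @ S @ S).real - m ** 2
    middle = 0.5 * (Gamma - gamma) * np.trace(P @ mfunc(S - m * I, np.abs)).real
    bound = 0.25 * (Gamma - gamma) ** 2
    return variance, middle, bound

rng = np.random.default_rng(12)
ok = True
for _ in range(500):
    G = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
    S = (G + G.conj().T) / 2
    P = random_density(4, rng)
    variance, middle, bound = lemma1_terms(S, P)
    ok = ok and (-1e-10 <= variance <= middle + 1e-10) and (middle <= bound + 1e-10)
```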
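Corollary 1. Let $Q, P \in S_1(H)$ with $P$ invertible and such that there exist $0 < r \le 1 \le R$ satisfying the condition (38). Then:
$$0 \le \operatorname{tr}(Q^2 P^{-1}) - 1 \le \frac{1}{2}(R - r) \operatorname{tr}\left(P\left|P^{-1/2}QP^{-1/2} - 1_H\right|\right) \le \frac{1}{4}(R - r)^2. \tag{37}$$
Proof. Utilising the inequality (33) for $S = P^{-1/2}QP^{-1/2}$, $\gamma = r$ and $\Gamma = R$, we have:
$$\operatorname{tr}(PS) = \operatorname{tr}(Q) = 1, \qquad \operatorname{tr}(PS^2) - [\operatorname{tr}(PS)]^2 = \operatorname{tr}(Q^2 P^{-1}) - 1,$$
and the inequality (37) is proved. ❚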
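We observe that if $Q, P \in S_1(H)$ with $P$ invertible and there exist $r, R > 0$ with:
$$r 1_H \le P^{-1/2}QP^{-1/2} \le R 1_H, \tag{38}$$
then by the property (20), we get:
$$r = r \operatorname{tr}(P) \le \operatorname{tr}\left(P P^{-1/2}QP^{-1/2}\right) = \operatorname{tr}(Q) = 1 \le R \operatorname{tr}(P) = R,$$
showing that $r \le 1 \le R$.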
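For density matrices in finite dimensions the best constants in (38) are the extreme eigenvalues of $P^{-1/2}QP^{-1/2}$. The sketch below (hypothetical helper names, well-conditioned random density matrices) computes them, confirms $r \le 1 \le R$, and checks the chain $0 \le \operatorname{tr}(Q^2P^{-1}) - 1 \le \frac{1}{2}(R - r)\operatorname{tr}(P|P^{-1/2}QP^{-1/2} - 1_H|) \le \frac{1}{4}(R - r)^2$ obtained from Lemma 1 with $S = P^{-1/2}QP^{-1/2}$.

```python
import numpy as np

def mfunc(A, f):
    A = (A + A.conj().T) / 2
    w, V = np.linalg.eigh(A)
    return (V * f(w)) @ V.conj().T

def random_density(n, rng):
    G = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    rho = G @ G.conj().T + n * np.eye(n)
    return rho / np.trace(rho).real

def corollary1_terms(Q, P):
    Ph = mfunc(P, lambda w: w ** -0.5)
    X = Ph @ Q @ Ph
    eig = np.linalg.eigvalsh((X + X.conj().T) / 2)
    r, R = eig.min(), eig.max()               # best constants with r 1 <= X <= R 1
    I = np.eye(len(P))
    chi2 = np.trace(Q @ Q @ np.linalg.inv(P)).real - 1
    middle = 0.5 * (R - r) * np.trace(P @ mfunc(X - I, np.abs)).real
    return r, R, chi2, middle, 0.25 * (R - r) ** 2

rng = np.random.default_rng(13)
ok = True
for _ in range(500):
    r, R, chi2, middle, bound = corollary1_terms(random_density(3, rng), random_density(3, rng))
    ok = ok and (r <= 1 + 1e-12) and (R >= 1 - 1e-12)
    ok = ok and (-1e-10 <= chi2 <= middle + 1e-10) and (middle <= bound + 1e-10)
```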
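The following result provides a simple upper bound for the quantum $f$-divergence $D_f(Q, P)$.
Theorem 8. Let $f$ be a continuous convex function on $[0, \infty)$ with $f(1) = 0$. Then, we have:
$$D_f(Q, P) \le \frac{1}{2}\left[f'_-(R) - f'_+(r)\right] \operatorname{tr}\left(P\left|P^{-1/2}QP^{-1/2} - 1_H\right|\right) \le \frac{1}{4}(R - r)\left[f'_-(R) - f'_+(r)\right] \tag{39}$$
for any $Q, P \in S_1(H)$ with $P$ invertible and satisfying the condition (38).
Proof. Without loss of generality, we prove the inequality in the case when $f$ is continuously differentiable on $(0, \infty)$.
Since $\operatorname{tr}[P(P^{-1/2}QP^{-1/2} - 1_H)] = 0$, we have:
$$\operatorname{tr}\left[P\left(P^{-1/2}QP^{-1/2} - 1_H\right) f'\left(P^{-1/2}QP^{-1/2}\right)\right] = \operatorname{tr}\left[P\left(P^{-1/2}QP^{-1/2} - 1_H\right)\left(f'\left(P^{-1/2}QP^{-1/2}\right) - \lambda 1_H\right)\right]$$
for any $\lambda \in \mathbb{R}$ and for any $Q, P \in S_1(H)$ with $P$ invertible.
Since $f'$ is monotonic nondecreasing on $[r, R]$, then:
$$\left|f'(t) - \frac{f'(r) + f'(R)}{2}\right| \le \frac{1}{2}\left[f'(R) - f'(r)\right] \quad \text{for any } t \in [r, R].$$
This implies in the operator order that:
$$\left|f'\left(P^{-1/2}QP^{-1/2}\right) - \frac{f'(r) + f'(R)}{2} 1_H\right| \le \frac{1}{2}\left[f'(R) - f'(r)\right] 1_H,$$
therefore, taking $\lambda = \frac{f'(r) + f'(R)}{2}$ and arguing as in the proof of Lemma 1:
$$\operatorname{tr}\left[P\left(P^{-1/2}QP^{-1/2} - 1_H\right) f'\left(P^{-1/2}QP^{-1/2}\right)\right] \le \frac{1}{2}\left[f'(R) - f'(r)\right] \operatorname{tr}\left(P\left|P^{-1/2}QP^{-1/2} - 1_H\right|\right). \tag{40}$$
From (30) and (40), we have:
$$D_f(Q, P) \le \frac{1}{2}\left[f'(R) - f'(r)\right] \operatorname{tr}\left(P\left|P^{-1/2}QP^{-1/2} - 1_H\right|\right),$$
which proves the first inequality in (39).
The rest follows by (37). ❚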
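A numerical spot-check of Theorem 8 for differentiable generators, taking $D_f(Q, P) = \operatorname{tr}[P f(P^{-1/2}QP^{-1/2})]$ and the extreme eigenvalues of $P^{-1/2}QP^{-1/2}$ as $r$ and $R$ (finite-dimensional sketch; helper names are hypothetical). It checks $D_f(Q, P) \le \frac{1}{2}[f'(R) - f'(r)]\operatorname{tr}(P|P^{-1/2}QP^{-1/2} - 1_H|) \le \frac{1}{4}(R - r)[f'(R) - f'(r)]$.

```python
import numpy as np

def mfunc(A, f):
    A = (A + A.conj().T) / 2
    w, V = np.linalg.eigh(A)
    return (V * f(w)) @ V.conj().T

def random_density(n, rng):
    G = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    rho = G @ G.conj().T + n * np.eye(n)
    return rho / np.trace(rho).real

def theorem8_terms(Q, P, f, df):
    Ph = mfunc(P, lambda w: w ** -0.5)
    X = Ph @ Q @ Ph
    eig = np.linalg.eigvalsh((X + X.conj().T) / 2)
    r, R = eig.min(), eig.max()
    d_f = np.trace(P @ mfunc(X, f)).real
    spread = df(R) - df(r)                    # f'(R) - f'(r) >= 0 for convex f
    middle = 0.5 * spread * np.trace(P @ mfunc(X - np.eye(len(P)), np.abs)).real
    return d_f, middle, 0.25 * (R - r) * spread

generators = [
    (lambda t: -np.log(t), lambda t: -1 / t),                # as in Example 1, part 1)
    (lambda t: t * np.log(t), lambda t: np.log(t) + 1),      # as in Example 1, part 2)
    (lambda t: (t - 1) ** 2, lambda t: 2 * (t - 1)),
]

rng = np.random.default_rng(14)
ok = True
for _ in range(300):
    Q, P = random_density(3, rng), random_density(3, rng)
    for f, df in generators:
        d_f, middle, bound = theorem8_terms(Q, P, f, df)
        ok = ok and (d_f <= middle + 1e-10) and (middle <= bound + 1e-10)
```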
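Example 1. 1) If we take $f(t) = -\ln t$, $t > 0$, in Theorem 8, then we get:
$$0 \le -\operatorname{tr}\left[P \ln\left(P^{-1/2}QP^{-1/2}\right)\right] \le \frac{R - r}{2rR} \operatorname{tr}\left(P\left|P^{-1/2}QP^{-1/2} - 1_H\right|\right) \le \frac{(R - r)^2}{4rR}$$
for any $Q, P \in S_1(H)$ with $P, Q$ invertible and satisfying the condition:
$$r 1_H \le P^{-1/2}QP^{-1/2} \le R 1_H, \tag{43}$$
with $r > 0$.
2) With the same conditions as in 1) for $Q, P$, if we take $f(t) = t \ln t$, $t > 0$, in Theorem 8, then we get:
$$0 \le \operatorname{tr}\left[P^{1/2}QP^{-1/2} \ln\left(P^{-1/2}QP^{-1/2}\right)\right] \le \frac{1}{2}\ln\left(\frac{R}{r}\right) \operatorname{tr}\left(P\left|P^{-1/2}QP^{-1/2} - 1_H\right|\right) \le \frac{1}{4}(R - r)\ln\left(\frac{R}{r}\right).$$
3) If we take in (39), then we get: provided that $Q, P \in S_1(H)$, with $P, Q$ invertible and satisfying the condition (43).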
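We have the following upper bound, as well:
Theorem 9. Let $f: [0, \infty) \to \mathbb{R}$ be a continuous convex function that is normalised. If $Q, P \in S_1(H)$, with $P$ invertible, and there exist $R \ge 1 \ge r \ge 0$ such that the condition (38) is satisfied, then:
$$D_f(Q, P) \le \frac{(R - 1) f(r) + (1 - r) f(R)}{R - r}. \tag{46}$$
Proof. By the convexity of $f$, we have:
$$f(t) = f\left(\frac{(R - t) r + (t - r) R}{R - r}\right) \le \frac{(R - t) f(r) + (t - r) f(R)}{R - r}$$
for any $t \in [r, R]$.
This inequality implies the following inequality in the operator order of $\mathcal{B}(H)$:
$$f\left(P^{-1/2}QP^{-1/2}\right) \le \frac{f(r)\left(R 1_H - P^{-1/2}QP^{-1/2}\right) + f(R)\left(P^{-1/2}QP^{-1/2} - r 1_H\right)}{R - r} \tag{47}$$
for $Q, P \in S_1(H)$, with $P$ invertible, and $R \ge 1 \ge r \ge 0$ such that the condition (38) is satisfied.
Utilising the property (20), we get from (47) that:
$$D_f(Q, P) \le \frac{f(r)\left(R - \operatorname{tr}(Q)\right) + f(R)\left(\operatorname{tr}(Q) - r\right)}{R - r} = \frac{(R - 1) f(r) + (1 - r) f(R)}{R - r},$$
and the inequality (46) is thus proven. ❚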
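Remark 3. If we take in (46) $f(t) = t^2 - 1$, then we get:
$$\operatorname{tr}(Q^2 P^{-1}) - 1 \le (R - 1)(1 - r)$$
for $Q, P \in S_1(H)$, with $P$ invertible and satisfying the condition (38).
If we take in (46) $f(t) = t \ln t$, then we get the inequality:
$$\operatorname{tr}\left[P^{1/2}QP^{-1/2} \ln\left(P^{-1/2}QP^{-1/2}\right)\right] \le \frac{(R - 1) r \ln r + (1 - r) R \ln R}{R - r},$$
provided that $Q, P \in S_1(H)$, with $P, Q$ invertible and satisfying the condition (38).
With the same assumptions for $P, Q$, if we take in (46) $f(t) = -\ln t$, then we get the inequality:
$$-\operatorname{tr}\left[P \ln\left(P^{-1/2}QP^{-1/2}\right)\right] \le -\frac{(R - 1)\ln r + (1 - r)\ln R}{R - r}.$$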