Tensors: Geometry and Applications
J.M. Landsberg
Contents
Preface
0.1. Usage
0.2. Overview
0.3. Clash of cultures
0.4. Further reading
0.5. Conventions, acknowledgments
Chapter 1. Introduction
1.1. The complexity of matrix multiplication
1.2. Definitions from multilinear algebra
1.3. Tensor decomposition
1.4. P v. NP and algebraic variants
1.5. Algebraic Statistics and tensor networks
1.6. Geometry and representation theory
8.6. The Chow variety of zero cycles and its equations
8.7. The Fano variety of lines on a variety
Chapter 9. Rank
9.1. Remarks on rank for arbitrary varieties
9.2. Bounds on symmetric rank
9.3. Examples of classes of polynomials and their ranks
Chapter 10. Normal forms for small tensors
10.1. Vector spaces with a finite number of orbits
10.2. Vector spaces where the orbits can be explicitly parametrized
10.3. Points in C^2 ⊗ C^b ⊗ C^c
10.4. Ranks and border ranks of elements of S^3 C^3
10.5. Tensors in C^3 ⊗ C^3 ⊗ C^3
10.6. Normal forms for C^2 ⊗ S^2 W
10.7. Exercises on normal forms for general points on small secant varieties
10.8. Limits of secant planes
10.9. Limits for Veronese varieties
10.10. Ranks and normal forms in σ_3(Seg(PA_1 × ··· × PA_n))
Part 3. Applications
Chapter 11. The complexity of matrix multiplication
11.1. Real world issues
11.2. Failure of the border rank version of Strassen's conjecture
11.3. Finite group approach to upper bounds
11.4. R(M_{3,3,3}) ≥ 23
11.5. Bläser's 5/2-Theorem
11.6. The Brockett-Dobkin Theorem
11.7. Multiplicative complexity
Chapter 12. Tensor decomposition
12.1. Cumulants
12.2. Blind deconvolution of DS-CDMA signals
12.3. Uniqueness results coming from algebraic geometry
12.4. Exact decomposition algorithms
12.5. Kruskal's Theorem and its proof
Index
Preface
Tensors are ubiquitous in the sciences. One reason for their ubiquity is
that they provide a useful way to organize data. Geometry is a powerful
tool for extracting information from data sets, and a beautiful subject in its
own right. This book has three intended uses: as a classroom textbook, a
reference work for researchers, and a research manuscript.
0.1. Usage
Classroom uses. Here are several possible courses one could give from this
text:
(1) The first part of this text is suitable for an advanced course in
multilinear algebra - it provides a solid foundation for the study of tensors
and contains numerous applications, exercises, and examples. Such
a course would cover Chapters 1, 2, 3 and parts of Chapters 4, 5, 6.
(2) For a graduate course on the geometry of tensors not assuming algebraic
geometry, one can cover Chapters 1, 2, 4, 5, 6, 7 and 8, skipping
§§2.9-2.12, §4.6, §5.7, §6.7 (except Pieri), §7.6 and §§8.6-8.8.
(3) For a graduate course on the geometry of tensors assuming alge-
braic geometry and with more emphasis on theory, one can follow
the above outline, only skimming Chapters 2 and 4 (but perhaps
add §2.12), and add selected later topics.
(4) I have also given a one semester class on the complexity of ma-
trix multiplication using selected material from earlier chapters and
then focusing on Chapter 11.
Research uses. I have tried to state all the results and definitions from
geometry and representation theory needed to study tensors. When proofs
are not included, references for them are given. The text includes the state
of the art regarding ranks and border ranks of tensors, and explains for
the first time many results and problems coming from outside mathematics
in geometric language. For example, a very short proof of the well-known
Kruskal theorem is presented, illustrating that it hinges upon a basic geomet-
ric fact about point sets in projective space. Many other natural subvarieties
of spaces of tensors are discussed in detail. Numerous open problems are
presented throughout the text.
Many of the topics covered in this book are currently very active areas
of research. However, there is no reasonable reference for all the wonderful
and useful mathematics that is already known. My goal has been to fill this
gap in the literature.
0.2. Overview
The book is divided into four parts: I. First applications, multilinear algebra,
and overview of results; II. Geometry and representation theory; III. More
applications; and IV. Advanced topics.
This chapter may be difficult for those unfamiliar with algebraic ge-
ometry - it is terse as numerous excellent references are available (e.g.
[154, 284]). Its purpose is primarily to establish language. Its prerequisite
is Chapter 2.
Chapter 5: Secant varieties. The notion of border rank for tensors has
a vast and beautiful generalization in the context of algebraic geometry, to
that of secant varieties of projective varieties. Many results on border rank
are more easily proved in this larger geometric context, and it is easier to
develop intuition regarding the border ranks of tensors when one examines
properties of secant varieties in general.
The prerequisite for this chapter is Chapter 4.
tangential varieties, dual varieties, and the Fano varieties of lines that gen-
eralize certain attributes of tensors to a more general geometric situation.
In the special cases of tensors, these varieties play a role in classifying nor-
mal forms and the study of rank. For example, dual varieties play a role in
distinguishing the different typical ranks that can occur for tensors over the
real numbers. They should also be useful for future applications. Chapter 8
discusses these as well as the Chow variety of polynomials that decompose
to a product of linear factors. I also present differential-geometric tools for
studying these varieties.
Chapter 8 can mostly be read immediately after Chapter 4.
Chapter 10: Normal forms for small tensors. The chapter describes
the spaces of tensors admitting normal forms, and the normal forms of ten-
sors in those spaces, as well as normal forms for points in small secant
varieties.
The chapter can be read on a basic level after reading Chapter 2, but
the proofs and geometric descriptions of the various orbit closures require
material from other chapters.
this and similar instances but no artificer makes the ideas themselves: how
could he?
And what of the maker of the bed? Were you not saying that he too
makes, not the idea which, according to our view, is the essence of the bed,
but only a particular bed?
Yes, I did. Then if he does not make that which exists he cannot make
true existence, but only some semblance of existence; and if any one were
to say that the work of the maker of the bed, or of any other workman, has
real existence, he could hardly be supposed to be speaking the truth.
This difference of cultures is particularly pronounced when discussing
tensors: for some practitioners these are just multi-way arrays that one is
allowed to perform certain manipulations on. For geometers these are spaces
equipped with certain group actions. To emphasize the geometric aspects
of tensors, geometers prefer to work invariantly: to paraphrase W. Fulton:
"Don't use coordinates unless someone holds a pickle to your head."
The standard reference for what was known in algebraic complexity the-
ory up to 1997 is [53].
Part 1. Motivation from applications, multilinear algebra and elementary results

Chapter 1. Introduction
data into a multi-way array and isolate essential features of the data by
decomposing the corresponding tensor into a sum of rank one tensors. Chapter
12 discusses several examples of tensor decomposition arising in wireless
communication. In §1.3, I provide two examples of tensor decomposition:
fluorescence spectroscopy in chemistry, and blind source separation. Blind
source separation (BSS) was proposed in 1982 as a way to study how, in
vertebrates, the brain detects motion from electrical signals sent by tendons
(see [95, p. 3]). Since then numerous researchers have applied BSS in
many fields, in particular engineers in signal processing. A key ingredient of
BSS comes from statistics, the cumulants defined in §12.1 and also discussed
briefly in §1.3. P. Comon utilized cumulants in [93], initiating independent
component analysis (ICA), which has led to an explosion of research in signal
processing.
that by carefully choosing signs the algorithm works over an arbitrary field.
We will see in §5.2.2 why the algorithm could have been anticipated using
elementary algebraic geometry.
Remark 1.1.2.3. In fact there is a nine-parameter family of algorithms
for multiplying 2 × 2 matrices using just seven scalar multiplications. See
(2.4.5).
however the identical definitions hold for real vector spaces (just adjusting
the ground field where necessary). Let C^n denote the vector space of n-tuples
of complex numbers, i.e., if v ∈ C^n, write the vector v as v = (v_1, . . . , v_n)
with v_j ∈ C. The vector space structure of C^n means that for v, w ∈ C^n and
λ ∈ C, v + w = (v_1 + w_1, . . . , v_n + w_n) ∈ C^n and λv = (λv_1, . . . , λv_n) ∈ C^n.
A map f : C^n → C^m is linear if f(λv + w) = λf(v) + f(w) for all v, w ∈ C^n
and λ ∈ C. In this book vector spaces will generally be denoted by capital
letters A, B, C, V, W, with the convention dim A = a, dim B = b, etc. I will
generally reserve the notation C^n for an n-dimensional vector space equipped
with a basis as above. The reason for making this distinction is that the
geometry of many of the phenomena we will study is more transparent if
one does not make choices of bases.
If A is a vector space, let A* := {f : A → C | f is linear} denote
the dual vector space. If α ∈ A* and b ∈ B, one can define a linear map
α⊗b : A → B by a ↦ α(a)b. Such a linear map has rank one. The rank of a
linear map f : A → B is the smallest r such that there exist α_1, . . . , α_r ∈ A*
and b_1, . . . , b_r ∈ B such that f = Σ_{i=1}^r α_i⊗b_i. (See Exercise 2.1.(4) for the
equivalence of this definition with other definitions of rank.)
If C^2 and C^3 are equipped with bases (e_1, e_2), (f_1, f_2, f_3) respectively,
and A : C^2 → C^3 is a linear map given with respect to these bases by a matrix

    ( a^1_1  a^1_2 )
    ( a^2_1  a^2_2 )
    ( a^3_1  a^3_2 ),

then A may be written as the tensor

    A = a^1_1 e^1⊗f_1 + a^1_2 e^2⊗f_1 + a^2_1 e^1⊗f_2 + a^2_2 e^2⊗f_2 + a^3_1 e^1⊗f_3 + a^3_2 e^2⊗f_3,

where (e^1, e^2) denotes the basis of (C^2)* dual to (e_1, e_2), and there exists
an expression A = α_1⊗b_1 + α_2⊗b_2 because A has rank at most two.
Exercise 1.2.1.1: Find such an expression explicitly when the matrix of A is

    ( 1  2 )
    ( 1  0 )
    ( 3  1 ).
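One way to produce such an expression by machine (a numerical sketch, not the pencil-and-paper method the exercise intends) is via the singular value decomposition:

```python
# A numerical sketch: the SVD writes A as a sum of rank(A) rank-one maps.
import numpy as np

A = np.array([[1., 2.],
              [1., 0.],
              [3., 1.]])

U, s, Vt = np.linalg.svd(A, full_matrices=False)   # A = U diag(s) Vt
r = np.linalg.matrix_rank(A)                       # here r = 2

# Vt[i, :] plays the role of the functional alpha_i, s[i]*U[:, i] of b_i.
terms = [np.outer(s[i] * U[:, i], Vt[i, :]) for i in range(r)]
assert np.allclose(sum(terms), A)
```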
1.2.2. Bilinear maps. Matrix multiplication is an example of a bilinear
map, that is, a map f : A × B → C where A, B, C are vector spaces and
for each fixed element b ∈ B, f(·, b) : A → C is linear, and similarly for each
fixed element of A. Matrix multiplication of square matrices is a bilinear
map:

(1.2.1)    M_{n,n,n} : C^{n²} × C^{n²} → C^{n²}.

If α ∈ A*, β ∈ B* and c ∈ C, the map α⊗β⊗c : A × B → C defined by
(a, b) ↦ α(a)β(b)c is a bilinear map. For any bilinear map T : A × B → C,
1.2.6. Border rank and symmetric border rank. Related to the notions
of rank and symmetric rank, and of equal importance for applications,
are those of border rank and symmetric border rank defined below, respectively
denoted R(T) and R_S(P). Here is an informal example to illustrate symmetric
border rank.
Example 1.2.6.1. While a general homogeneous polynomial of degree three
in two variables is a sum of two cubes, it is not true that every cubic polynomial
is either a cube or the sum of two cubes. For an example, consider

    P = x³ + 3x²y.

P is not the sum of two cubes. (To see this, write P = (sx + ty)³ + (ux +
vy)³ for some constants s, t, u, v, equate coefficients and show there is no
solution.) However, it is the limit as ε → 0 of polynomials P_ε that are sums
of two cubes, namely

    P_ε := (1/ε)((ε − 1)x³ + (x + εy)³).

This example dates back at least to Terracini nearly 100 years ago. Its
geometry is discussed in Example 5.2.1.2.
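The expansion behind the limit is one line:

```latex
% Expanding P_epsilon and letting epsilon -> 0:
\begin{aligned}
P_\epsilon &= \tfrac{1}{\epsilon}\bigl((\epsilon-1)x^3 + (x+\epsilon y)^3\bigr)
            = \tfrac{1}{\epsilon}\bigl(\epsilon x^3 + 3\epsilon x^2 y
              + 3\epsilon^2 x y^2 + \epsilon^3 y^3\bigr)\\
           &= x^3 + 3x^2 y + 3\epsilon x y^2 + \epsilon^2 y^3
              \;\longrightarrow\; x^3 + 3x^2 y = P .
\end{aligned}
```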
Definition 1.2.6.2. The symmetric border rank of a homogeneous polynomial
P, R_S(P), is the smallest r such that there exists a sequence of
polynomials P_ε, each of rank r, such that P is the limit of the P_ε as ε tends
to zero.
1.2.7. Our first spaces of tensors and varieties inside them. Let
A*⊗B denote the vector space of linear maps A → B. The set of linear
maps of rank at most r will be denoted σ̂_r = σ̂_{r,A*⊗B}. This set is the zero
set of a collection of homogeneous polynomials on the vector space A*⊗B.
Explicitly, if we choose bases and identify A*⊗B with the space of a × b
matrices, σ̂_r is the set of matrices whose (r + 1) × (r + 1) minors are all zero.
In particular there is a simple test to see if a linear map has rank at most r.
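For illustration, here is a NumPy sketch of this test (a transparent, not an efficient, implementation):

```python
# A sketch of the minors test: a linear map has rank <= r iff all
# (r+1) x (r+1) minors of a representing matrix vanish.
import numpy as np
from itertools import combinations

def rank_at_most(X, r, tol=1e-10):
    m, n = X.shape
    if r >= min(m, n):
        return True
    return all(
        abs(np.linalg.det(X[np.ix_(rows, cols)])) < tol
        for rows in combinations(range(m), r + 1)
        for cols in combinations(range(n), r + 1)
    )

X = np.outer([1., 2., 3.], [4., 5.])   # a rank one 3 x 2 matrix
assert rank_at_most(X, 1) and not rank_at_most(X, 0)
```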
A subset of a vector space defined as the zero set of a collection of
homogeneous polynomials is called an algebraic variety.
Now let A*⊗B*⊗C denote the vector space of bilinear maps A × B → C.
This is our first example of a space of tensors, defined in Chapter 2, beyond
the familiar space of linear maps. Expressed with respect to bases, a bilinear
map is a three-dimensional matrix or array.
The set of bilinear maps of rank at most r is not an algebraic variety,
i.e., it is not the zero set of a collection of polynomials. However the set
of bilinear maps of border rank at most r is an algebraic variety. The
set of bilinear maps f : A × B → C of border rank at most r will be
denoted σ̂_r = σ̂_{r,A*⊗B*⊗C}. It is the zero set of a collection of homogeneous
polynomials on the vector space A*⊗B*⊗C.
    T = Σ_{f=1}^r a_f⊗b_f⊗c_f,

where each f represents a substance. Writing a_f = Σ_i a_{i,f} e_i, then a_{i,f} is
the concentration of the f-th substance in the i-th sample, and similarly,
using the given bases of R^J and R^K, c_{k,f} is the fraction of photons the f-th
substance emits at wavelength k, and b_{j,f} is the intensity of the incident
light at excitation wavelength j multiplied by the absorption at wavelength j.
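In coordinates the model reads T_{ijk} = Σ_f a_{i,f} b_{j,f} c_{k,f}. Here is a minimal NumPy sketch of building such a rank-r tensor; the sizes and factor values are illustrative only, not from the text:

```python
# A sketch of the rank-r model T = sum_f a_f (x) b_f (x) c_f.
import numpy as np

I, J, K, r = 5, 4, 3, 2                  # samples, excitation, emission, substances
rng = np.random.default_rng(0)
a = rng.random((I, r))                    # a[:, f] ~ concentrations of substance f
b = rng.random((J, r))                    # b[:, f] ~ excitation profile
c = rng.random((K, r))                    # c[:, f] ~ emission spectrum

T = np.einsum('if,jf,kf->ijk', a, b, c)   # T_{ijk} = sum_f a_{i,f} b_{j,f} c_{k,f}
print(T.shape)                            # (5, 4, 3)
```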
There will be noise in the data, so T will actually be of generic rank, but
there will be a very low rank tensor that closely approximates it. (For all
complex spaces of tensors, there is a rank that occurs with probability one,
which is called the generic rank, see Definition 3.1.4.2.) There is no metric
naturally associated to the data, so the meaning of approximation is not
clear. In [1], one proceeds as follows to find r. First of all, r is assumed to
be very small (at most 7 in their exposition). Then for each r', 1 ≤ r' ≤ 7,
one assumes r' = r and applies a numerical algorithm that attempts to
find the r' components (i.e., rank one tensors) that T would be the sum
of. The values of r' for which the algorithm does not converge quickly are
thrown out. (The authors remark that this procedure is not mathematically
justified, but seems to work well in practice. In the example, these discarded
values of r' are too large.) Then, for the remaining values of r', one looks
at the resulting tensors to see if they are reasonable physically. This enables
them to remove values of r' that are too small. In the example, they are
left with r' = 4, 5.
Now assume r has been determined. Since the value of r is relatively
small, up to trivialities the expression of T as the sum of r rank one elements
will be unique, see §3.3. Thus, by performing the decomposition of T, one
recovers the concentration of each of the r substances in each solution by
determining the vectors a_f, as well as the individual excitation and emission
spectra by determining the vectors b_f and c_f.
the rank of this matrix will be r. (In practice, the matrix will be close to a
matrix of rank r.) The matrix κ_2(x) is called a covariance matrix. One can
define higher order cumulants to obtain further measurements of statistical
independence. For example, consider

(1.3.1)    κ_{ijk} = m_{ijk} − (m_i m_{jk} + m_j m_{ik} + m_k m_{ij}) + 2 m_i m_j m_k.

We may form a third order symmetric tensor from these quantities, and
similarly for higher orders.
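As an illustration of (1.3.1), the following NumPy sketch estimates the third-order cumulant tensor from samples; the function name and sizes are illustrative, not from the text:

```python
# A sketch computing the third-order cumulant tensor (1.3.1) from samples.
import numpy as np

def third_cumulant(X):
    """X: (N, n) array of N samples of an n-dimensional random vector."""
    N = len(X)
    m1 = X.mean(axis=0)                                # m_i
    m2 = np.einsum('ti,tj->ij', X, X) / N              # m_{ij}
    m3 = np.einsum('ti,tj,tk->ijk', X, X, X) / N       # m_{ijk}
    return (m3
            - np.einsum('i,jk->ijk', m1, m2)
            - np.einsum('j,ik->ijk', m1, m2)
            - np.einsum('k,ij->ijk', m1, m2)
            + 2 * np.einsum('i,j,k->ijk', m1, m1, m1))  # symmetric in (i,j,k)

samples = np.random.default_rng(1).normal(size=(100000, 3))
# For jointly Gaussian data the third cumulant is approximately zero:
assert np.abs(third_cumulant(samples)).max() < 0.05
```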
Cumulants of a set of random variables (i.e. functions on a space with
a probability measure) give an indication of their mutual statistical depen-
dence, and higher-order cumulants of a single random variable are some
measure of its non-Gaussianity.
Definition 1.3.2.2. In probability, two events A, B are independent if
Pr(A ∩ B) = Pr(A) Pr(B), where Pr(A) denotes the probability of the
event A. If x is a random variable, one can compute Pr(x ≤ a). Two random
variables x, y are statistically independent if Pr({x ≤ a} ∩ {y ≤ b}) =
Pr({x ≤ a}) Pr({y ≤ b}) for all a, b ∈ R. The statistical independence of
random variables x_1, . . . , x_m is defined similarly.
Example 1.3.3.1 (BSS was inspired by nature). How does our central
nervous system detect where a muscle is and how it is moving? The muscles
send electrical signals through two types of transmitters in the tendons,
called primary and secondary, as the first type sends stronger signals. There
are two things to be recovered: the function p(t) of angular position and
v(t) = dp/dt of angular speed. (These are to be measured at any given instant,
so your central nervous system can't simply take a derivative.) One might
think one type of transmitter sends information about v(t) and the other
about p(t), but the opposite was observed; there is some kind of mixing: say
the signals sent are respectively given by functions f_1(t), f_2(t). Then it was
multiplications. Thus for a 10 × 10 matrix one has 10^4 for Gaussian elimination
applied naïvely versus 10^7 for (1.4.1) applied naïvely. This difference
in complexity is discussed in detail in Chapter 13.
The determinant of a matrix is unchanged by the following operations
(for g, h with det(g) det(h) = 1):

    X ↦ gXh,
    X ↦ gX^T h.
[Figure: a bipartite graph on two sets of vertices, illustrating the matching problem.]
Given a bipartite graph on (n, n) vertices one can check if the graph has
a complete matching in polynomial time [153]. However there is no known
polynomial time algorithm to count the number of perfect matchings.
Problems such as the marriage problem appear to require a number of
arithmetic operations that grows exponentially with the size of the data in
order to solve them; however, a proposed solution can be verified by performing
a number of arithmetic operations that grows polynomially with the
size of the data. Such problems are said to be of class NP. (See Chapter
13 for precise definitions.)
Form an incidence matrix X = (xij ) for a bipartite graph by letting
the upper index correspond to one set of nodes and the lower index the
other. One then places a 1 in the (i, j)-th slot if there is an edge joining the
corresponding nodes and a zero if there is not.
Define the permanent of an n × n matrix X = (x^i_j) by

(1.4.2)    perm_n(X) := Σ_{σ∈S_n} x^1_{σ(1)} x^2_{σ(2)} ··· x^n_{σ(n)}.
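Brute-force evaluation of (1.4.2) makes the factorial cost visible; a short illustrative sketch:

```python
# A sketch of (1.4.2): evaluating perm(X) as a sum over all n! permutations.
from itertools import permutations
from math import prod

def permanent(X):
    n = len(X)
    return sum(prod(X[i][sigma[i]] for i in range(n))
               for sigma in permutations(range(n)))

# The permanent of the all-ones n x n matrix is n!:
assert permanent([[1] * 4 for _ in range(4)]) == 24
```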
throwing i for the first and s for the second is simply p_i q_s. We may form a
6 × 20 matrix x = (x_{i,s}) = (p_i q_s) recording all the possible throws with their
probabilities. Note x_{i,s} ≥ 0 for all i, s and Σ_{i,s} x_{i,s} = 1. The matrix x has
an additional property: x has rank one.
Were the events not independent we would not have this additional
constraint. Consider the set {T ∈ R^6⊗R^20 | T_{i,s} ≥ 0, Σ_{i,s} T_{i,s} = 1}. This is
the set of all discrete probability distributions on R^6⊗R^20, and the set of the
previous paragraph is this set intersected with the set of rank one matrices.
Now say some gamblers were cheating with ℓ sets of dice, each with
different probabilities. They watch to see how bets are made and then
choose one of the ℓ sets accordingly. Now we have probabilities p_{i,u}, q_{s,u}, and
a 6 × 20 × ℓ array z_{i,s,u} with rank(z_u) = 1, in the sense that if we consider
each slice z_u as a bilinear map, it has rank one.
Say that we cannot observe the betting. Then, to obtain the probabilities
of what we can observe, we must sum over all the possibilities. We end
up with an element of R^6⊗R^20, with entries r_{i,s} = Σ_u p_{i,u} q_{s,u}. That is, we
obtain a 6 × 20 matrix of probabilities of rank (at most) ℓ, i.e., an element of
σ̂_{ℓ,R^6⊗R^20}. The set of all such distributions is the set of matrices of R^6⊗R^20
of rank at most ℓ intersected with PD_{6,20}.
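A small simulation of this mixture (a sketch; I add explicit mixing weights w_u for how often each set of dice is used, a detail the text's summation leaves implicit):

```python
# A sketch of the cheating-gamblers model: with ell hidden sets of dice the
# observed 6 x 20 matrix is a sum of ell rank-one matrices, so has rank <= ell.
import numpy as np

ell = 3
rng = np.random.default_rng(3)
p = rng.dirichlet(np.ones(6), size=ell)    # p[u] = probabilities of die one, set u
q = rng.dirichlet(np.ones(20), size=ell)   # q[u] = probabilities of die two, set u
w = rng.dirichlet(np.ones(ell))            # assumed frequencies of the ell sets

r = np.einsum('u,ui,us->is', w, p, q)      # r_{i,s} = sum_u w_u p_{i,u} q_{s,u}
assert np.linalg.matrix_rank(r) <= ell and np.isclose(r.sum(), 1.0)
```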
This is an example of a Bayesian network. In general, one associates a
graph to a collection of random variables having various conditional depen-
dencies and then from such a graph, one defines sets (varieties) of distribu-
tions. More generally an algebraic statistical model is the intersection of the
probability distributions with a closed subset defined by some dependence
relations. Algebraic statistical models are discussed in Chapter 14.
A situation discussed in detail in Chapter 14 is that of algebraic statistical
models arising in phylogeny: Given a collection of species, say humans,
monkeys, gorillas, orangutans, ..., all of which are assumed to have
evolved from some common ancestor, ideally we might like to reconstruct the
corresponding evolutionary tree from sampling DNA. Assuming we can only
measure the DNA of existing species, this will not be completely possible,
but it might be possible to, e.g., determine which pairs are most closely
related.
One might imagine, given the numerous possibilities for evolutionary
trees, that there would be a horrific number of varieties to find equations
for. A major result of E. Allman and J. Rhodes states that this is not the
case:
Theorem 1.5.1.1. [10, 9] Equations for the algebraic statistical model
associated to any bifurcating evolutionary tree can be determined explicitly
from equations for σ̂_{4,C^4⊗C^4⊗C^4}.
Just as phylogenetic trees, and more generally Bayes models, use graphs
to construct varieties in spaces of tensors that are useful for the problem at
hand, in physics one uses graphs to construct varieties in spaces of tensors
that model the feasible states. The precise recipe is given in §14.1, where
I also discuss geometric interpretations of the tensor network states arising
from chains, trees and loops. The last one is important for physics; large
loops are referred to as 1-D systems with periodic boundary conditions
in the physics literature and are the prime objects people use in practical
simulations today.
To entice the reader uninterested in physics, but perhaps interested in
complexity, here is a sample result:
Proposition 1.5.2.1. [213] Tensor networks associated to graphs that are
triangles consist of matrix multiplication (up to relabeling) and its degenerations.
See Proposition 14.1.4.1 for a more precise statement. Proposition 1.5.2.1
leads to a surprising connection between the study of tensor network states
and the geometric complexity theory program mentioned above and dis-
cussed in §13.6.
Chapter 2. Multilinear algebra
Observe that

    ker(α⊗w) = ker α,
    Image(α⊗w) = ⟨w⟩.
(1) Show that if one chooses bases of V and W, the matrix representing
α⊗w has rank one.
(2) Show that every rank one n × m matrix is the product of a column
vector with a row vector. To what extent is this presentation
unique?
(3) Show that a nonzero matrix has rank one if and only if all its 2 × 2
minors are zero.
(4) Show that the following definitions of the rank of a linear map
f : V → W are equivalent:
(a) dim f(V );
(b) dim V − dim(ker(f));
(c) the smallest r such that f is the sum of r rank one linear
maps;
(d) the smallest r such that any matrix representing f has all size
r + 1 minors zero;
(e) there exist choices of bases in V and W such that the matrix
of f is
    ( Id_r  0 )
    (  0    0 ),
where the blocks in the previous expression come from writing
dim V = r + (dim V − r) and dim W = r + (dim W − r), and
Id_r is the r × r identity matrix.
(5) Given a linear subspace U ⊆ V, define U^⊥ ⊆ V*, the annihilator of
U, by U^⊥ := {α ∈ V* | α(u) = 0 ∀ u ∈ U}. Show that (U^⊥)^⊥ = U.
(6) Show that for a linear map f : V → W, ker f = (Image f^T)^⊥.
(See §2.9.1.6 for the definition of the transpose f^T : W* → V*.)
2.2.1. The group GL(V ). If one fixes a reference basis, GL(V ) is the
group of changes of bases of V. If we use our reference basis to identify V
with C^v equipped with its standard basis, GL(V ) may be identified with
the set of invertible v × v matrices. I sometimes write GL(V ) = GL_v or
GL_v C if V is v-dimensional and comes equipped with a basis. I emphasize
GL(V ) as a group rather than the invertible v × v matrices because it not
only acts on V, but on many other spaces constructed from V.
Definition 2.2.1.1. Let W be a vector space and let G be a group. A
group homomorphism ρ : G → GL(W ) (see §2.9.2.4) is called a (linear)
representation of G; in particular, ρ(G) is a subgroup of GL(W ). One says
G acts on W, or that W is a G-module.
2.2.3. Exercises.
(1) Let S_n denote the group of permutations of {1, . . . , n} (see Definition
2.9.2.2). Endow C^n with a basis. Show that the action of S_n
on C^n defined by permuting basis elements, i.e., given σ ∈ S_n and
a basis e_1, . . . , e_n, σ · e_j = e_{σ(j)}, is not irreducible.
(2) Show that the action of GL_n on C^n is irreducible.
(3) Show that the map GL_p × GL_q → GL_{pq} given by (A, B) acting on
a p × q matrix X by X ↦ AXB^{−1} is a linear representation.
(4) Show that the action of the group of invertible upper triangular
matrices on C^n is not irreducible.
(5) Let Z denote the set of rank one p × q matrices inside the vector
space of p × q matrices. Show Z is invariant under the action of
GL_p × GL_q ⊂ GL_{pq}.
2.3.1. Definitions.
Notation 2.3.1: Let V*⊗W denote the vector space of linear maps V → W.
With this notation, V⊗W denotes the linear maps V* → W.
Definition 2.3.1.4. Define the multilinear rank (sometimes called the duplex
rank or Tucker rank) of T ∈ V_1⊗···⊗V_n to be the n-tuple of natural
numbers R_multlin(T) := (dim T(V_1*), . . . , dim T(V_n*)).
2.3.2. Exercises.
(1) Write out the slices of the 2 × 2 matrix multiplication operator
M ∈ A⊗B⊗C = (U*⊗V )⊗(V*⊗W )⊗(W*⊗U ) with respect to the
basis a_1 = u^1⊗v_1, a_2 = u^1⊗v_2, a_3 = u^2⊗v_1, a_4 = u^2⊗v_2 of A and
the analogous bases for B, C.
(2) Verify that the space of multilinear functions (2.3.2) is a vector
space.
(3) Given α ∈ V*, β ∈ W*, allow α⊗β to act on V⊗W by,
for v ∈ V, w ∈ W, (α⊗β)(v⊗w) = α(v)β(w) and extending linearly.
Show that this identification defines an isomorphism V*⊗W* ≅ (V⊗W )*.
(4) Show that V⊗C ≅ V.
(5) Show that for each I ⊂ {1, . . . , k} with complementary index set
I^c, there are canonical identifications of V_1⊗···⊗V_k with the space
of multilinear maps V*_{i_1} × ··· × V*_{i_{|I|}} → V_{i^c_1}⊗···⊗V_{i^c_{k−|I|}}.
Note that the property of having rank one is independent of any choices
of basis. (The vectors in each space used to form the rank one tensor will
usually not be basis vectors, but linear combinations of them.)
Definition 2.4.1.2. Define the rank of a tensor T ∈ V_1⊗V_2⊗···⊗V_k,
denoted R(T), to be the minimal number r such that T = Σ_{u=1}^r Z_u with
each Z_u rank one.
By Exercise 2.4.2.(3), one sees that numbers that coincided for linear
maps fail to coincide for tensors, i.e., the analog of the fundamental theorem
of linear algebra is false for tensors.
(2.4.2)    g · (v_1⊗···⊗v_d) = (g · v_1)⊗···⊗(g · v_d)
Strassen's algorithm is

(2.4.4)
    M_{2,2,2} = (a_{11} + a_{22})⊗(b_{11} + b_{22})⊗(c_{11} + c_{22})
        + (a_{21} + a_{22})⊗b_{11}⊗(c_{21} − c_{22})
        + a_{11}⊗(b_{12} − b_{22})⊗(c_{12} + c_{22})
        + a_{22}⊗(b_{21} − b_{11})⊗(c_{21} + c_{11})
        + (a_{11} + a_{12})⊗b_{22}⊗(c_{12} − c_{11})
        + (a_{21} − a_{11})⊗(b_{11} + b_{12})⊗c_{22}
        + (a_{12} − a_{22})⊗(b_{21} + b_{22})⊗c_{11},

where a_{ij}, b_{ij} denote the linear forms extracting the (i, j)-th entries of the
first and second matrix, and c_{ij} the (i, j)-th entry of the product.
Exercise 2.4.4.1: Verify that (2.4.4) and (2.4.3) are indeed the same tensor.
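A numerical sanity check is also easy; the sketch below uses the standard presentation of Strassen's seven products (equivalent to (2.4.4), with the entries of the product assembled at the end):

```python
# A sketch verifying Strassen's seven-multiplication algorithm numerically.
import numpy as np

def strassen_2x2(A, B):
    m1 = (A[0,0] + A[1,1]) * (B[0,0] + B[1,1])
    m2 = (A[1,0] + A[1,1]) * B[0,0]
    m3 = A[0,0] * (B[0,1] - B[1,1])
    m4 = A[1,1] * (B[1,0] - B[0,0])
    m5 = (A[0,0] + A[0,1]) * B[1,1]
    m6 = (A[1,0] - A[0,0]) * (B[0,0] + B[0,1])
    m7 = (A[0,1] - A[1,1]) * (B[1,0] + B[1,1])
    return np.array([[m1 + m4 - m5 + m7, m3 + m5],
                     [m2 + m4,           m1 - m2 + m3 + m6]])

rng = np.random.default_rng(2)
A, B = rng.random((2, 2)), rng.random((2, 2))
assert np.allclose(strassen_2x2(A, B), A @ B)
```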
Remark 2.4.4.2. To present Strassen's algorithm this way, solve for the
coefficients of the vector equation: set each Roman numeral in (1.1.1) equal
to a linear combination of the c_{ij} and set the sum of the terms equal to (2.4.3).
point for the sequence must be in the zero set. In our study of tensors of a
given rank r, we will also study limits of such tensors.
Consider the tensor

(2.4.6)    T = a_1⊗b_1⊗c_1 + a_1⊗b_1⊗c_2 + a_1⊗b_2⊗c_1 + a_2⊗b_1⊗c_1.

One can show that the rank of T is three, but T can be approximated as
closely as one likes by tensors of rank two; consider:

(2.4.7)    T(ε) := (1/ε)[(ε − 1) a_1⊗b_1⊗c_1 + (a_1 + εa_2)⊗(b_1 + εb_2)⊗(c_1 + εc_2)].
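Expanding (2.4.7) makes the limit explicit:

```latex
\begin{aligned}
T(\epsilon) &= \tfrac{1}{\epsilon}\bigl[(\epsilon-1)\,a_1{\otimes}b_1{\otimes}c_1
  + (a_1+\epsilon a_2){\otimes}(b_1+\epsilon b_2){\otimes}(c_1+\epsilon c_2)\bigr]\\
&= a_1{\otimes}b_1{\otimes}c_1 + a_2{\otimes}b_1{\otimes}c_1
  + a_1{\otimes}b_2{\otimes}c_1 + a_1{\otimes}b_1{\otimes}c_2 + O(\epsilon),
\end{aligned}
```

so T(ε) → T as ε → 0, while each T(ε) is visibly a sum of two rank one tensors.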
Definition 2.4.5.1. A tensor T has border rank r if it is a limit of tensors
of rank r but is not a limit of tensors of rank s for any s < r. Let R(T )
denote the border rank of T .
2.5.3. Another GL(V )-invariant tensor. Recall from above that as tensors
Con, tr and Id_V are the same. In Chapter 6 we will see Id_V and its
scalar multiples are the only GL(V )-invariant tensors in V⊗V*. The space
V⊗V⊗V*⊗V* = End(V⊗V ), in addition to the identity map Id_{V⊗V}, has
another GL(V )-invariant tensor. As a linear map it is simply

(2.5.2)    τ : V⊗V → V⊗V,
           a⊗b ↦ b⊗a.
    S^2 V := span{v_i⊗v_j + v_j⊗v_i | 1 ≤ i ≤ j ≤ n}
           = span{v⊗w + w⊗v | v, w ∈ V}
           = {X ∈ V⊗V | X(α, β) = X(β, α) ∀ α, β ∈ V*}
           = {X ∈ V⊗V | τ(X) = X},

    Λ^2 V := span{v_i⊗v_j − v_j⊗v_i | 1 ≤ i, j ≤ n}
           = span{v⊗w − w⊗v | v, w ∈ V}
           = {X ∈ V⊗V | X(α, β) = −X(β, α) ∀ α, β ∈ V*}
           = {X ∈ V⊗V | τ(X) = −X}

are respectively the spaces of symmetric and skew-symmetric 2-tensors of V.
In the third lines we are considering X as a bilinear map V* × V* → C; in
the fourth, τ is the map (2.5.2). The second description of these spaces
implies that if T ∈ S^2 V and g ∈ GL(V ), then (using (2.4.2)) g · T ∈ S^2 V,
and similarly for Λ^2 V. That is, they are invariant under linear changes of
coordinates, i.e., they are GL(V )-submodules of V^⊗2.
For v_1, v_2 ∈ V, define v_1 v_2 := ½(v_1⊗v_2 + v_2⊗v_1) ∈ S^2 V and
v_1 ∧ v_2 := ½(v_1⊗v_2 − v_2⊗v_1) ∈ Λ^2 V.
2.6.2. Exercises.
(1) Show that the four descriptions of S^2 V all agree. Do the same for
the four descriptions of Λ^2 V.
(2) Show that
(2.6.1)    V⊗V = S^2 V ⊕ Λ^2 V.
By the remarks above, this direct sum decomposition is invariant
under the action of GL(V ), cf. Exercise 2.1.12. One says V^⊗2
decomposes as a GL(V )-module into Λ^2 V ⊕ S^2 V.
(3) Show that the action of GL_2 on C^3 of Example 2.2.1.2.5 is the
action induced on S^2 C^2 from the action on C^2⊗C^2.
(4) Show that no proper linear subspace of S^2 V is invariant under the
action of GL(V ); i.e., S^2 V is an irreducible submodule of V^⊗2.
(5) Show that Λ^2 V is an irreducible GL(V )-submodule of V^⊗2.
(6) Define maps

(2.6.2)    π_S : V^⊗2 → V^⊗2,
           X ↦ ½(X + τ(X)),

(2.6.3)    π_Λ : V^⊗2 → V^⊗2,
           X ↦ ½(X − τ(X)).

Show π_S(V^⊗2) = S^2 V and π_Λ(V^⊗2) = Λ^2 V.
(7) What is ker π_S?
Notational Warning. Above I used ◦ for composition. It is also used
in the literature to denote symmetric product as defined below. To avoid
confusion I reserve ◦ for composition of maps, with the exception of taking
the symmetric product of spaces, e.g., S^d V ◦ S^δ V = S^{d+δ} V.
    S^d V := π_S(V^⊗d).

Note

    S^d V = {X ∈ V^⊗d | π_S(X) = X}
(2.6.4)     = {X ∈ V^⊗d | σ(X) = X ∀ σ ∈ S_d}.
Here [k] = {1, . . . , k}. Since Q and Q̄ are really the same object, I generally
will not distinguish them by different notation.
Example 2.6.4.1. For a cubic polynomial in two variables P(s, t), one
obtains the cubic form

    P̄((s_1, t_1), (s_2, t_2), (s_3, t_3)) = (1/6)[P(s_1 + s_2 + s_3, t_1 + t_2 + t_3)
        − P(s_1 + s_2, t_1 + t_2) − P(s_1 + s_3, t_1 + t_3) − P(s_2 + s_3, t_2 + t_3)
        + P(s_1, t_1) + P(s_2, t_2) + P(s_3, t_3)].
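A quick check with P(s, t) = s³:

```latex
\overline{P} = \tfrac{1}{6}\bigl[(s_1+s_2+s_3)^3 - (s_1+s_2)^3 - (s_1+s_3)^3
 - (s_2+s_3)^3 + s_1^3 + s_2^3 + s_3^3\bigr] = s_1 s_2 s_3 ,
```

the unique symmetric trilinear form with P̄(v, v, v) = P(v).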
Remark 2.6.6.4. Exercise 2.6.6.3 provides a test for symmetric tensor border
rank that dates back to Macaulay [221].
More generally one can define the partially symmetric rank of partially
symmetric tensors. We will not dwell much on this since this notion will be
superseded by the notion of X-rank in Chapter 5. The term INDSCAL is
used for the partially symmetric rank of elements of S^2 W⊗V.
(2.6.7)    π_Λ : V^⊗k → V^⊗k,
(2.6.8)    v_1⊗···⊗v_k ↦ v_1 ∧ ··· ∧ v_k := (1/k!) Σ_{σ∈S_k} sgn(σ) v_{σ(1)}⊗···⊗v_{σ(k)},

    Λ^k V = {X ∈ V^⊗k | σ(X) = sgn(σ)X ∀ σ ∈ S_k}.
(2.6.9)    V* × V^⊗k → V^⊗(k−1),
           (α, v_1⊗···⊗v_k) ↦ α(v_1) v_2⊗···⊗v_k.

Here we could have just as well defined contractions on any of the factors.
This contraction preserves the subspaces of symmetric and skew-symmetric
tensors, as you verify in Exercise 2.6.10(3).
2.6.10. Exercises.
(1) Show that the subspace Λ^k V ⊂ V^⊗k is invariant under the action
of GL(V ).
(2) Show a basis of V induces a basis of Λ^k V. Using this induced
basis, show that, if dim V = v, then dim Λ^k V = (v choose k). In particular,
Λ^v V ≅ C, Λ^l V = 0 for l > v, and S^3 V ⊕ Λ^3 V ≠ V^⊗3 when v > 1.
(3) Calculate, for α ∈ V*, α ⌟ (v_1 ··· v_k) explicitly and show that it
indeed is an element of S^{k−1} V, and similarly for α ⌟ (v_1 ∧ ··· ∧ v_k).
(4) Show that the composition (α ⌟) ∘ (α ⌟) : Λ^k V → Λ^{k−2} V is the zero
map.
(5) Show that if V = A ⊕ B then there is an induced direct sum
decomposition Λ^k V = Λ^k A ⊕ (Λ^{k−1} A⊗Λ^1 B) ⊕ (Λ^{k−2} A⊗Λ^2 B) ⊕ ···
⊕ Λ^k B as a GL(A) × GL(B)-module.
(6) Show that a subspace A ⊂ V determines a well defined induced
filtration of Λ^k V given by Λ^k A ⊂ Λ^{k−1} A ∧ Λ^1 V ⊂ Λ^{k−2} A ∧ Λ^2 V ⊂ ···
⊂ Λ^k V. If P_A := {g ∈ GL(V ) | g · v ∈ A ∀ v ∈ A}, then each
filtrand is a P_A-submodule.
(7) Show that if V is equipped with a volume form, i.e., a nonzero element
Ω ∈ Λ^v V*, then one obtains an identification Λ^k V ≅ Λ^{v−k} V*.
(8) Show that V ≅ Λ^{v−1} V*⊗Λ^v V as GL(V )-modules.
2.6.11. Induced linear maps. Tensor product and the symmetric and
skew-symmetric constructions are functorial. This essentially means: given
a linear map f : V → W there are induced linear maps f^{⊗k} : V^⊗k → W^⊗k
given by f^{⊗k}(v_1⊗···⊗v_k) = f(v_1)⊗···⊗f(v_k). These restrict to give well
defined maps f^{∧k} : Λ^k V → Λ^k W and f^{◦k} : S^k V → S^k W.
Definition 2.6.11.1. Given a linear map f : V → V, the induced map
f^{∧v} : Λ^v V → Λ^v V is called the determinant of f.
Example 2.6.11.2. Let C^2 have basis e_1, e_2. Say f : C^2 → C^2 is represented
by the matrix
    ( a  b )
    ( c  d )
with respect to this basis, i.e., f(e_1) = ae_1 + be_2, f(e_2) = ce_1 + de_2. Then

    f^{∧2}(e_1 ∧ e_2) = (ae_1 + be_2) ∧ (ce_1 + de_2)
                      = (ad − bc) e_1 ∧ e_2.

(A numerical version of this computation appears after the exercises below.)
(3) Show that the eigenvalues of f^{∧k} are the products of k distinct
eigenvalues of f.
(4) Given f : V → V, f^{∧v} is a map from a one-dimensional vector
space to itself, and thus is multiplication by some scalar. Show that
if one chooses a basis for V and represents f by a matrix, the scalar
representing f^{∧v} is the determinant of the matrix representing f.
(5) Assume W = V and that V admits a basis of eigenvectors for f.
Show that Λ^k V admits a basis of eigenvectors for f^{∧k} and find the
eigenvectors and eigenvalues for f^{∧k} in terms of those for f. In
particular show that the k-th coefficient of the characteristic polynomial
of f is (−1)^k trace(f^{∧k}), where trace is defined in Exercise
2.3.2.7.
(6) Let f : V → W be invertible, with dim V = dim W = v. Verify
that f^{∧(v−1)} = f^{−1} det(f) as asserted above.
(7) Fix det ∈ Λ^v V* and let SL(V ) := {g ∈ GL(V ) | g · det = det}.
Show that SL(V ) is a group, called the Special Linear group. Show
that if one fixes a basis v_1, . . . , v_v of V, with dual basis v^1, . . . , v^v,
such that det = v^1 ∧ ··· ∧ v^v, and uses this basis and its dual to
express linear maps V → V as v × v matrices, then SL(V ) becomes
the set of matrices with determinant one (where one takes the usual
determinant of matrices).
(8) Given n-dimensional vector spaces E, F, fix an element Ω ∈ Λ^n E*⊗Λ^n F.
Since dim(Λ^n E*⊗Λ^n F ) = 1, Ω is unique up to scale. Then given
a linear map f : E → F, one may write f^{∧n} = c_f Ω for some constant
c_f. Show that if one chooses bases e_1, . . . , e_n of E and f_1, . . . , f_n
of F such that Ω = e^1 ∧ ··· ∧ e^n⊗f_1 ∧ ··· ∧ f_n, and expresses f as
a matrix M_f with respect to these bases, then c_f = det(M_f).
(9) Note that Ω determines a vector Ω* ∈ Λ^n E⊗Λ^n F* by ⟨Ω, Ω*⟩ = 1.
Recall that f : E → F determines a linear map f^T : F* → E*.
Use Ω* to define det f^T. Show det f = det f^T.
(10) If E = F then a volume form is not needed to define det f. Show
that in this case det f is the product of the eigenvalues of f.
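In coordinates, the matrix of f^{∧k} in the induced basis of Λ^k V is the k-th compound matrix of the matrix of f, whose entries are its k × k minors. The following NumPy sketch (the helper `compound` is my illustration, not the text's) checks Exercises (4) and (5) numerically:

```python
# The k-th compound matrix: entries are the k x k minors of M.
import numpy as np
from itertools import combinations

def compound(M, k):
    rows = list(combinations(range(M.shape[0]), k))
    cols = list(combinations(range(M.shape[1]), k))
    return np.array([[np.linalg.det(M[np.ix_(r, c)]) for c in cols]
                     for r in rows])

M = np.array([[1., 2., 0.],
              [0., 1., 3.],
              [4., 0., 1.]])

# Exercise (4): f^{wedge 3} on the 1-dimensional Lambda^3 C^3 is
# multiplication by det(M).
assert np.allclose(compound(M, 3), np.linalg.det(M))

# Exercise (5): trace(f^{wedge k}) is the k-th elementary symmetric function
# of the eigenvalues, i.e., (-1)^k times the k-th characteristic coefficient.
coeffs = np.poly(M)   # [1, -e_1, e_2, -e_3]
traces = [np.trace(compound(M, k)) for k in (1, 2, 3)]
assert np.allclose(traces, [-coeffs[1], coeffs[2], -coeffs[3]])
```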
X = Σ_{i,s} x_{i,s} a_i⊗b_s corresponds to the matrix whose (i, s)-th entry is x_{i,s}:

    x⊗x = Σ_{ijkl} x_{ij} x_{kl} e_i⊗e_j⊗e_k⊗e_l,

whose symmetrization has coefficient Σ_{σ∈S_4} x_{σ(1)σ(2)} x_{σ(3)σ(4)} on e_1 e_2 e_3 e_4.
2.8. Decomposition of V^⊗3

When d > 2 there are subspaces of V^⊗d other than the completely symmetric
and skew-symmetric tensors that are invariant under changes of bases. These
spaces will be studied in detail in Chapter 6. In this section, as a preview,
I consider V^⊗3.
Change the previous notation of π_S : V^⊗3 → V^⊗3 and π_Λ : V^⊗3 → V^⊗3
to π_{[123]} and π_{[1/2/3]} respectively. (Subscripts denote Young tableaux; rows
are separated by a slash, so [123] is a single row and [1/2/3] a single column.)
Define the projections

    π_{[1/2]} : V⊗V⊗V → Λ^2 V⊗V,
    v_1⊗v_2⊗v_3 ↦ ½(v_1⊗v_2⊗v_3 − v_2⊗v_1⊗v_3),

    π_{[13]} : V⊗V⊗V → V⊗V⊗V,
    v_1⊗v_2⊗v_3 ↦ ½(v_1⊗v_2⊗v_3 + v_3⊗v_2⊗v_1),

which are also endomorphisms of V^⊗3. Composing them gives

    π_{[13/2]} := π_{[13]} ∘ π_{[1/2]} : V⊗V⊗V → S_{[13/2]} V.
2.8.1. Exercises.
(1) Show that the sequence

    S_{[12/3]} V → V⊗Λ^2 V → Λ^3 V

is exact.
(2) Show that S_{[12/3]} V ∩ S_{[13/2]} V = (0).
(3) Show that there is a direct sum decomposition:

(2.8.1)    V^⊗3 = S^3 V ⊕ S_{[13/2]} V ⊕ Λ^3 V ⊕ S_{[12/3]} V
                = S_{[123]} V ⊕ S_{[12/3]} V ⊕ S_{[1/2/3]} V ⊕ S_{[13/2]} V.
Exercise 2.8.2.8: Show that for d > 3 the kernel of the last nonzero map
in Exercise 2.8.1(8) and the kernel of the second map give rise to different
generalizations of S_{21} V.
2.8.3. Exercises.
(1) Show that there is an invariant decomposition

    S^3(A⊗B) = (S^3 A⊗S^3 B) ⊕ π_S(S_{[13/2]} A⊗S_{[13/2]} B) ⊕ (Λ^3 A⊗Λ^3 B)

as a GL(A) × GL(B)-module.
(2) Decompose Λ^3(A⊗B) as a GL(A) × GL(B)-module.
(3) Let S_{[12/3]} A denote the kernel of the map S^2 A⊗A → S^3 A, so it
represents a copy of the module S_{21} A in A^⊗3.
Show that if R ∈ S_{[12/3]} A, then R(u, v, w) = R(v, u, w) for all
u, v, w ∈ A* and

(2.8.3)    R(u, v, u) = −½ R(u, u, v)    ∀ u, v ∈ A*.
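A sketch of the computation behind (2.8.3): R is symmetric in its first two arguments and its total symmetrization vanishes, so summing R over the six permutations of (u, u, v) gives

```latex
0 \;=\; 2\,R(u,u,v) + 2\,R(u,v,u) + 2\,R(v,u,u)
  \;=\; 2\,R(u,u,v) + 4\,R(u,v,u),
```

using R(v, u, u) = R(u, v, u); whence R(u, v, u) = −½ R(u, u, v).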
2
For example, a vector space is a group with the operation +, and the
identity element 0.
One of the most important groups is the permutation group:
Definition 2.9.2.2 (The group Sn ). Given a collection of n ordered objects,
the set of permutations of the objects forms a group, called the symmetric
group on n elements or the permutation group, and is denoted Sn .
where 𝔪_1 = 𝔪_f = p_0 + p_1 λ + ··· + p_{d−1} λ^{d−1} + λ^d.
Then in each V_j, in the complement of the subspace generated by the
images of v_{j,1} under g_j, take a vector v_{j,2} such that the space ⟨g_j^s v_{j,2} |
s ∈ N⟩ has maximal dimension. Now let w_2 = Σ_j v_{j,2} and consider the
corresponding space W_2. The minimal polynomial of f restricted to W_2,
call it 𝔪_2, divides 𝔪_1, and one obtains a matrix of the same form as above
with respect to 𝔪_2 representing f restricted to W_2. One continues in this
fashion. (For u > m_j, the contribution of V_j to the vector generating w_u
is zero.) The polynomials 𝔪_u are called the invariant divisors of f and are
independent of choices.
Note that 𝔪_{u+1} divides 𝔪_u and that, ignoring 1 × 1 blocks, the maximum
number of Jordan blocks of size at least two associated to any eigenvalue of
f is the number of invariant divisors.
Rational canonical form is described in [133, VI.6], also see, e.g., [125,
7.4]. For a terse description of rational canonical form via the Jordan form,
see [248, Ex. 7.4.8].
Wiring diagrams may be used to represent many tensors and tensor op-
erations including contractions. Such diagrams date at least back to Clifford
and were used by Feynman and Penrose [260] in physics, Cvitanovic [102]
(who calls them birdtracks) to study representation theory via invariant ten-
sors, Kuperberg [195], Bar-Natan and Kontsevich [16], and Reshetikhin
and Turaev [271] in knot theory/quantum groups, Deligne [108] and Vogel
[322, 321] in their proposed categorical generalizations of Lie algebras, and
many others.
[Wiring diagrams: boxes labeled f and T with input and output strands,
representing a linear map f : A → A, a tensor T with output in A*⊗A⊗A,
diagrams built from copies of f, the identity Id_A : A → A, and the trace
Tr : A*⊗A → C.]
Exercise 2.11.0.7: Show that the diagram obtained by closing the strand of f
into a loop represents the scalar trace(f). In particular, if f : V → V is a
projection, then the diagram represents dim Image(f). (Compare with Exercise
2.3.2.(9).) Thus a single circle, the closed-up diagram for Id_V, should be
interpreted as the scalar dim V. Similarly, since Id_{V^⊗d} has trace equal to
(dim V)^d, the union of d disjoint circles should be interpreted as the scalar
(dim V)^d.
Below I will take formal sums of diagrams, and under such a formal sum
one adds scalars and tensors.
Recall the linear map

    τ : V⊗V → V⊗V,
    a⊗b ↦ b⊗a.

It has the wiring diagram in Figure 2.11.6.
Exercise 2.11.1.2: Show pictorially that trace(τ) = dim V by composing
the picture of τ with the picture for Id_{V⊗V}. (Of course one can also
obtain the result by considering the matrix of τ with respect to the basis
e_i⊗e_j.)

[Figure 2.11.6: the wiring diagram for τ, two crossing strands.]
[Figure 2.11.7: the wiring diagram for matrix multiplication. Input: an
element of (A*⊗B)⊗(B*⊗C); the two B-strands are contracted; output: an
element of A*⊗C.]
Exercise 2.11.2.1: Show that the diagram in Figure 2.11.7 agrees with the
matrix multiplication you know and love.
Encode the tensor π_S of §2.6.2 with the white box shorthand on the left
hand side. It is one half the formal sum of the wiring diagrams for τ and
Id_{V^⊗2}, as on the right hand side of Figure 2.11.8.
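The claims of Exercises 2.11.3(1)-(2) below, and trace(τ) = dim V, can be checked in coordinates; a NumPy sketch with an illustrative n:

```python
# A sketch in coordinates: the matrix of tau on V (x) V is the swap of basis
# vectors, and pi_S = (Id + tau)/2.
import numpy as np

n = 4
tau = np.zeros((n * n, n * n))
for i in range(n):
    for j in range(n):
        tau[i * n + j, j * n + i] = 1.0   # tau(e_j (x) e_i) = e_i (x) e_j

pi_s = (np.eye(n * n) + tau) / 2
assert np.allclose(pi_s @ pi_s, pi_s)               # pi_S is a projection
assert np.isclose(np.trace(tau), n)                 # trace(tau) = dim V
assert np.isclose(np.trace(pi_s), n * (n + 1) / 2)  # = dim S^2 V
```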
Encode π_S : V^⊗d → V^⊗d by a diagram as in Figure 2.11.10 with d
strands. This is 1/d! times the formal sum of diagrams corresponding to all
permutations. Recall that since each permutation is a product of transpositions,
[Figure 2.11.8. The wiring diagram ½(τ + Id_{V^⊗2}): a white box across two
strands, defined as the formal sum of the crossing and the identity, each
with coefficient ½.]
[Figure 2.11.9. Closing up the diagram for π_S gives the scalar ½(n + n²).]
[Figure 2.11.10. The white box for π_S across d strands.]
2.11.3. Exercises.
(1) Reprove that π_S : V^⊗2 → V^⊗2 is a projection operator by showing
that the concatenation of two wiring diagrams for π_S yields the
diagram of π_S.
(2) Show that dim(S^2 V ) = (n+1 choose 2) pictorially by using the
picture 2.11.9.
[Figure: the white box for π_S on three strands, expanded as 1/3! times the
formal sum of the six permutation diagrams of S_3.]