Linalg I
© Jean Gallier
In recent years, computer vision, robotics, machine learning, and data science have been
some of the key areas that have contributed to major advances in technology. Anyone who
looks at papers or books in the above areas will be baffled by a strange jargon involving exotic
terms such as kernel PCA, ridge regression, lasso regression, support vector machines (SVM),
Lagrange multipliers, KKT conditions, etc. Do support vector machines chase cattle to catch
them with some kind of super lasso? No! But one will quickly discover that behind the jargon
which always comes with a new field (perhaps to keep the outsiders out of the club), lies a
lot of “classical” linear algebra and techniques from optimization theory. And there comes
the main challenge: in order to understand and use tools from machine learning, computer
vision, and so on, one needs to have a firm background in linear algebra and optimization
theory. To be honest, some probability theory and statistics should also be included, but we
already have enough to contend with.
Many books on machine learning struggle with the above problem. How can one under-
stand what are the dual variables of a ridge regression problem if one doesn’t know about the
Lagrangian duality framework? Similarly, how is it possible to discuss the dual formulation
of SVM without a firm understanding of the Lagrangian framework?
The easy way out is to sweep these difficulties under the rug. If one is just a consumer
of the techniques we mentioned above, the cookbook recipe approach is probably adequate.
But this approach doesn’t work for someone who really wants to do serious research and
make significant contributions. To do so, we believe that one must have a solid background
in linear algebra and optimization theory.
This is a problem because it means investing a great deal of time and energy studying
these fields, but we believe that perseverance will be amply rewarded.
Our main goal is to present fundamentals of linear algebra and optimization theory,
keeping in mind applications to machine learning, robotics, and computer vision. This work
consists of two volumes, the first one being linear algebra, the second one optimization theory
and applications, especially to machine learning.
This first volume covers “classical” linear algebra, up to and including the primary de-
composition and the Jordan form. Besides covering the standard topics, we discuss a few
topics that are important for applications. These include:
1. Haar bases and the corresponding Haar wavelets.
2. Hadamard matrices.
3. Affine maps (see Section 5.5).
4. Norms and matrix norms (Chapter 8).
5. Convergence of sequences and series in a normed vector space. The matrix exponential
e^A and its basic properties (see Section 8.8).
6. The group of unit quaternions, SU(2), and the representation of rotations in SO(3)
by unit quaternions (Chapter 15).
7. An introduction to algebraic and spectral graph theory (Chapters 18 and 19).
9. Methods for computing eigenvalues and eigenvectors, with a main focus on the QR
algorithm (Chapter 17).
Four topics are covered in more detail than usual. These are
3. The geometry of the orthogonal groups O(n) and SO(n), and of the unitary groups
U(n) and SU(n).
With a few exceptions, we provide complete proofs. We did so to make this book
self-contained, but also because we believe that no deep knowledge of this material can be
acquired without working out some proofs. However, our advice is to skip some of the proofs
upon first reading, especially if they are long and intricate.
The chapters or sections marked with the symbol ~ contain material that is typically
more specialized or more advanced, and they can be omitted upon first (or second) reading.
Acknowledgement: We would like to thank Christine Allen-Blanchette, Kostas Daniilidis,
Carlos Esteves, Spyridon Leonardos, Stephen Phillips, João Sedoc, Stephen Shatz, Jianbo
Shi, Marcelo Siqueira, and C.J. Taylor for reporting typos and for helpful comments. Mary
Pugh and William Yu (at the University of Toronto) taught a course using our book and
reported a number of typos and errors. We warmly thank them as well as their students,
not only for finding errors, but also for very helpful comments and suggestions for simplifying
some proofs. Special thanks to Gilbert Strang. We learned much from his books which have
been a major source of inspiration. Thanks to Steven Boyd and James Demmel whose books
have been an invaluable source of information. The first author also wishes to express his
deepest gratitude to Philippe G. Ciarlet who was his teacher and mentor in 1970-1972 while
he was a student at ENPC in Paris. Professor Ciarlet was by far his best teacher. He also
knew how to instill in his students the importance of intellectual rigor, honesty, and modesty.
He still has his typewritten notes on measure theory and integration, and on numerical linear
algebra. The latter became his wonderful book Ciarlet [14], from which we have borrowed
heavily.
Contents
1 Introduction 13
6 Determinants 175
6.1 Permutations, Signature of a Permutation . . . . . . . . . . . . . . . . . . . 175
6.2 Alternating Multilinear Maps . . . . . . . . . . . . . . . . . . . . . . . . . . 180
6.3 Definition of a Determinant . . . . . . . . . . . . . . . . . . . . . . . . . . . 184
6.4 Inverse Matrices and Determinants . . . . . . . . . . . . . . . . . . . . . . . 192
6.5 Systems of Linear Equations and Determinants . . . . . . . . . . . . . . . . 195
6.6 Determinant of a Linear Map . . . . . . . . . . . . . . . . . . . . . . . . . . 198
6.7 The Cayley–Hamilton Theorem . . . . . . . . . . . . . . . . . . . . . . . . . 198
6.8 Permanents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204
6.9 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 206
6.10 Further Readings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207
6.11 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 208
Bibliography 763
Chapter 1
Introduction
As we explained in the preface, this first volume covers “classical” linear algebra, up to and
including the primary decomposition and the Jordan form. Besides covering the standard
topics, we discuss a few topics that are important for applications. These include:
1. Haar bases and the corresponding Haar wavelets, a fundamental tool in signal process-
ing and computer graphics.
2. Hadamard matrices which have applications in error correcting codes, signal processing,
and low rank approximation.
3. Affine maps (see Section 5.5). These are usually ignored or treated in a somewhat
obscure fashion. Yet they play an important role in computer vision and robotics.
There is a clean and elegant way to define affine maps. One simply has to define affine
combinations. Linear maps preserve linear combinations, and similarly affine maps
preserve affine combinations.
4. Norms and matrix norms (Chapter 8). These are used extensively in optimization
theory.
5. Convergence of sequences and series in a normed vector space. Banach spaces (see
Section 8.7). The matrix exponential e^A and its basic properties (see Section 8.8).
In particular, we prove the Rodrigues formula for rotations in SO(3) and discuss the
surjectivity of the exponential map exp : so(3) → SO(3), where so(3) is the real vector
space of 3 × 3 skew-symmetric matrices (see Section 11.7). We also show that det(e^A) =
e^{tr(A)} (see Section 14.5).
6. The group of unit quaternions, SU(2), and the representation of rotations in SO(3)
by unit quaternions (Chapter 15). We define a homomorphism r : SU(2) → SO(3)
and prove that it is surjective and that its kernel is {−I, I}. We compute the rotation
matrix R_q associated with a unit quaternion q, and give an algorithm to construct a
quaternion from a rotation matrix. We also show that the exponential map
exp : su(2) → SU(2) is surjective, where su(2) is the real vector space of skew-Hermitian
2 × 2 matrices with zero trace. We discuss quaternion interpolation and
prove the famous slerp interpolation formula due to Ken Shoemake.
7. An introduction to algebraic and spectral graph theory. We define the graph Laplacian
and prove some of its basic properties (see Chapter 18). In Chapter 19, we explain
how the eigenvectors of the graph Laplacian can be used for graph drawing.
9. Methods for computing eigenvalues and eigenvectors are discussed in Chapter 17. We
first focus on the QR algorithm due to Rutishauser, Francis, and Kublanovskaya. See
Sections 17.1 and 17.3. We then discuss how to use an Arnoldi iteration, in combination
with the QR algorithm, to approximate eigenvalues for a matrix A of large dimension.
See Section 17.4. The special case where A is a symmetric (or Hermitian) tridiagonal
matrix, involves a Lanczos iteration, and is discussed in Section 17.6. In Section 17.7,
we present power iterations and inverse (power) iterations.
Five topics are covered in more detail than usual. These are
4. The geometry of the orthogonal groups O(n) and SO(n), and of the unitary groups
U(n) and SU(n).
Most texts omit the proof that the P A = LU factorization can be obtained by a simple
modification of Gaussian elimination. We give a complete proof of Theorem 7.5 in Section
7.6. We also prove the uniqueness of the rref of a matrix; see Proposition 7.19.
At the most basic level, duality corresponds to transposition. But duality is really the
bijection between subspaces of a vector space E (say finite-dimensional) and subspaces of
linear forms (subspaces of the dual space E*) established by two maps: the first map assigns
to a subspace V of E the subspace V^0 of linear forms that vanish on V; the second map assigns
to a subspace U of linear forms the subspace U^0 consisting of the vectors in E on which all
linear forms in U vanish. The above maps define a bijection such that dim(V) + dim(V^0) = dim(E),
dim(U) + dim(U^0) = dim(E), V^{00} = V, and U^{00} = U.
2.1 Motivations: Linear Combinations, Linear Independence, Rank

Consider the problem of solving the following linear system:

x1 + 2x2 − x3 = 1
2x1 + x2 + x3 = 2
x1 − 2x2 − 2x3 = 3.
One way to approach this problem is to introduce the “vectors” u, v, w, and b, given by

u = (1, 2, 1)^T,  v = (2, 1, −2)^T,  w = (−1, 1, −2)^T,  b = (1, 2, 3)^T

(written here as transposed row vectors, i.e., as column vectors),
and to write our linear system as
x1 u + x2 v + x3 w = b.
In the above equation, we used implicitly the fact that a vector z can be multiplied by a
scalar λ ∈ R, where

λz = λ(z1, z2, z3)^T = (λz1, λz2, λz3)^T,

and two vectors y and z can be added, where

y + z = (y1, y2, y3)^T + (z1, z2, z3)^T = (y1 + z1, y2 + z2, y3 + z3)^T.
18 CHAPTER 2. VECTOR SPACES, BASES, LINEAR MAPS
x1 u + x2 v + x3 w = b.

It turns out that u, v, w are linearly independent, which means that the only solution of
the equation

x1 u + x2 v + x3 w = 0_3

is the trivial solution x1 = x2 = x3 = 0. Using this fact, it can be shown that every vector
in R^{3×1} can be written as a linear combination of u, v, w.
Here, 0_3 is the zero vector

0_3 = (0, 0, 0)^T.
It is customary to abuse notation and to write 0 instead of 0_3. This rarely causes a problem
because in most cases, whether 0 denotes the scalar zero or the zero vector can be inferred
from the context.
In fact, every vector z ∈ R^{3×1} can be written in a unique way as a linear combination
z = x1 u + x2 v + x3 w.
This is because if
z = x1 u + x2 v + x3 w = y1 u + y2 v + y3 w,
then by using our (linear!) operations on vectors, we get

(y1 − x1)u + (y2 − x2)v + (y3 − x3)w = 0_3,

and the linear independence of u, v, w implies that

y1 = x1,  y2 = x2,  y3 = x3,
which shows that z has a unique expression as a linear combination, as claimed. Then our
equation
x1 u + x2 v + x3 w = b
has a unique solution, and indeed, we can check that
x1 = 1.4
x2 = −0.4
x3 = −0.4
is the solution.
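As a numerical sanity check (our own Python sketch, not part of the text), we can plug this solution back into the equation x1 u + x2 v + x3 w = b:

```python
# Illustrative sketch: verify that x1 = 1.4, x2 = -0.4, x3 = -0.4 satisfies
# x1*u + x2*v + x3*w = b for the vectors of this example.
u = [1.0, 2.0, 1.0]
v = [2.0, 1.0, -2.0]
w = [-1.0, 1.0, -2.0]
b = [1.0, 2.0, 3.0]
x1, x2, x3 = 1.4, -0.4, -0.4

# compute the ith coordinate of x1*u + x2*v + x3*w for i = 0, 1, 2
lhs = [x1 * u[i] + x2 * v[i] + x3 * w[i] for i in range(3)]
assert all(abs(lhs[i] - b[i]) < 1e-9 for i in range(3))
print(lhs)
```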
But then, how do we determine that some vectors are linearly independent?
One answer is to compute a numerical quantity det(u, v, w), called the determinant of
(u, v, w), and to check that it is nonzero. In our case, it turns out that

               | 1    2   −1 |
det(u, v, w) = | 2    1    1 | = 15,
               | 1   −2   −2 |
which confirms that u, v, w are linearly independent. Note that our linear system can also
be written in matrix form as

| 1    2   −1 | | x1 |   | 1 |
| 2    1    1 | | x2 | = | 2 | ,
| 1   −2   −2 | | x3 |   | 3 |

or more concisely as

Ax = b.
Now what if the vectors u, v, w are linearly dependent? For example, if we consider the
vectors

u = (1, 2, 1)^T,  v = (2, 1, −1)^T,  w = (−1, 1, 2)^T,
we see that
u − v = w,
a nontrivial linear dependence. It can be verified that u and v are still linearly independent.
Now for our problem

x1 u + x2 v + x3 w = b

to have a solution, it must be the case that b can be expressed as a linear combination of
u and v. However, it turns out that u, v, b are linearly independent (one way to see this
is to compute the determinant det(u, v, b) = −6), so b cannot be expressed as a linear
combination of u and v,
and thus, our system has no solution.
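The determinant computations used above can be reproduced with a short pure-Python sketch; the helper det3 below is ours, not from the text:

```python
# Illustrative sketch: a 3x3 determinant via cofactor expansion along the
# first row, used to test linear independence of three column vectors.
def det3(c1, c2, c3):
    """Determinant of the 3x3 matrix whose columns are c1, c2, c3."""
    (a, d, g), (b, e, h), (c, f, i) = c1, c2, c3
    # the matrix by rows is [a b c; d e f; g h i]
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

# first example: u, v, w are linearly independent
u = [1, 2, 1]; v = [2, 1, -2]; w = [-1, 1, -2]; b = [1, 2, 3]
print(det3(u, v, w))   # 15, nonzero

# second example, where u2 - v2 = w2 (a nontrivial dependence)
u2 = [1, 2, 1]; v2 = [2, 1, -1]; w2 = [-1, 1, 2]
print(det3(u2, v2, w2))  # 0, so u2, v2, w2 are linearly dependent
print(det3(u2, v2, b))   # -6, so b is not a combination of u2 and v2
```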
If we change the vector b to

b = (3, 3, 0)^T,
then
b = u + v,
and so the system
x1 u + x2 v + x3 w = b
has the solution
x1 = 1, x2 = 1, x3 = 0.
More generally, given any two vectors x = (x1, . . . , xn) and y = (y1, . . . , yn) ∈ R^n, their
inner product, denoted x · y, or ⟨x, y⟩, is the number

x · y = x1 y1 + x2 y2 + · · · + xn yn = Σ_{i=1}^{n} x_i y_i.

First, the quantity √(x · x) = (x1² + · · · + xn²)^{1/2}
is a generalization of the length of a vector, called the Euclidean norm, or ℓ²-norm. Second,
it can be shown that we have the inequality
|x · y| ≤ ‖x‖ ‖y‖,
so if x, y ≠ 0, the ratio (x · y)/(‖x‖ ‖y‖) can be viewed as the cosine of an angle, the angle
between x and y. In particular, if x · y = 0 then the vectors x and y make the angle π/2,
that is, they are orthogonal. The (square) matrices Q that preserve the inner product, in
the sense that ⟨Qx, Qy⟩ = ⟨x, y⟩ for all x, y ∈ R^n, also play a very important role. They can
be thought of as generalized rotations.
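These notions translate directly into a few lines of code; the following pure-Python sketch (our own illustration, not from the text) computes inner products and Euclidean norms and checks the Cauchy–Schwarz inequality on sample vectors:

```python
import math

# Illustrative sketch: inner product, Euclidean norm, Cauchy-Schwarz, and the
# cosine of the angle between two vectors.
def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))

def norm(x):
    return math.sqrt(dot(x, x))

x = [1.0, 2.0, 2.0]
y = [2.0, -2.0, 1.0]
z = [3.0, 0.0, 4.0]

print(dot(x, y))   # 0.0, so x and y are orthogonal
print(norm(z))     # 5.0, the Euclidean length of z
assert abs(dot(x, z)) <= norm(x) * norm(z)    # Cauchy-Schwarz
cos_angle = dot(x, z) / (norm(x) * norm(z))   # cosine of the angle between x and z
```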
Returning to matrices, if A is an m × n matrix consisting of n columns A^1, . . . , A^n (in
R^m), and B is an n × p matrix consisting of p columns B^1, . . . , B^p (in R^n), we can form
the p vectors (in R^m)
AB 1 , . . . , AB p .
These p vectors constitute the m × p matrix denoted AB, whose jth column is AB^j. But
we know that the ith coordinate of AB^j is the inner product of the ith row of A by the jth
column of B,

a_{i1} b_{1j} + a_{i2} b_{2j} + · · · + a_{in} b_{nj} = Σ_{k=1}^{n} a_{ik} b_{kj}.
Thus we have defined a multiplication operation on matrices, namely if A = (a_{ik}) is an
m × n matrix and if B = (b_{kj}) is an n × p matrix, then their product AB is the m × p
matrix whose entry on the ith row and the jth column is given by the inner product of the
ith row of A by the jth column of B,

(AB)_{ij} = Σ_{k=1}^{n} a_{ik} b_{kj}.
Beware that unlike the multiplication of real (or complex) numbers, if A and B are two n × n
matrices, in general, AB ≠ BA.
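The definition of the matrix product can be transcribed directly; the following pure-Python sketch (our own illustration, not from the text) also exhibits a pair of 2 × 2 matrices with AB ≠ BA:

```python
# Illustrative sketch: the product of an m x n matrix A and an n x p matrix B,
# entry (i, j) being the inner product of the ith row of A and the jth column of B.
def matmul(A, B):
    m, n, p = len(A), len(B), len(B[0])
    assert all(len(row) == n for row in A)  # inner dimensions must agree
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(m)]

A = [[1, 2],
     [3, 4]]
B = [[0, 1],
     [1, 0]]
print(matmul(A, B))  # [[2, 1], [4, 3]]
print(matmul(B, A))  # [[3, 4], [1, 2]]  -- so AB != BA in general
```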
Suppose that A is an n × n matrix and that we are trying to solve the linear system

Ax = b,

with b ∈ R^n. Suppose we can find an n × n matrix B such that

B A^i = e_i,  i = 1, . . . , n,

with e_i = (0, . . . , 0, 1, 0, . . . , 0), where the only nonzero entry is the 1 in the ith slot. If we form
the n × n matrix

     | 1  0  0  · · ·  0  0 |
     | 0  1  0  · · ·  0  0 |
     | 0  0  1  · · ·  0  0 |
In = | .  .  .   . .   .  . |
     | 0  0  0  · · ·  1  0 |
     | 0  0  0  · · ·  0  1 |

called the identity matrix, whose ith column is e_i, then the above is equivalent to
BA = In .
Multiplying both sides of Ax = b on the left by B, we get

B(Ax) = Bb,

and since B(Ax) = (BA)x = In x = x, we deduce that

x = Bb.

Conversely, if AB = In, then x = Bb is indeed a solution of the system, since

A(Bb) = (AB)b = In b = b.
What is not obvious is that BA = In implies AB = In, but this is indeed provable. The
matrix B is usually denoted A⁻¹ and called the inverse of A. It can be shown that it is the
unique matrix such that

A A⁻¹ = A⁻¹ A = In.
If a square matrix A has an inverse, then we say that it is invertible or nonsingular; otherwise
we say that it is singular. We will show later that a square matrix is invertible iff its columns
are linearly independent iff its determinant is nonzero.
In summary, if A is a square invertible matrix, then the linear system Ax = b has the
unique solution x = A⁻¹b. In practice, this is not a good way to solve a linear system because
computing A⁻¹ is too expensive. A practical method for solving a linear system is Gaussian
elimination, discussed in Chapter 7. Other practical methods for solving a linear system
Q Q⊤ = Q⊤ Q = In

A = V Σ U⊤,
Another important application of the SVD is principal component analysis (or PCA), an
important tool in data analysis.
Yet another fruitful way of interpreting the resolution of the system Ax = b is to view
this problem as an intersection problem. Indeed, each of the equations
x1 + 2x2 − x3 = 1
2x1 + x2 + x3 = 2
x1 − 2x2 − 2x3 = 3

defines a plane. The first equation

x1 + 2x2 − x3 = 1

defines the plane H1 passing through the three points (1, 0, 0), (0, 1/2, 0), (0, 0, −1), on the
coordinate axes, the second equation
2x1 + x2 + x3 = 2
defines the plane H2 passing through the three points (1, 0, 0), (0, 2, 0), (0, 0, 2), on the coor-
dinate axes, and the third equation
x1 − 2x2 − 2x3 = 3

defines the plane H3 passing through the three points (3, 0, 0), (0, −3/2, 0), (0, 0, −3/2), on
the coordinate axes. See Figure 2.1.
Figure 2.2: The solution of the system is the point in common with each of the three planes.
The intersection Hi ∩ Hj of any two distinct planes Hi and Hj is a line, and the intersection
H1 ∩ H2 ∩ H3 of the three planes consists of the single point (1.4, −0.4, −0.4), as illustrated
in Figure 2.2.
The planes corresponding to the system
x1 + 2x2 − x3 = 1
2x1 + x2 + x3 = 2
x1 − x2 + 2x3 = 3,
are illustrated in Figure 2.3.
Figure 2.3: The planes defined by the equations x1 + 2x2 − x3 = 1, 2x1 + x2 + x3 = 2, and
x1 − x2 + 2x3 = 3.
This system has no solution since there is no point simultaneously contained in all three
planes; see Figure 2.4.
On the other hand, the planes corresponding to the system

x1 + 2x2 − x3 = 3
2x1 + x2 + x3 = 3
x1 − x2 + 2x3 = 0

intersect in a common line, so this system has infinitely many solutions.
Figure 2.5: The planes defined by the equations x1 + 2x2 − x3 = 3, 2x1 + x2 + x3 = 3, and
x1 − x2 + 2x3 = 0.
Under the above interpretation, observe that we are focusing on the rows of the matrix
A, rather than on its columns, as in the previous interpretations.
Another great example of a real-world problem where linear algebra proves to be very
effective is the problem of data compression, that is, of representing a very large data set
using a much smaller amount of storage.
Typically the data set is represented as an m × n matrix A where each row corresponds
to an n-dimensional data point and typically, m ≥ n. In most applications, the data are not
independent, so the rank of A is a lot smaller than min{m, n}, and the goal of low-rank
decomposition is to factor A as the product of two matrices B and C, where B is an m × k
matrix and C is a k × n matrix, with k ≪ min{m, n} (here, ≪ means “much smaller than”):
A = B C,  where A is m × n, B is m × k, and C is k × n.
Now it is generally too costly to find an exact factorization as above, so we look for a
low-rank matrix A′ which is a “good” approximation of A. In order to make this statement
precise, we need to define a mechanism to determine how close two matrices are. This can
be done using matrix norms, a notion discussed in Chapter 8. The norm of a matrix A is a
nonnegative real number ‖A‖ which behaves a lot like the absolute value |x| of a real number
x. Then our goal is to find some low-rank matrix A′ that minimizes the norm

‖A − A′‖²,

over all matrices A′ of rank at most k, for some given k ≪ min{m, n}.
Some advantages of a low-rank approximation are:
1. Fewer elements are required to represent A; namely, k(m + n) instead of mn. Thus
less storage and fewer operations are needed to reconstruct A.
2. Often, the process for obtaining the decomposition exposes the underlying structure of
the data. Thus, it may turn out that “most” of the significant data are concentrated
along some directions called principal directions.
Low-rank decompositions of a set of data have a multitude of applications in engineering,
including computer science (especially computer vision), statistics, and machine learning.
As we will see later in Chapter 21, the singular value decomposition (SVD) provides a very
satisfactory solution to the low-rank approximation problem. Still, in many cases, the data
sets are so large that another ingredient is needed: randomization. However, as a first step,
linear algebra often yields a good initial solution.
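As a preview of Chapter 21, the following sketch (our own illustration, assuming NumPy; the variable names are ours) builds a nearly rank-3 data matrix and recovers a rank-k factorization A′ = BC by truncating the singular value decomposition:

```python
import numpy as np

# Illustrative sketch: truncating the SVD yields a rank-k factorization
# A' = BC with B of size m x k and C of size k x n, so only k(m + n)
# numbers are stored instead of mn.
rng = np.random.default_rng(0)
m, n, k = 100, 40, 3

# a synthetic m x n data matrix of rank k, plus a little noise
A = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))
A = A + 1e-3 * rng.standard_normal((m, n))

U, s, Vt = np.linalg.svd(A, full_matrices=False)
B = U[:, :k] * s[:k]     # m x k (columns scaled by the top k singular values)
C = Vt[:k, :]            # k x n
A_prime = B @ C          # the rank-k approximation A'

print(B.size + C.size, A.size)   # k(m + n) = 420 numbers versus mn = 4000
rel_err = np.linalg.norm(A - A_prime) / np.linalg.norm(A)
print(rel_err)                   # small, since A is nearly rank k
```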
2.2 Vector Spaces

We will now be more precise as to what kinds of operations are allowed on vectors. In
the early 1900s, the notion of a vector space emerged as a convenient and unifying framework
for working with “linear” objects, and we will discuss this notion in the next few sections.
However, keep in mind that vector spaces are not just algebraic
objects; they are also geometric objects.
Definition 2.1. A group is a set G equipped with a binary operation · : G × G → G that
associates an element a · b ∈ G to every pair of elements a, b ∈ G, and having the following
properties: · is associative, has an identity element e ∈ G, and every element in G is invertible
(w.r.t. ·). More explicitly, this means that the following equations hold for all a, b, c ∈ G:

(G1) a · (b · c) = (a · b) · c. (associativity);

(G2) a · e = e · a = a. (identity);
(G3) For every a ∈ G, there is some a⁻¹ ∈ G such that a · a⁻¹ = a⁻¹ · a = e. (inverse).
A group G is abelian (or commutative) if

a · b = b · a for all a, b ∈ G.
3. Similarly, the sets R of real numbers and C of complex numbers are abelian groups
under addition (with identity element 0), and R* = R − {0} and C* = C − {0} are
abelian groups under multiplication (with identity element 1).
4. The sets R^n and C^n of n-tuples of real or complex numbers are abelian groups under
componentwise addition:

(x1, . . . , xn) + (y1, . . . , yn) = (x1 + y1, . . . , xn + yn),

with identity element (0, . . . , 0).
5. Given any nonempty set S, the set of bijections f : S → S, also called permutations
of S, is a group under function composition (i.e., the multiplication of f and g is the
composition g ∘ f), with identity element the identity function id_S. This group is not
abelian as soon as S has more than two elements.
6. The set of n × n matrices with real (or complex) coefficients is an abelian group under
addition of matrices, with identity element the null matrix. It is denoted by Mn(R)
(or Mn(C)).
7. The set R[X] of all polynomials in one variable X with real coefficients,

P(X) = a_n X^n + a_{n−1} X^{n−1} + · · · + a_1 X + a_0,

(with a_i ∈ R), is an abelian group under addition of polynomials. The identity element
is the zero polynomial.
8. The set of n × n invertible matrices with real (or complex) coefficients is a group under
matrix multiplication, with identity element the identity matrix In. This group is
called the general linear group and is usually denoted by GL(n, R) (or GL(n, C)).
9. The set of n × n invertible matrices with real (or complex) coefficients and determinant
+1 is a group under matrix multiplication, with identity element the identity matrix
In. This group is called the special linear group and is usually denoted by SL(n, R)
(or SL(n, C)).
10. The set of n × n invertible matrices R with real coefficients such that RR⊤ = R⊤R = In
and of determinant +1 is a group (under matrix multiplication) called the special
orthogonal group and is usually denoted by SO(n) (where R⊤ is the transpose of the
matrix R, i.e., the rows of R⊤ are the columns of R). It corresponds to the rotations
in R^n.
11. Given an open interval (a, b), the set C(a, b) of continuous functions f : (a, b) → R is
an abelian group under the operation f + g defined such that

(f + g)(x) = f(x) + g(x)

for all x ∈ (a, b).

Proposition 2.1. If a binary operation · : M × M → M is associative and if e′ ∈ M is a
left identity and e″ ∈ M is a right identity, which means that

e′ · a = a for all a ∈ M, (G2l)

and

a · e″ = a for all a ∈ M, (G2r)

then e′ = e″.
Proof. If we let a = e″ in equation (G2l), we get

e′ · e″ = e″,

and if we let a = e′ in equation (G2r), we get

e′ · e″ = e′,

and thus

e′ = e′ · e″ = e″,

as claimed.
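The permutation groups of Example (5) above give a concrete non-abelian group; the following pure-Python sketch (our own illustration, not from the text) represents a permutation of {0, 1, 2} as a tuple and checks non-commutativity and the existence of inverses:

```python
# Illustrative sketch: permutations of {0, 1, 2} under composition form a group
# that is not abelian; a permutation is stored as the tuple of its values.
def compose(g, f):
    # (g o f)(i) = g(f(i))
    return tuple(g[f[i]] for i in range(len(f)))

identity = (0, 1, 2)
f = (1, 0, 2)   # swaps 0 and 1
g = (0, 2, 1)   # swaps 1 and 2

print(compose(g, f))  # (2, 0, 1)
print(compose(f, g))  # (1, 2, 0)  -- not equal, so the group is not abelian

# f is its own inverse: composing it with itself gives the identity
assert compose(f, f) == identity
```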
32 CHAPTER 2. VECTOR SPACES, BASES, LINEAR MAPS
Proposition 2.1 implies that the identity element of a monoid is unique, and since every
group is a monoid, the identity element of a group is unique. Furthermore, every element in
a group has a unique inverse. This is a consequence of a slightly more general fact:
Proposition 2.2. In a monoid M with identity element e, if some element a ∈ M has some
left inverse a′ ∈ M and some right inverse a″ ∈ M, which means that

a′ · a = e (G3l)

and

a · a″ = e, (G3r)

then a′ = a″.
Proof. Using (G3l) and the fact that e is an identity element, we have

(a′ · a) · a″ = e · a″ = a″.

Similarly, using (G3r) and the fact that e is an identity element, we have

a′ · (a · a″) = a′ · e = a′.

However, by associativity, (a′ · a) · a″ = a′ · (a · a″), and so a″ = a′,
as claimed.
Remark: Axioms (G2) and (G3) can be weakened a bit by requiring only (G2r) (the existence
of a right identity) and (G3r) (the existence of a right inverse for every element) (or
(G2l) and (G3l)). It is a good exercise to prove that the group axioms (G2) and (G3) follow
from (G2r) and (G3r).
Another important property about inverse elements in monoids is stated below.
Proposition 2.3. In a monoid M with identity element e, if a and b are invertible elements
of M, where a⁻¹ is the inverse of a and b⁻¹ is the inverse of b, then ab is invertible and its
inverse is given by (ab)⁻¹ = b⁻¹a⁻¹.
Proof. Using associativity and the fact that e is the identity element, we have

(ab)(b⁻¹a⁻¹) = a(b b⁻¹)a⁻¹ = a e a⁻¹ = a a⁻¹ = e.

We also have

(b⁻¹a⁻¹)(ab) = b⁻¹(a⁻¹ a)b = b⁻¹ e b = b⁻¹ b = e.

Therefore b⁻¹a⁻¹
is the inverse of ab.
Observe that the inverse of ba is a⁻¹b⁻¹. Proposition 2.3 implies that the set of invertible
elements of a monoid M is a group, also with identity element e.
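Proposition 2.3 can be illustrated in the monoid of 2 × 2 real matrices under multiplication; the following sketch (our own, with hypothetical helpers mul and inv, not from the text) checks that (ab)⁻¹ = b⁻¹a⁻¹ on a concrete pair of invertible matrices:

```python
# Illustrative sketch: in the monoid of 2x2 real matrices under multiplication,
# the invertible elements satisfy (AB)^-1 = B^-1 A^-1 (Proposition 2.3).
def mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inv(A):
    (a, b), (c, d) = A
    det = a * d - b * c
    assert det != 0, "matrix is not invertible"
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 1]]
lhs = inv(mul(A, B))           # (AB)^-1
rhs = mul(inv(B), inv(A))      # B^-1 A^-1
assert all(abs(lhs[i][j] - rhs[i][j]) < 1e-12 for i in range(2) for j in range(2))
print(lhs)
```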
A vector space is an abelian group E with an additional operation · : K × E → E called
scalar multiplication that allows rescaling a vector in E by an element in K. The set K
itself is an algebraic structure called a field. A field is a special kind of structure called a
ring. These notions are defined below. We begin with rings.
Definition 2.2. A ring is a set A equipped with two operations + : A × A → A (called
addition) and ∗ : A × A → A (called multiplication) having the following properties: (A, +)
is an abelian group, ∗ is associative and has an identity element 1 ∈ A, and ∗ is distributive
over +.

The identity element for addition is denoted 0, and the additive inverse of a ∈ A is
denoted by −a. More explicitly, the axioms of a ring are the following equations, which hold
for all a, b, c ∈ A:
a + (b + c) = (a + b) + c (associativity of +) (2.1)
a + b = b + a (commutativity of +) (2.2)
a + 0 = 0 + a = a (zero) (2.3)
a + (−a) = (−a) + a = 0 (additive inverse) (2.4)
a ∗ (b ∗ c) = (a ∗ b) ∗ c (associativity of ∗) (2.5)
a ∗ 1 = 1 ∗ a = a (identity for ∗) (2.6)
(a + b) ∗ c = (a ∗ c) + (b ∗ c) (distributivity) (2.7)
a ∗ (b + c) = (a ∗ b) + (a ∗ c) (distributivity) (2.8)
A ring A is commutative if

a ∗ b = b ∗ a for all a, b ∈ A.

From the distributivity axioms one can deduce that

a ∗ 0 = 0 ∗ a = 0 for all a ∈ A. (2.9)
Note that (2.9) implies that if 1 = 0, then a = 0 for all a ∈ A, and thus A = {0}. The
ring A = {0} is called the trivial ring. A ring for which 1 ≠ 0 is called nontrivial. The
multiplication a ∗ b of two elements a, b ∈ A is often denoted by ab.
The abelian group Z is a commutative ring (with unit 1), and for any commutative ring
K, the abelian group K[X] of polynomials is also a commutative ring (also with unit 1).
The set Z/mZ of residues modulo m where m is a positive integer is a commutative ring.
A field is a commutative ring K for which K − {0} is a group under multiplication.
Definition 2.3. A set K is a field if it is a ring and the following properties hold:

(F1) 0 ≠ 1;

(F2) for every a ∈ K, if a ≠ 0, then a has an inverse w.r.t. ∗;

(F3) ∗ is commutative.
Let K* = K − {0}. Observe that (F1) and (F2) are equivalent to the fact that K* is a
group w.r.t. ∗ with identity element 1. If ∗ is not commutative but (F1) and (F2) hold, we
say that we have a skew field (or noncommutative field).
Note that we are assuming that the operation ∗ of a field is commutative. This convention
is not universally adopted, but since ∗ will be commutative for most fields we will encounter,
we may as well include this condition in the definition.
Example 2.2.
1. The rings Q, R, and C are fields.
3. The set of (formal) fractions f(X)/g(X) of polynomials f(X), g(X) ∈ R[X], where
g(X) is not the zero polynomial, is a field.
Definition 2.4. A real vector space is a set E (of vectors) together with two operations
+ : E × E → E (called vector addition) and · : R × E → E (called scalar multiplication)
satisfying the following conditions for all α, β ∈ R and all u, v ∈ E:

(V0) E is an abelian group w.r.t. +, with identity element 0;

(V1) α · (u + v) = (α · u) + (α · v);

(V2) (α + β) · u = (α · u) + (β · u);

(V3) (α ∗ β) · u = α · (β · u);

(V4) 1 · u = u.
Given α ∈ R and v ∈ E, the element α · v is also denoted by αv. The field R is often
called the field of scalars.
In Definition 2.4, the field R may be replaced by the field of complex numbers C, in which
case we have a complex vector space. It is even possible to replace R by the field of rational
numbers Q or by any arbitrary field K (for example Z/pZ, where p is a prime number), in
which case we have a K-vector space (in (V3), ∗ denotes multiplication in the field K). In
most cases, the field K will be the field R of reals, but all results in this chapter hold for
vector spaces over an arbitrary field.
From (V0), a vector space always contains the null vector 0, and thus is nonempty.
From (V1), we get α · 0 = 0, and α · (−v) = −(α · v). From (V2), we get 0 · v = 0, and
(−α) · v = −(α · v).
Another important consequence of the axioms is the following fact: for any α ∈ R and
any v ∈ E, if α · v = 0, then either α = 0 or v = 0.
Remark: One may wonder whether axiom (V4) is really needed. Could it be derived from
the other axioms? The answer is no. For example, one can take E = R^n and define
· : R × R^n → R^n by

λ · (x1, . . . , xn) = (0, . . . , 0)
(The symbol 0 is also overloaded, since it represents both the zero in R (a scalar) and the
identity element of E (the zero vector). Confusion rarely arises, but one may prefer using a
boldface 0 for the zero vector.)
for all (x1, . . . , xn) ∈ R^n and all λ ∈ R. Axioms (V0)–(V3) are all satisfied, but (V4) fails.
Less trivial examples can be given using the notion of a basis, which has not been defined
yet.
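The degenerate scalar multiplication above can be checked mechanically; the following pure-Python sketch (our own illustration, not from the text) verifies that it satisfies (V1) and (V3) but violates (V4):

```python
# Illustrative sketch: the scalar multiplication lam . (x1,...,xn) = (0,...,0)
# satisfies axioms (V1)-(V3) but violates (V4), since 1 . u should equal u.
def smul(lam, u):
    return tuple(0 for _ in u)   # the degenerate scalar multiplication

def vadd(u, v):
    return tuple(x + y for x, y in zip(u, v))   # the usual vector addition

u = (1, 2, 3)
v = (4, 5, 6)
lam, mu = 2, 5

# (V1): lam . (u + v) == (lam . u) + (lam . v)
assert smul(lam, vadd(u, v)) == vadd(smul(lam, u), smul(lam, v))
# (V3): (lam * mu) . u == lam . (mu . u)
assert smul(lam * mu, u) == smul(lam, smul(mu, u))
# (V4) fails: 1 . u != u
assert smul(1, u) != u
```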
The field R itself can be viewed as a vector space over itself, addition of vectors being
addition in the field, and multiplication by a scalar being multiplication in the field.
Example 2.3.
1. The fields R and C are vector spaces over R.
2. The groups Rn and Cn are vector spaces over R, with scalar multiplication given by
λ · (x1, . . . , xn) = (λx1, . . . , λxn),
3. The ring R[X]_n of polynomials of degree at most n with real coefficients is a vector
space over R, where given a polynomial

P(X) = a_m X^m + a_{m−1} X^{m−1} + · · · + a_1 X + a_0

(with m ≤ n) and a scalar λ ∈ R, the scalar multiple λ · P(X) is given by

λ · P(X) = λa_m X^m + λa_{m−1} X^{m−1} + · · · + λa_1 X + λa_0.
4. The ring R[X] of all polynomials with real coefficients is a vector space over R, and the
ring C[X] of all polynomials with complex coefficients is a vector space over C, with
the same scalar multiplication as above.
5. The ring of n × n matrices Mn(R) is a vector space over R.
6. The ring of m × n matrices Mm,n(R) is a vector space over R.
7. The ring C(a, b) of continuous functions f : (a, b) → R is a vector space over R, with
the scalar multiplication λf of a function f : (a, b) → R by a scalar λ ∈ R given by

(λf)(x) = λf(x) for all x ∈ (a, b).
8. A very important example of vector space is the set of linear maps between two vector
spaces to be defined in Section 2.7. Here is an example that will prepare us for the
vector space of linear maps. Let X be any nonempty set and let E be a vector space.
The set of all functions f : X → E can be made into a vector space as follows: Given
any two functions f : X → E and g : X → E, let (f + g) : X → E be defined such that

(f + g)(x) = f(x) + g(x)

for all x ∈ X, and for every λ ∈ R, let λf : X → E be defined such that

(λf)(x) = λf(x)

for all x ∈ X.
Let E be a vector space. We would like to define the important notions of linear combi-
nation and linear independence.
Before defining these notions, we need to discuss a strategic choice which, depending
how it is settled, may reduce or increase headaches in dealing with notions such as linear
combinations and linear dependence (or independence). The issue has to do with using sets
of vectors versus sequences of vectors.
2.3 Indexed Families; the Sum Notation Σ_{i∈I} a_i
Our experience tells us that it is preferable to use sequences of vectors; even better, indexed
families of vectors. (We are not alone in having opted for sequences over sets, and we are in
good company; for example, Artin [3], Axler [4], and Lang [40] use sequences. Nevertheless,
some prominent authors such as Lax [43] use sets. We leave it to the reader to conduct a
survey on this issue.)
Given a set A, recall that a sequence is an ordered n-tuple (a1, . . . , an) ∈ A^n of elements
from A, for some natural number n. The elements of a sequence need not be distinct and
the order is important. For example, (a1 , a2 , a1 ) and (a2 , a1 , a1 ) are two distinct sequences
in A3 . Their underlying set is {a1 , a2 }.
What we just defined are finite sequences, which can also be viewed as functions from
{1, 2, . . . , n} to the set A; the ith element of the sequence (a1 , . . . , an ) is the image of i under
the function. This viewpoint is fruitful, because it allows us to define (countably) infinite
sequences as functions s : N → A. But then, why limit ourselves to ordered sets such as
{1, . . . , n} or N as index sets?
The main role of the index set is to tag each element uniquely, and the order of the tags
is not crucial, although convenient. Thus, it is natural to define the notion of indexed family.
Definition 2.5. Given a set A, an I-indexed family of elements of A, for short a family,
is a function a : I → A where I is any set viewed as an index set. Since the function a is
determined by its graph
{(i, a(i)) | i ∈ I},
the family a can be viewed as the set of pairs a = {(i, a(i)) | i ∈ I}. For notational simplicity,
we write a_i instead of a(i), and denote the family a = {(i, a(i)) | i ∈ I} by (a_i)_{i∈I}.
38 CHAPTER 2. VECTOR SPACES, BASES, LINEAR MAPS
For example, with I = {r, g, b}, the set of pairs {(r, 2), (g, 3), (b, 2)}
is an indexed family. The element 2 appears twice in the family with the two distinct tags
r and b.
When the index set I is totally ordered, a family (a_i)_{i∈I} is often called an I-sequence.
Interestingly, sets can be viewed as special cases of families. Indeed, a set A can be viewed
as the A-indexed family {(a, a) | a ∈ A} corresponding to the identity function.
Remark: An indexed family should not be confused with a multiset. Given any set A, a
multiset is similar to a set, except that elements of A may occur more than once. For
example, if A = {a, b, c, d}, then {a, a, a, b, c, c, d, d} is a multiset. Each element appears
with a certain multiplicity, but the order of the elements does not matter. For example, a
has multiplicity 3. Formally, a multiset is a function s : A → ℕ, or equivalently a set of pairs
{(a, s(a)) | a ∈ A}. Thus, a multiset is an A-indexed family of elements from ℕ, but not an
ℕ-indexed family, since distinct elements may have the same multiplicity (such as c and d in
the example above). An indexed family is a generalization of a sequence, but a multiset is a
generalization of a set.
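As a programming aside (not in the original text), the distinction is easy to see in code: an indexed family is naturally a dictionary from tags to elements, while a multiset is a map from elements to multiplicities. The tags r, g, b below are the made-up ones from the example above.

```python
from collections import Counter

# An I-indexed family a : I -> A as a dictionary; the tags are arbitrary labels.
family = {"r": 2, "g": 3, "b": 2}   # the element 2 appears twice, under tags "r" and "b"

# A multiset over A = {a, b, c, d} as a function s : A -> N (element -> multiplicity).
multiset = Counter("aaabccdd")

assert family["r"] == family["b"] == 2       # distinct tags, same element
assert multiset["a"] == 3                    # a has multiplicity 3
assert multiset["c"] == multiset["d"] == 2   # distinct elements, same multiplicity
```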
We also need to take care of an annoying technicality, which is to define sums of the
form Σ_{i∈I} a_i, where I is any finite index set and (a_i)_{i∈I} is a family of elements in some set
A equipped with a binary operation + : A × A → A which is associative (Axiom (G1)) and
commutative. This will come up when we define linear combinations.
The issue is that the binary operation + only tells us how to compute a1 + a2 for two
elements of A, but it does not tell us what is the sum of three or more elements. For example,
how should a1 + a2 + a3 be defined?
What we have to do is to define a1 + a2 + a3 by using a sequence of steps each involving two
elements, and there are two possible ways to do this: a1 + (a2 + a3) and (a1 + a2) + a3. If our
operation + is not associative, these are different values. If it is associative, then a1 + (a2 + a3) =
(a1 + a2) + a3, but then there are still six possible permutations of the indices 1, 2, 3, and if
+ is not commutative, these values are generally different. If our operation is commutative,
then all six permutations have the same value. Thus, if + is associative and commutative,
it seems intuitively clear that a sum of the form Σ_{i∈I} a_i does not depend on the order of the
operations used to compute it.
This is indeed the case, but a rigorous proof requires induction, and such a proof is
surprisingly involved. Readers may accept without proof the fact that sums of the form
Σ_{i∈I} a_i are indeed well defined, and jump directly to Definition 2.6. For those who want to
see the gory details, here we go.
First, we define sums Σ_{i∈I} a_i, where I is a finite sequence of distinct natural numbers,
say I = (i1, …, im). If I = (i1, …, im) with m ≥ 2, we denote the sequence (i2, …, im) by
I − i1, and we define the sum recursively: Σ_{i∈I} a_i = a_{i1} if m = 1, and
Σ_{i∈I} a_i = a_{i1} + (Σ_{i∈(I−i1)} a_i) if m ≥ 2.
If the operation + is not associative, the grouping of the terms matters. For instance, in
general
a1 + (a2 + (a3 + a4)) ≠ (a1 + a2) + (a3 + a4).
However, if the operation + is associative, the sum Σ_{i∈I} a_i should not depend on the grouping
of the elements in I, as long as their order is preserved. For example, if I = (1, 2, 3, 4, 5),
J1 = (1, 2), and J2 = (3, 4, 5), we expect that
Σ_{i∈I} a_i = (Σ_{j∈J1} a_j) + (Σ_{j∈J2} a_j).
and, by the induction hypothesis applied to the sequence I − i1 (where I′_{k1} = I_{k1} − i1
and J = K − {k1}),
Σ_{i∈(I−i1)} a_i = (Σ_{α∈I′_{k1}} a_α) + (Σ_{j∈J} (Σ_{α∈I_j} a_α)).
If we add the righthand side to a_{i1}, using associativity and the definition of an indexed sum,
we get
a_{i1} + ((Σ_{α∈I′_{k1}} a_α) + (Σ_{j∈J} (Σ_{α∈I_j} a_α))) = (a_{i1} + (Σ_{α∈I′_{k1}} a_α)) + (Σ_{j∈J} (Σ_{α∈I_j} a_α))
= (Σ_{α∈I_{k1}} a_α) + (Σ_{j∈J} (Σ_{α∈I_j} a_α))
= Σ_{k∈K} (Σ_{α∈I_k} a_α),
as claimed.
If I = (1, …, n), we also write Σ_{i=1}^{n} a_i instead of Σ_{i∈I} a_i. Since + is associative, Propo-
sition 2.5 shows that the sum Σ_{i=1}^{n} a_i is independent of the grouping of its elements, which
justifies the use of the notation a1 + · · · + an (without any parentheses).
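As a quick numerical aside (not in the original text), the independence of the grouping for an associative operation is easy to check mechanically; here is a sketch with ordinary integer addition and a made-up list of terms:

```python
a = [5, 1, 4, 2, 3]

# Two fully nested groupings of a1 + a2 + a3 + a4 + a5:
left_nested = (((a[0] + a[1]) + a[2]) + a[3]) + a[4]
right_nested = a[0] + (a[1] + (a[2] + (a[3] + a[4])))

# The block grouping (a1 + a2) + (a3 + a4 + a5), as with J1 = (1, 2), J2 = (3, 4, 5):
blocks = (a[0] + a[1]) + ((a[2] + a[3]) + a[4])

# All groupings agree because integer + is associative.
assert left_nested == right_nested == blocks == 15
```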
If we also assume that our associative binary operation on A is commutative, then we
can show that the sum Σ_{i∈I} a_i does not depend on the ordering of the index set I.
Proposition 2.6. Given any nonempty set A equipped with an associative and commutative
binary operation + : A × A → A, for any two nonempty finite sequences I and J of distinct
natural numbers such that J is a permutation of I (in other words, the underlying sets of I
and J are identical), for every sequence (a_i)_{i∈I} of elements in A, we have
Σ_{α∈I} a_α = Σ_{α∈J} a_α.
then using associativity and commutativity several times (more rigorously, using induction
on i1 − 1), we get
a_{i1} + ((Σ_{i=1}^{i1−1} a_i) + (Σ_{i=i1+1}^{p} a_i)) = (Σ_{i=1}^{i1−1} a_i) + (a_{i1} + (Σ_{i=i1+1}^{p} a_i))
= Σ_{i=1}^{p} a_i,
as claimed.
The cases where i1 = 1 or i1 = p are treated similarly, but in a simpler manner since
either P = () or Q = () (where () denotes the empty sequence).
Having done all this, we can now make sense of sums of the form Σ_{i∈I} a_i, for any finite
index set I and any family a = (a_i)_{i∈I} of elements in A, where A is a set equipped with a
binary operation + which is associative and commutative.
Indeed, since I is finite, it is in bijection with the set {1, …, n} for some n ∈ ℕ, and any
total ordering ≼ on I corresponds to a permutation I_≼ of {1, …, n} (where we identify a
permutation with its image). For any total ordering ≼ on I, we define Σ_{i∈I,≼} a_i as
Σ_{i∈I,≼} a_i = Σ_{j∈I_≼} a_j.
Then for any other total ordering ≼′ on I, we have
Σ_{i∈I,≼′} a_i = Σ_{j∈I_≼′} a_j,
and since I_≼ and I_≼′ are different permutations of {1, …, n}, by Proposition 2.6, we have
Σ_{j∈I_≼} a_j = Σ_{j∈I_≼′} a_j.
Therefore, the sum Σ_{i∈I,≼} a_i does not depend on the total ordering ≼ on I. We define the sum
Σ_{i∈I} a_i as the common value Σ_{i∈I,≼} a_i for all total orderings ≼ of I.
Here are some examples with A = ℝ:
1. If I = {1, 2, 3} and a = {(1, 2), (2, −3), (3, √2)}, then Σ_{i∈I} a_i = 2 − 3 + √2 = −1 + √2.
2. If I = {2, 5, 7} and a = {(2, 2), (5, −3), (7, √2)}, then Σ_{i∈I} a_i = 2 − 3 + √2 = −1 + √2.
3. If I = {r, g, b} and a = {(r, 2), (g, −3), (b, 1)}, then Σ_{i∈I} a_i = 2 − 3 + 1 = 0.
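As a small aside not in the original text, example 3 can be replayed in code: because + on numbers is associative and commutative, summing the family over its unordered index set gives the same value no matter which order the tags are visited in.

```python
# The family of example 3, over the unordered index set I = {"r", "g", "b"}.
a = {"r": 2, "g": -3, "b": 1}

# Sum the values under several different iteration orders of the tags.
totals = {sum(a[i] for i in order)
          for order in [("r", "g", "b"), ("b", "r", "g"), ("g", "b", "r")]}

# Every ordering yields 2 - 3 + 1 = 0, so the set of totals is a singleton.
assert totals == {0}
```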
Given a set A, recall that an I-indexed family (a_i)_{i∈I} of elements of A (for short, a family)
is a function a : I → A, or equivalently a set of pairs {(i, a_i) | i ∈ I}. We agree that when
I = ∅, (a_i)_{i∈I} = ∅. A family (a_i)_{i∈I} is finite if I is finite.
Remark: When considering a family (ai )i2I , there is no reason to assume that I is ordered.
The crucial point is that every element of the family is uniquely indexed by an element of
I. Thus, unless specified otherwise, we do not assume that the elements of an index set are
ordered.
Given two disjoint sets I and J, the union of two families (u_i)_{i∈I} and (v_j)_{j∈J}, denoted as
(u_i)_{i∈I} ∪ (v_j)_{j∈J}, is the family (w_k)_{k∈(I∪J)} defined such that w_k = u_k if k ∈ I, and w_k = v_k
if k ∈ J. Given a family (u_i)_{i∈I} and any element v, we denote by (u_i)_{i∈I} ∪_k (v) the family
(w_i)_{i∈I∪{k}} defined such that w_i = u_i if i ∈ I, and w_k = v, where k is any index such that
k ∉ I. Given a family (u_i)_{i∈I}, a subfamily of (u_i)_{i∈I} is a family (u_j)_{j∈J} where J is any subset
of I.
In this chapter, unless specified otherwise, it is assumed that all families of scalars are
finite (i.e., their index set is finite).
Definition 2.6. Let E be a vector space. A vector v ∈ E is a linear combination of a family
(u_i)_{i∈I} of elements of E iff there is a family (λ_i)_{i∈I} of scalars in ℝ such that
v = Σ_{i∈I} λ_i u_i.
When I = ∅, we stipulate that v = 0. (By Proposition 2.6, sums of the form Σ_{i∈I} λ_i u_i are
well defined.) We say that a family (u_i)_{i∈I} is linearly independent iff for every family (λ_i)_{i∈I}
of scalars in ℝ,
Σ_{i∈I} λ_i u_i = 0 implies that λ_i = 0 for all i ∈ I.
Equivalently, a family (u_i)_{i∈I} is linearly dependent iff there is some family (λ_i)_{i∈I} of scalars
in ℝ such that
Σ_{i∈I} λ_i u_i = 0 and λ_j ≠ 0 for some j ∈ I.
Observe that defining linear combinations for families of vectors rather than for sets of
vectors has the advantage that the vectors being combined need not be distinct. For example,
for I = {1, 2, 3} and the families (u, v, u) and (λ1, λ2, λ3), the linear combination
Σ_{i∈I} λ_i u_i = λ1 u + λ2 v + λ3 u
makes sense. Using sets of vectors in the definition of a linear combination does not allow
such linear combinations; this is too restrictive.
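As an aside not in the original text, here is a small NumPy sketch of a linear combination over a family with a repeated vector; the vectors and scalars are made up:

```python
import numpy as np

u = np.array([1.0, 2.0])
v = np.array([0.0, 1.0])

# The family (u, v, u) with scalars (lam1, lam2, lam3): repeats are allowed.
lam1, lam2, lam3 = 2.0, -1.0, 3.0
combo = lam1 * u + lam2 * v + lam3 * u

# With sets the two copies of u would collapse into one; with families the
# combination is (lam1 + lam3) u + lam2 v, as expected.
assert np.allclose(combo, (lam1 + lam3) * u + lam2 * v)
```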
Unravelling Definition 2.6, a family (u_i)_{i∈I} is linearly dependent iff either I consists of a
single element, say i, and u_i = 0, or |I| ≥ 2 and some u_j in the family can be expressed as
a linear combination of the other vectors in the family. Indeed, in the second case, there is
some family (λ_i)_{i∈I} of scalars in ℝ such that
Σ_{i∈I} λ_i u_i = 0 and λ_j ≠ 0 for some j ∈ I,
and since λ_j ≠ 0, we can solve for u_j, obtaining
u_j = Σ_{i∈(I−{j})} (−λ_j⁻¹ λ_i) u_i.
Observe that one of the reasons for defining linear dependence for families of vectors
rather than for sets of vectors is that our definition allows multiple occurrences of a vector.
This is important because a matrix may contain identical columns, and we would like to say
that these columns are linearly dependent. The definition of linear dependence for sets does
not allow us to do that.
The above also shows that a family (u_i)_{i∈I} is linearly independent iff either I = ∅, or I
consists of a single element i and u_i ≠ 0, or |I| ≥ 2 and no vector u_j in the family can be
expressed as a linear combination of the other vectors in the family.
When I is nonempty, if the family (u_i)_{i∈I} is linearly independent, note that u_i ≠ 0 for
all i ∈ I. Otherwise, if u_i = 0 for some i ∈ I, then we get a nontrivial linear dependence
Σ_{i∈I} λ_i u_i = 0 by picking any nonzero λ_i and letting λ_k = 0 for all k ∈ I with k ≠ i, since
λ_i 0 = 0. If |I| ≥ 2, we must also have u_i ≠ u_j for all i, j ∈ I with i ≠ j, since otherwise we
get a nontrivial linear dependence by picking λ_i = λ and λ_j = −λ for any nonzero λ, and
letting λ_k = 0 for all k ∈ I with k ≠ i, j.
Thus, the definition of linear independence implies that a nontrivial linearly independent
family is actually a set. This explains why certain authors choose to define linear indepen-
dence for sets of vectors. The problem with this approach is that linear dependence, which
is the logical negation of linear independence, is then only defined for sets of vectors. However,
as we pointed out earlier, it is really desirable to define linear dependence for families
allowing multiple occurrences of the same vector.
In the special case where the vectors that we are considering are the columns A^1, …, A^n
of an n × n matrix A (with coefficients in K = ℝ or K = ℂ), linear independence has a
simple characterization in terms of the solutions of the linear system Ax = 0.
Recall that A^1, …, A^n are linearly independent iff for any scalars x1, …, xn ∈ K,
if x1 A^1 + · · · + xn A^n = 0, then x1 = · · · = xn = 0. (∗1)
If we form the column vector x whose coordinates are x1, …, xn ∈ K, then by definition of
Ax,
x1 A^1 + · · · + xn A^n = Ax,
2.4. LINEAR INDEPENDENCE, SUBSPACES 45
so (∗1) is equivalent to
if Ax = 0, then x = 0. (∗2)
In other words, the columns A^1, …, A^n of the matrix A are linearly independent iff the linear
system Ax = 0 has the unique solution x = 0 (the trivial solution).
The above can typically be demonstrated by solving the system Ax = 0 by variable
elimination, and verifying that the only solution obtained is x = 0.
Another way to prove that the linear system Ax = 0 only has the trivial solution x = 0 is
to show that A is invertible by finding explicitly the inverse A⁻¹ of A. Indeed, if A has an
inverse A⁻¹, we have A⁻¹A = AA⁻¹ = I, so multiplying both sides of the equation Ax = 0
on the left by A⁻¹, we obtain
A⁻¹Ax = A⁻¹0 = 0,
and since A⁻¹Ax = Ix = x, we get x = 0.
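As a numerical aside (not in the original text), both ways of checking independence can be sketched in NumPy for a small made-up matrix: the columns are independent exactly when the matrix has full rank, and exhibiting the inverse also forces the trivial solution.

```python
import numpy as np

# A hypothetical 3x3 matrix whose columns we test for linear independence.
A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])

# First method: Ax = 0 has only the trivial solution iff A has full column rank.
assert np.linalg.matrix_rank(A) == 3

# Second method: exhibit the inverse; then Ax = 0 forces x = A^{-1} 0 = 0.
A_inv = np.linalg.inv(A)
assert np.allclose(A_inv @ A, np.eye(3))
x = np.linalg.solve(A, np.zeros(3))
assert np.allclose(x, np.zeros(3))
```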
The first method can be applied to show linear independence in (2) and (3) of the following
example.
Example 2.4.
1. Any two distinct scalars λ, µ ≠ 0 in ℝ are linearly dependent.
2. In R3 , the vectors (1, 0, 0), (0, 1, 0), and (0, 0, 1) are linearly independent. See Figure
2.7.
Figure 2.7: A visual (arrow) depiction of the red vector (1, 0, 0), the green vector (0, 1, 0),
and the blue vector (0, 0, 1) in R3 .
3. In R4 , the vectors (1, 1, 1, 1), (0, 1, 1, 1), (0, 0, 1, 1), and (0, 0, 0, 1) are linearly indepen-
dent.
4. In R2 , the vectors u = (1, 1), v = (0, 1) and w = (2, 3) are linearly dependent, since
w = 2u + v.
Figure 2.8: A visual (arrow) depiction of the pink vector u = (1, 1), the dark purple vector
v = (0, 1), and the vector sum w = 2u + v.
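A quick NumPy check of item (4), as an aside not in the original text: the dependence w = 2u + v shows up both directly and in the rank of the matrix whose columns are u, v, w.

```python
import numpy as np

u = np.array([1.0, 1.0])
v = np.array([0.0, 1.0])
w = np.array([2.0, 3.0])

# w = 2u + v, so the family (u, v, w) is linearly dependent.
assert np.allclose(2 * u + v, w)

# Equivalently, the 2x3 matrix with columns u, v, w has rank 2 < 3.
M = np.column_stack([u, v, w])
assert np.linalg.matrix_rank(M) == 2
```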
When I is finite, we often assume that it is the set I = {1, 2, . . . , n}. In this case, we
denote the family (ui )i2I as (u1 , . . . , un ).
The notion of a subspace of a vector space is defined as follows.
Definition 2.7. Given a vector space E, a subset F of E is a linear subspace (or subspace)
of E iff F is nonempty and λu + µv ∈ F for all u, v ∈ F and all λ, µ ∈ ℝ.
It is easy to see that a subspace F of E is indeed a vector space, since the restriction
of + : E × E → E to F × F is indeed a function + : F × F → F, and the restriction of
· : ℝ × E → E to ℝ × F is indeed a function · : ℝ × F → F.
Since a subspace F is nonempty, if we pick any vector u ∈ F and if we let λ = µ = 0,
then λu + µu = 0u + 0u = 0, so every subspace contains the vector 0.
The following facts also hold. The proof is left as an exercise.
Proposition 2.7.
(1) The intersection of any family (even infinite) of subspaces of a vector space E is a
subspace.
(2) Let F be any subspace of a vector space E. For any nonempty finite index set I,
if (u_i)_{i∈I} is any family of vectors u_i ∈ F and (λ_i)_{i∈I} is any family of scalars, then
Σ_{i∈I} λ_i u_i ∈ F.
The subspace {0} will be denoted by (0), or even 0 (with a mild abuse of notation).
Example 2.5.
1. In ℝ², the set of vectors u = (x, y) such that
x + y = 0
is the subspace illustrated by Figure 2.9.
Figure 2.9: The subspace x + y = 0 is the line through the origin with slope −1. It consists
of all vectors of the form λ(−1, 1).
2. In ℝ³, the set of vectors u = (x, y, z) such that
x + y + z = 0
is the subspace illustrated by Figure 2.10.
3. For any n ≥ 0, the set of polynomials f(X) ∈ ℝ[X] of degree at most n is a subspace
of ℝ[X].
Proposition 2.8. Given any vector space E, if S is any nonempty subset of E, then the
smallest subspace ⟨S⟩ (or Span(S)) of E containing S is the set of all (finite) linear combi-
nations of elements from S.
Proof. We prove that the set Span(S) of all linear combinations of elements of S is a subspace
of E, leaving as an exercise the verification that every subspace containing S also contains
Span(S).
Figure 2.10: The subspace x + y + z = 0 is the plane through the origin with normal (1, 1, 1).
First, Span(S) is nonempty since it contains S (which is nonempty). If u = Σ_{i∈I} λ_i u_i
and v = Σ_{j∈J} µ_j v_j are any two linear combinations in Span(S), for any two scalars λ, µ ∈ ℝ,
λu + µv = λ Σ_{i∈I} λ_i u_i + µ Σ_{j∈J} µ_j v_j
= Σ_{i∈I} λλ_i u_i + Σ_{j∈J} µµ_j v_j
= Σ_{i∈(I−J)} λλ_i u_i + Σ_{i∈I∩J} (λλ_i u_i + µµ_i v_i) + Σ_{j∈(J−I)} µµ_j v_j,
which is a linear combination with index set I ∪ J, and thus λu + µv ∈ Span(S), which
proves that Span(S) is a subspace.
One might wonder what happens if we add extra conditions to the coefficients involved
in forming linear combinations. Here are three natural restrictions which turn out to be
important (as usual, we assume that our index sets are finite):
(1) Consider combinations Σ_{i∈I} λ_i u_i for which
Σ_{i∈I} λ_i = 1.
These are called affine combinations. One should realize that every linear combination
Σ_{i∈I} λ_i u_i can be viewed as an affine combination. For example, if k is an index not
in I, if we let J = I ∪ {k}, u_k = 0, and λ_k = 1 − Σ_{i∈I} λ_i, then Σ_{j∈J} λ_j u_j is an affine
combination and
Σ_{i∈I} λ_i u_i = Σ_{j∈J} λ_j u_j.
2.5. BASES OF A VECTOR SPACE 49
However, we get new spaces. For example, in R3 , the set of all affine combinations of
the three vectors e1 = (1, 0, 0), e2 = (0, 1, 0), and e3 = (0, 0, 1), is the plane passing
through these three points. Since it does not contain 0 = (0, 0, 0), it is not a linear
subspace.
(2) Consider combinations Σ_{i∈I} λ_i u_i for which
λ_i ≥ 0, for all i ∈ I.
These are called positive (or conic) combinations. It turns out that positive combina-
tions of families of vectors are cones. They show up naturally in convex optimization.
(3) Consider combinations Σ_{i∈I} λ_i u_i for which we require (1) and (2), that is
Σ_{i∈I} λ_i = 1, and λ_i ≥ 0 for all i ∈ I.
These are called convex combinations. Given any finite family of vectors, the set of all
convex combinations of these vectors is a convex polyhedron. Convex polyhedra play a
very important role in convex optimization.
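As an aside not in the original text, the affine and convex restrictions are easy to illustrate numerically. With the standard basis vectors e1, e2, e3 of ℝ³ (and made-up coefficients), an affine combination lands on the plane x + y + z = 1, which misses the origin; nonnegative coefficients make it a convex combination, a point of the triangle with vertices e1, e2, e3.

```python
import numpy as np

e1, e2, e3 = np.eye(3)

# An affine combination of e1, e2, e3: the coefficients sum to 1.
lam = np.array([0.5, 0.3, 0.2])
p = lam[0] * e1 + lam[1] * e2 + lam[2] * e3
assert np.isclose(lam.sum(), 1.0)
assert np.isclose(p.sum(), 1.0)   # p lies on the plane x + y + z = 1, so p != 0

# The coefficients are also nonnegative, so p is a convex combination.
assert np.all(lam >= 0)
```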
Remark: The notion of linear combination can also be defined for infinite index sets I.
To ensure that a sum Σ_{i∈I} λ_i u_i makes sense, we restrict our attention to families of finite
support.
Definition 2.8. Given any field K, a family of scalars (λ_i)_{i∈I} has finite support if λ_i = 0
for all i ∈ I − J, for some finite subset J of I.
If (λ_i)_{i∈I} is a family of scalars of finite support, for any vector space E over K, for any
(possibly infinite) family (u_i)_{i∈I} of vectors u_i ∈ E, we define the linear combination Σ_{i∈I} λ_i u_i
as the finite linear combination Σ_{j∈J} λ_j u_j, where J is any finite subset of I such that λ_i = 0
for all i ∈ I − J. In general, results stated for finite families also hold for families of finite
support.
Definition 2.9. Given a vector space E and a subspace V of E, a family (v_i)_{i∈I} of vectors
v_i ∈ V spans V or generates V iff for every v ∈ V, there is some family (λ_i)_{i∈I} of scalars in
ℝ such that
v = Σ_{i∈I} λ_i v_i.
We also say that the elements of (v_i)_{i∈I} are generators of V and that V is spanned by (v_i)_{i∈I},
or generated by (v_i)_{i∈I}. If a subspace V of E is generated by a finite family (v_i)_{i∈I}, we say
that V is finitely generated. A family (u_i)_{i∈I} that spans V and is linearly independent is
called a basis of V.
Example 2.6.
1. In ℝ³, the vectors (1, 0, 0), (0, 1, 0), and (0, 0, 1), illustrated in Figure 2.7, form a basis.
2. The vectors (1, 1, 1, 1), (1, 1, −1, −1), (1, −1, 0, 0), (0, 0, 1, −1) form a basis of ℝ⁴ known
as the Haar basis. This basis and its generalization to dimension 2ⁿ are crucial in
wavelet theory.
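As a numerical aside (not in the original text), the Haar vectors can be checked to be a basis by stacking them as the columns of a matrix: four linearly independent vectors in ℝ⁴ give an invertible 4 × 4 matrix.

```python
import numpy as np

# The four Haar vectors of Example 2.6 (2), as columns of W.
W = np.array([[1.0,  1.0,  1.0,  0.0],
              [1.0,  1.0, -1.0,  0.0],
              [1.0, -1.0,  0.0,  1.0],
              [1.0, -1.0,  0.0, -1.0]])

# Full rank and nonzero determinant: the columns form a basis of R^4.
assert np.linalg.matrix_rank(W) == 4
assert not np.isclose(np.linalg.det(W), 0.0)

# The columns are in fact pairwise orthogonal (a hallmark of the Haar basis).
G = W.T @ W
assert np.allclose(G, np.diag(np.diag(G)))
```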
The first key result of linear algebra is that every vector space E has a basis. We begin
with a crucial lemma which formalizes the mechanism for building a basis incrementally.
Lemma 2.9. Given a linearly independent family (u_i)_{i∈I} of elements of a vector space E, if
v ∈ E is not a linear combination of (u_i)_{i∈I}, then the family (u_i)_{i∈I} ∪_k (v) obtained by adding
v to the family (u_i)_{i∈I} is linearly independent (where k ∉ I).
Proof. Assume that µv + Σ_{i∈I} λ_i u_i = 0, for any family (λ_i)_{i∈I} of scalars in ℝ. If µ ≠ 0, then
µ has an inverse (because ℝ is a field), and thus we have v = −Σ_{i∈I} (µ⁻¹λ_i) u_i, showing that
v is a linear combination of (u_i)_{i∈I} and contradicting the hypothesis. Thus, µ = 0. But then,
we have Σ_{i∈I} λ_i u_i = 0, and since the family (u_i)_{i∈I} is linearly independent, we have λ_i = 0
for all i ∈ I.
The next theorem holds in general, but the proof is more sophisticated for vector spaces
that do not have a finite set of generators. Thus, in this chapter, we only prove the theorem
for finitely generated vector spaces.
Theorem 2.10. Given any finite family S = (u_i)_{i∈I} generating a vector space E and any
linearly independent subfamily L = (u_j)_{j∈J} of S (where J ⊆ I), there is a basis B of E such
that L ⊆ B ⊆ S.
Proof. Consider the set of linearly independent families B such that L ⊆ B ⊆ S. Since this
set is nonempty and finite, it has some maximal element (that is, a subfamily B = (u_h)_{h∈H}
of S with H ⊆ I of maximum cardinality), say B = (u_h)_{h∈H}. We claim that B generates
E. Indeed, if B does not generate E, then there is some u_p ∈ S that is not a linear
combination of vectors in B (since S generates E), with p ∉ H. Then by Lemma 2.9, the
family B′ = (u_h)_{h∈H∪{p}} is linearly independent, and since L ⊆ B ⊂ B′ ⊆ S, this contradicts
the maximality of B. Thus, B is a basis of E such that L ⊆ B ⊆ S.
Remark: Theorem 2.10 also holds for vector spaces that are not finitely generated. In this
case, the problem is to guarantee the existence of a maximal linearly independent family B
such that L ⊆ B ⊆ S. The existence of such a maximal family can be shown using Zorn's
lemma; see Lang [40] (Theorem 5.1).
A situation where the full generality of Theorem 2.10 is needed is the case of the vector
space ℝ over the field of coefficients ℚ. The numbers 1 and √2 are linearly independent
over ℚ, so according to Theorem 2.10, the linearly independent family L = (1, √2) can be
extended to a basis B of ℝ. Since ℝ is uncountable and ℚ is countable, such a basis must
be uncountable!
The notion of a basis can also be defined in terms of the notion of maximal linearly
independent family and minimal generating family.
Definition 2.10. Let (v_i)_{i∈I} be a family of vectors in a vector space E. We say that (v_i)_{i∈I}
is a maximal linearly independent family of E if it is linearly independent, and if for any vector
w ∈ E, the family (v_i)_{i∈I} ∪_k (w) obtained by adding w to the family (v_i)_{i∈I} is linearly
dependent. We say that (v_i)_{i∈I} is a minimal generating family of E if it spans E, and if for
any index p ∈ I, the family (v_i)_{i∈(I−{p})} obtained by removing v_p from the family (v_i)_{i∈I} does
not span E.
Proposition 2.11. Given a vector space E, for any family B = (v_i)_{i∈I} of vectors of E, the
following properties are equivalent:
(1) B is a basis of E.
(2) B is a maximal linearly independent family of E.
(3) B is a minimal generating family of E.
Proof. We will first prove the equivalence of (1) and (2). Assume (1). Since B is a basis, it is
a linearly independent family. We claim that B is a maximal linearly independent family. If
B is not a maximal linearly independent family, then there is some vector w ∈ E such that
the family B′ obtained by adding w to B is linearly independent. However, since B is a basis
of E, the vector w is a linear combination of B, so B′ is linearly dependent, a contradiction.
The second key result of linear algebra is that for any two bases (ui )i2I and (vj )j2J of a
vector space E, the index sets I and J have the same cardinality. In particular, if E has a
finite basis of n elements, every basis of E has n elements, and the integer n is called the
dimension of the vector space E.
To prove the second key result, we can use the following replacement lemma due to
Steinitz. This result shows the relationship between finite linearly independent families and
finite families of generators of a vector space. We begin with a version of the lemma which is
a bit informal, but easier to understand than the precise and more formal formulation given
in Proposition 2.13. The technical difficulty has to do with the fact that some of the indices
need to be renamed.
Proposition 2.12. (Replacement lemma, version 1) Given a vector space E, let (u1, …, um)
be any finite linearly independent family in E, and let (v1, …, vn) be any finite family such
that every u_i is a linear combination of (v1, …, vn). Then we must have m ≤ n, and there
is a replacement of m of the vectors v_j by (u1, …, um), such that after renaming some of the
indices of the v_j's, the families (u1, …, um, v_{m+1}, …, v_n) and (v1, …, vn) generate the same
subspace of E.
(v1, …, vn) generate the same subspace of E. The vector u_{m+1} can also be expressed as a
linear combination of (v1, …, vn), and since (u1, …, um, v_{m+1}, …, v_n) and (v1, …, vn) generate
the same subspace, u_{m+1} can be expressed as a linear combination of (u1, …, um, v_{m+1}, …, v_n),
say
u_{m+1} = Σ_{i=1}^{m} λ_i u_i + Σ_{j=m+1}^{n} λ_j v_j.
a nontrivial linear dependence of the u_i, which is impossible since (u1, …, u_{m+1}) are linearly
independent.
Therefore, m + 1 ≤ n, and after renaming indices if necessary, we may assume that
λ_{m+1} ≠ 0, so we get
v_{m+1} = −Σ_{i=1}^{m} (λ_{m+1}⁻¹ λ_i) u_i + λ_{m+1}⁻¹ u_{m+1} − Σ_{j=m+2}^{n} (λ_{m+1}⁻¹ λ_j) v_j.
Observe that the families (u1, …, um, v_{m+1}, …, v_n) and (u1, …, u_{m+1}, v_{m+2}, …, v_n) generate
the same subspace, since u_{m+1} is a linear combination of (u1, …, um, v_{m+1}, …, v_n) and v_{m+1}
is a linear combination of (u1, …, u_{m+1}, v_{m+2}, …, v_n). Since (u1, …, um, v_{m+1}, …, v_n) and
(v1, …, vn) generate the same subspace, we conclude that (u1, …, u_{m+1}, v_{m+2}, …, v_n) and
(v1, …, vn) generate the same subspace, which concludes the induction step.
For example, consider a linearly independent family (u1, u2, u3) and a family (v1, v2, v3, v4, v5)
with
u1 = v4 + v5
u2 = v3 + v4 − v5
u3 = v1 + v2 + v3.
From the first equation, v4 = u1 − v5, and substituting into the second equation we get
u2 = v3 + v4 − v5 = v3 + u1 − v5 − v5 = u1 + v3 − 2v5.
Thus
v3 = −u1 + u2 + 2v5,
and so
u3 = v1 + v2 + v3 = v1 + v2 − u1 + u2 + 2v5.
Finally, we get
v1 = u1 − u2 + u3 − v2 − 2v5.
Therefore we have
v1 = u1 − u2 + u3 − v2 − 2v5
v3 = −u1 + u2 + 2v5
v4 = u1 − v5,
which shows that (u1, u2, u3, v2, v5) spans the same subspace as (v1, v2, v3, v4, v5). The vectors
(v1, v3, v4) have been replaced by (u1, u2, u3), and the vectors left over are (v2, v5). We can
rename them (v4, v5).
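As an aside not in the original text, the equal-span conclusion of this example can be verified numerically. The v_j here are arbitrary; taking them to be the standard basis of ℝ⁵ is one concrete choice that satisfies the three defining relations.

```python
import numpy as np

# A concrete instance: let v1, ..., v5 be the standard basis of R^5.
v1, v2, v3, v4, v5 = np.eye(5)
u1 = v4 + v5
u2 = v3 + v4 - v5
u3 = v1 + v2 + v3

old = np.column_stack([v1, v2, v3, v4, v5])
new = np.column_stack([u1, u2, u3, v2, v5])

# Equal spans: both families have rank 5, and stacking them adds nothing.
assert np.linalg.matrix_rank(old) == 5
assert np.linalg.matrix_rank(new) == 5
assert np.linalg.matrix_rank(np.column_stack([old, new])) == 5
```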
For the sake of completeness, here is a more formal statement of the replacement lemma
(and its proof).
Proposition 2.13. (Replacement lemma, version 2) Given a vector space E, let (u_i)_{i∈I} be
any finite linearly independent family in E, where |I| = m, and let (v_j)_{j∈J} be any finite family
such that every u_i is a linear combination of (v_j)_{j∈J}, where |J| = n. Then there exists a set
L and an injection ρ : L → J (a relabeling function) such that L ∩ I = ∅, |L| = n − m, and
the families (u_i)_{i∈I} ∪ (v_{ρ(l)})_{l∈L} and (v_j)_{j∈J} generate the same subspace of E. In particular,
m ≤ n.
Proof. We proceed by induction on |I| = m. When m = 0, the family (u_i)_{i∈I} is empty, and
the proposition holds trivially with L = J (ρ is the identity). Assume |I| = m + 1. Consider
the linearly independent family (u_i)_{i∈(I−{p})}, where p is any member of I. By the induction
hypothesis, there exists a set L and an injection ρ : L → J such that L ∩ (I − {p}) = ∅,
|L| = n − m, and the families (u_i)_{i∈(I−{p})} ∪ (v_{ρ(l)})_{l∈L} and (v_j)_{j∈J} generate the same subspace
of E. If p ∈ L, we can replace L by (L − {p}) ∪ {p′} where p′ does not belong to I ∪ L, and
replace ρ by the injection ρ′ which agrees with ρ on L − {p} and such that ρ′(p′) = ρ(p).
Thus, we can always assume that L ∩ I = ∅. Since u_p is a linear combination of (v_j)_{j∈J}
and the families (u_i)_{i∈(I−{p})} ∪ (v_{ρ(l)})_{l∈L} and (v_j)_{j∈J} generate the same subspace of E, u_p is
a linear combination of (u_i)_{i∈(I−{p})} ∪ (v_{ρ(l)})_{l∈L}. Let
u_p = Σ_{i∈(I−{p})} λ_i u_i + Σ_{l∈L} λ_l v_{ρ(l)}. (1)
contradicting the fact that (u_i)_{i∈I} is linearly independent. Thus, λ_l ≠ 0 for some l ∈ L, say
l = q. Since λ_q ≠ 0, we have
v_{ρ(q)} = Σ_{i∈(I−{p})} (−λ_q⁻¹ λ_i) u_i + λ_q⁻¹ u_p + Σ_{l∈(L−{q})} (−λ_q⁻¹ λ_l) v_{ρ(l)}. (2)
We claim that the families (u_i)_{i∈(I−{p})} ∪ (v_{ρ(l)})_{l∈L} and (u_i)_{i∈I} ∪ (v_{ρ(l)})_{l∈(L−{q})} generate the
same subspace of E. Indeed, the second family is obtained from the first by replacing v_{ρ(q)} by u_p,
and vice-versa, and u_p is a linear combination of (u_i)_{i∈(I−{p})} ∪ (v_{ρ(l)})_{l∈L}, by (1), and v_{ρ(q)} is a
linear combination of (u_i)_{i∈I} ∪ (v_{ρ(l)})_{l∈(L−{q})}, by (2). Thus, the families (u_i)_{i∈I} ∪ (v_{ρ(l)})_{l∈(L−{q})}
and (v_j)_{j∈J} generate the same subspace of E, and the proposition holds for L − {q} and the
restriction of the injection ρ : L → J to L − {q}, since L ∩ I = ∅ and |L| = n − m imply that
(L − {q}) ∩ I = ∅ and |L − {q}| = n − (m + 1).
The idea is that m of the vectors v_j can be replaced by the linearly independent u_i's in
such a way that the same subspace is still generated. The purpose of the function ρ : L → J
is to pick n − m elements j1, …, j_{n−m} of J and to relabel them l1, …, l_{n−m} in such a way
that these new indices do not clash with the indices in I; this way, the vectors v_{j1}, …, v_{j_{n−m}}
which "survive" (i.e., are not replaced) are relabeled v_{l1}, …, v_{l_{n−m}}, and the other m vectors v_j
with j ∈ J − {j1, …, j_{n−m}} are replaced by the u_i. The index set of this new family is I ∪ L.
Actually, one can prove that Proposition 2.13 implies Theorem 2.10 when the vector
space is finitely generated. Putting Theorem 2.10 and Proposition 2.13 together, we obtain
the following fundamental theorem.
Theorem 2.14. Let E be a finitely generated vector space. Any family (u_i)_{i∈I} generating E
contains a subfamily (u_j)_{j∈J} which is a basis of E. Any linearly independent family (u_i)_{i∈I}
can be extended to a family (u_j)_{j∈J} which is a basis of E (with I ⊆ J). Furthermore, for
every two bases (u_i)_{i∈I} and (v_j)_{j∈J} of E, we have |I| = |J| = n for some fixed integer n ≥ 0.
Proof. The first part follows immediately by applying Theorem 2.10 with L = ∅ and S =
(u_i)_{i∈I}. For the second part, consider the family S′ = (u_i)_{i∈I} ∪ (v_h)_{h∈H}, where (v_h)_{h∈H} is any
finite family generating E, and with I ∩ H = ∅. Then apply Theorem 2.10 to
L = (u_i)_{i∈I} and to S′. For the last statement, assume that (u_i)_{i∈I} and (v_j)_{j∈J} are bases of
E. Since (u_i)_{i∈I} is linearly independent and (v_j)_{j∈J} spans E, Proposition 2.13 implies that
|I| ≤ |J|. A symmetric argument yields |J| ≤ |I|.
Remark: Theorem 2.14 also holds for vector spaces that are not finitely generated.
Definition 2.11. When a vector space E is not finitely generated, we say that E is of infinite
dimension. The dimension of a finitely generated vector space E is the common dimension
n of all of its bases and is denoted by dim(E).
Clearly, if the field ℝ itself is viewed as a vector space, then every family (a) where a ∈ ℝ
and a ≠ 0 is a basis. Thus dim(ℝ) = 1. Note that dim({0}) = 0.
Let (u_i)_{i∈I} be a basis of a vector space E. For any vector v ∈ E, since the family (u_i)_{i∈I}
generates E, there is a family (λ_i)_{i∈I} of scalars in ℝ, such that
v = Σ_{i∈I} λ_i u_i.
An important fact is that the family (λ_i)_{i∈I} is unique iff the family (u_i)_{i∈I} is linearly
independent.
Proof. First, assume that (u_i)_{i∈I} is linearly independent. If (µ_i)_{i∈I} is another family of scalars
in ℝ such that v = Σ_{i∈I} µ_i u_i, then we have
Σ_{i∈I} (λ_i − µ_i) u_i = 0,
and since (u_i)_{i∈I} is linearly independent, we must have λ_i − µ_i = 0 for all i ∈ I, that is, λ_i = µ_i
for all i ∈ I. The converse is shown by contradiction. If (u_i)_{i∈I} was linearly dependent, there
would be a family (µ_i)_{i∈I} of scalars not all null such that
Σ_{i∈I} µ_i u_i = 0
and µ_j ≠ 0 for some j ∈ I. But then we would also have
v = Σ_{i∈I} λ_i u_i = Σ_{i∈I} (λ_i + µ_i) u_i,
with λ_j ≠ λ_j + µ_j since µ_j ≠ 0, contradicting the assumption that (λ_i)_{i∈I} is the unique family
such that v = Σ_{i∈I} λ_i u_i.
Definition 2.13. If (u_i)_{i∈I} is a basis of a vector space E, for any vector v ∈ E, if (x_i)_{i∈I} is
the unique family of scalars in ℝ such that
v = Σ_{i∈I} x_i u_i,
each x_i is called the component (or coordinate) of index i of v with respect to the basis (u_i)_{i∈I}.
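As a computational aside (not in the original text), finding the components of v with respect to a basis of ℝ³ amounts to solving a linear system: if the basis vectors are the columns of U, then v = x1 u1 + x2 u2 + x3 u3 reads Ux = v, and the solution x is unique precisely because the columns of U are linearly independent. The basis and vector below are made up.

```python
import numpy as np

# A hypothetical basis of R^3, stored as the columns of U.
U = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])
v = np.array([2.0, 3.0, 1.0])

# The components of v with respect to this basis solve U x = v.
x = np.linalg.solve(U, v)
assert np.allclose(U @ x, v)
assert np.allclose(x, [0.0, 2.0, 1.0])   # v = 0*u1 + 2*u2 + 1*u3
```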
2.6 Matrices
In Section 2.1 we introduced informally the notion of a matrix. In this section we define
matrices precisely, and also introduce some operations on matrices. It turns out that matri-
ces form a vector space equipped with a multiplication operation which is associative, but
noncommutative. We will explain in Section 3.1 how matrices can be used to represent linear
maps, defined in the next section.
(a_{1 1} · · · a_{1 n})
In these last two cases, we usually omit the constant index 1 (first index in case of a row,
second index in case of a column). The set of all m × n-matrices is denoted by M_{m,n}(K)
or M_{m,n}. An n × n-matrix is called a square matrix of dimension n. The set of all square
matrices of dimension n is denoted by M_n(K), or M_n.
Remark: As defined, a matrix A = (a_{i j})_{1≤i≤m, 1≤j≤n} is a family, that is, a function from
{1, 2, …, m} × {1, 2, …, n} to K. As such, there is no reason to assume an ordering on
the indices. Thus, the matrix A can be represented in many different ways as an array, by
adopting different orders for the rows or the columns. However, it is customary (and usually
convenient) to assume the natural ordering on the sets {1, 2, …, m} and {1, 2, …, n}, and
to represent A as an array according to this ordering of the rows and columns.
We define some operations on matrices as follows.
Definition 2.15. Given two $m\times n$ matrices $A = (a_{ij})$ and $B = (b_{ij})$, we define their sum $A + B$ as the matrix $C = (c_{ij})$ such that $c_{ij} = a_{ij} + b_{ij}$; that is,
$$
\begin{pmatrix}
a_{1\,1} & a_{1\,2} & \cdots & a_{1\,n}\\
a_{2\,1} & a_{2\,2} & \cdots & a_{2\,n}\\
\vdots & \vdots & \ddots & \vdots\\
a_{m\,1} & a_{m\,2} & \cdots & a_{m\,n}
\end{pmatrix}
+
\begin{pmatrix}
b_{1\,1} & b_{1\,2} & \cdots & b_{1\,n}\\
b_{2\,1} & b_{2\,2} & \cdots & b_{2\,n}\\
\vdots & \vdots & \ddots & \vdots\\
b_{m\,1} & b_{m\,2} & \cdots & b_{m\,n}
\end{pmatrix}
=
\begin{pmatrix}
a_{1\,1} + b_{1\,1} & a_{1\,2} + b_{1\,2} & \cdots & a_{1\,n} + b_{1\,n}\\
a_{2\,1} + b_{2\,1} & a_{2\,2} + b_{2\,2} & \cdots & a_{2\,n} + b_{2\,n}\\
\vdots & \vdots & \ddots & \vdots\\
a_{m\,1} + b_{m\,1} & a_{m\,2} + b_{m\,2} & \cdots & a_{m\,n} + b_{m\,n}
\end{pmatrix}.
$$
For any matrix $A = (a_{ij})$, we let $-A$ be the matrix $(-a_{ij})$. Given a scalar $\lambda\in K$, we define the matrix $\lambda A$ as the matrix $C = (c_{ij})$ such that $c_{ij} = \lambda a_{ij}$; that is,
$$
\lambda
\begin{pmatrix}
a_{1\,1} & a_{1\,2} & \cdots & a_{1\,n}\\
a_{2\,1} & a_{2\,2} & \cdots & a_{2\,n}\\
\vdots & \vdots & \ddots & \vdots\\
a_{m\,1} & a_{m\,2} & \cdots & a_{m\,n}
\end{pmatrix}
=
\begin{pmatrix}
\lambda a_{1\,1} & \lambda a_{1\,2} & \cdots & \lambda a_{1\,n}\\
\lambda a_{2\,1} & \lambda a_{2\,2} & \cdots & \lambda a_{2\,n}\\
\vdots & \vdots & \ddots & \vdots\\
\lambda a_{m\,1} & \lambda a_{m\,2} & \cdots & \lambda a_{m\,n}
\end{pmatrix}.
$$
Note that the entry of index $i$ and $j$ of the matrix $AB$ obtained by multiplying the matrices $A$ and $B$ can be identified with the product of the row matrix corresponding to the $i$-th row of $A$ with the column matrix corresponding to the $j$-th column of $B$:
$$
\begin{pmatrix} a_{i\,1} & \cdots & a_{i\,n} \end{pmatrix}
\begin{pmatrix} b_{1\,j}\\ \vdots\\ b_{n\,j} \end{pmatrix}
= \sum_{k=1}^{n} a_{i\,k} b_{k\,j}.
$$
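This entry formula is easy to check numerically; a small sketch (with made-up matrices, not from the text):

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])        # 2 x 3
B = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [2.0, -1.0]])            # 3 x 2

C = A @ B                              # 2 x 2 product

# Entry (i, j) of AB is the dot product of row i of A with column j of B.
for i in range(A.shape[0]):
    for j in range(B.shape[1]):
        assert np.isclose(C[i, j],
                          sum(A[i, k] * B[k, j] for k in range(A.shape[1])))
```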
Definition 2.16. The square matrix $I_n$ of dimension $n$ containing 1 on the diagonal and 0 everywhere else is called the identity matrix. It is denoted by
$$I_n = \begin{pmatrix}
1 & 0 & \cdots & 0\\
0 & 1 & \cdots & 0\\
\vdots & \vdots & \ddots & \vdots\\
0 & 0 & \cdots & 1
\end{pmatrix}.$$
Definition 2.17. Given an $m\times n$ matrix $A = (a_{ij})$, its transpose $A^\top = (a^\top_{j\,i})$ is the $n\times m$ matrix such that $a^\top_{j\,i} = a_{i\,j}$, for all $i$, $1\le i\le m$, and all $j$, $1\le j\le n$.
The following observation will be useful later on when we discuss the SVD. Given any $m\times n$ matrix $A$ and any $n\times p$ matrix $B$, if we denote the columns of $A$ by $A_1,\dots,A_n$ and the rows of $B$ by $B_1,\dots,B_n$, then we have
$$AB = A_1B_1 + \cdots + A_nB_n.$$
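A quick numerical check of this column-row (outer product) expansion, with random matrices (a sketch, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 2))   # columns A1, A2
B = rng.standard_normal((2, 4))   # rows B1, B2

# AB equals the sum of the outer products (column of A) x (row of B).
expansion = sum(np.outer(A[:, k], B[k, :]) for k in range(A.shape[1]))
assert np.allclose(A @ B, expansion)
```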
Definition 2.18. For any square matrix $A$ of dimension $n$, if a matrix $B$ such that $AB = BA = I_n$ exists, then it is unique, and it is called the inverse of $A$. The matrix $B$ is also denoted by $A^{-1}$. An invertible matrix is also called a nonsingular matrix, and a matrix that is not invertible is called a singular matrix.
Using Proposition 2.20 and the fact that matrices represent linear maps, it can be shown that if a square matrix $A$ has a left inverse, that is a matrix $B$ such that $BA = I$, or a right inverse, that is a matrix $C$ such that $AC = I$, then $A$ is actually invertible; so $B = A^{-1}$ and $C = A^{-1}$. These facts also follow from Proposition 5.14.
Using Proposition 2.3 (or mimicking the computations in its proof), we note that if $A$ and $B$ are two $n\times n$ invertible matrices, then $AB$ is also invertible and $(AB)^{-1} = B^{-1}A^{-1}$.
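The reversed order of the factors in $(AB)^{-1} = B^{-1}A^{-1}$ is easy to confirm numerically; a sketch (shifting random matrices by $3I$ just makes them comfortably invertible, an assumption of this example rather than a guarantee):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3)) + 3.0 * np.eye(3)
B = rng.standard_normal((3, 3)) + 3.0 * np.eye(3)

# The inverse of a product reverses the order of the factors.
assert np.allclose(np.linalg.inv(A @ B),
                   np.linalg.inv(B) @ np.linalg.inv(A))
```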
It is immediately verified that the set $M_{m,n}(K)$ of $m\times n$ matrices is a vector space under addition of matrices and multiplication of a matrix by a scalar.
Definition 2.19. The $m\times n$ matrices $E_{ij} = (e_{hk})$ are defined such that $e_{ij} = 1$, and $e_{hk} = 0$ if $h\neq i$ or $k\neq j$; in other words, the $(i, j)$-entry is equal to 1 and all other entries are 0.
It is clear that every matrix $A = (a_{ij})\in M_{m,n}(K)$ can be written in a unique way as
$$A = \sum_{i=1}^{m}\sum_{j=1}^{n} a_{ij} E_{ij}.$$
Thus, the family $(E_{ij})_{1\le i\le m,\,1\le j\le n}$ is a basis of the vector space $M_{m,n}(K)$, which has dimension $mn$.
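This decomposition can be sketched directly in code (a small illustration, not from the text):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
m, n = A.shape

def E(i, j):
    """The basis matrix E_ij: 1 in entry (i, j), 0 everywhere else."""
    M = np.zeros((m, n))
    M[i, j] = 1.0
    return M

# A is recovered as the sum of its entries times the basis matrices E_ij.
recon = sum(A[i, j] * E(i, j) for i in range(m) for j in range(n))
assert np.allclose(A, recon)
```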
Remark: Definition 2.14 and Definition 2.15 also make perfect sense when $K$ is a (commutative) ring rather than a field. In this more general setting, the framework of vector spaces is too narrow, but we can consider structures over a commutative ring $A$ satisfying all the axioms of Definition 2.4. Such structures are called modules. The theory of modules is (much) more complicated than that of vector spaces. For example, modules do not always have a basis, and other properties holding for vector spaces usually fail for modules. When a module has a basis, it is called a free module. For example, when $A$ is a commutative ring, the structure $A^n$ is a module such that the vectors $e_i$, with $(e_i)_i = 1$ and $(e_i)_j = 0$ for $j\neq i$, form a basis of $A^n$. Many properties of vector spaces still hold for $A^n$. Thus, $A^n$ is a free module. As another example, when $A$ is a commutative ring, $M_{m,n}(A)$ is a free module with basis $(E_{i,j})_{1\le i\le m,\,1\le j\le n}$. Polynomials over a commutative ring also form a free module of infinite dimension.
The properties listed in Proposition 2.16 are easily verified, although some of the computations are a bit tedious. A more conceptual proof is given in Proposition 3.1.
Proposition 2.16. (1) Given any matrices $A\in M_{m,n}(K)$, $B\in M_{n,p}(K)$, and $C\in M_{p,q}(K)$, we have
$$(AB)C = A(BC);$$
that is, matrix multiplication is associative.
(2) Given any matrices $A, B\in M_{m,n}(K)$, and $C, D\in M_{n,p}(K)$, for all $\lambda\in K$, we have
$$(A + B)C = AC + BC$$
$$A(C + D) = AC + AD$$
$$(\lambda A)C = \lambda(AC)$$
$$A(\lambda C) = \lambda(AC),$$
so that matrix multiplication $\cdot\colon M_{m,n}(K)\times M_{n,p}(K)\to M_{m,p}(K)$ is bilinear.
The properties of Proposition 2.16 together with the fact that $AI_n = I_nA = A$ for all square $n\times n$ matrices show that $M_n(K)$ is a ring with unit $I_n$ (in fact, an associative algebra). This is a noncommutative ring with zero divisors, as shown by the following example.
For example, letting $A$ and $B$ be the $2\times 2$ matrices
$$A = \begin{pmatrix}1 & 0\\ 0 & 0\end{pmatrix}, \qquad B = \begin{pmatrix}0 & 0\\ 1 & 0\end{pmatrix},$$
then
$$AB = \begin{pmatrix}1 & 0\\ 0 & 0\end{pmatrix}\begin{pmatrix}0 & 0\\ 1 & 0\end{pmatrix} = \begin{pmatrix}0 & 0\\ 0 & 0\end{pmatrix},$$
and
$$BA = \begin{pmatrix}0 & 0\\ 1 & 0\end{pmatrix}\begin{pmatrix}1 & 0\\ 0 & 0\end{pmatrix} = \begin{pmatrix}0 & 0\\ 1 & 0\end{pmatrix}.$$
Thus $AB \neq BA$, and $AB = 0$, even though both $A, B \neq 0$.
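The same zero-divisor example, checked numerically (a sketch, not from the text):

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [0.0, 0.0]])
B = np.array([[0.0, 0.0],
              [1.0, 0.0]])

# A and B are zero divisors: AB = 0 although neither A nor B is 0.
assert np.allclose(A @ B, 0.0)
# And multiplication is not commutative: BA is nonzero.
assert not np.allclose(B @ A, 0.0)
```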
In the rest of this section, we assume that all vector spaces are real vector spaces, but all
results hold for vector spaces over an arbitrary field.
Definition 2.20. Given two vector spaces $E$ and $F$, a linear map (or linear transformation) between $E$ and $F$ is a function $f\colon E\to F$ satisfying the following two conditions:
$$f(x + y) = f(x) + f(y)\quad\text{for all } x, y\in E;$$
$$f(\lambda x) = \lambda f(x)\quad\text{for all }\lambda\in\mathbb{R},\ x\in E.$$
Setting $x = y = 0$ in the first identity, we get $f(0) = 0$. The basic property of linear maps is that they transform linear combinations into linear combinations. Given any finite family $(u_i)_{i\in I}$ of vectors in $E$, given any family $(\lambda_i)_{i\in I}$ of scalars in $\mathbb{R}$, we have
$$f\Bigl(\sum_{i\in I}\lambda_i u_i\Bigr) = \sum_{i\in I}\lambda_i f(u_i).$$
The above identity is shown by induction on $|I|$ using the properties of Definition 2.20.
Example 2.8.
1. The map $f\colon\mathbb{R}^2\to\mathbb{R}^2$ defined such that
$$x' = x - y$$
$$y' = x + y$$
is a linear map. When we want to be more precise, we write $\mathrm{id}_E$ instead of $\mathrm{id}$.
where C([a, b]) is the set of continuous functions defined on the interval [a, b], is a linear
map.
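The map of item (1) can be written as a matrix acting on coordinate vectors, which makes the two linearity conditions easy to verify numerically (a sketch, not from the text):

```python
import numpy as np

# The map (x, y) |-> (x - y, x + y) from Example 2.8(1), as a matrix.
F = np.array([[1.0, -1.0],
              [1.0,  1.0]])

def f(v):
    return F @ v

u = np.array([2.0, 1.0])
v = np.array([-1.0, 3.0])

# f is additive and homogeneous, hence linear.
assert np.allclose(f(u + v), f(u) + f(v))
assert np.allclose(f(2.5 * u), 2.5 * f(u))
```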
is linear in each of the variables $f$, $g$. It also satisfies the properties $\langle f, g\rangle = \langle g, f\rangle$ and $\langle f, f\rangle = 0$ iff $f = 0$. It is an example of an inner product.
Definition 2.21. Given a linear map $f\colon E\to F$, we define its image (or range) $\mathrm{Im}\,f = f(E)$, as the set
$$\mathrm{Im}\,f = \{y\in F \mid (\exists x\in E)(y = f(x))\},$$
and its kernel (or nullspace) $\mathrm{Ker}\,f = f^{-1}(0)$, as the set
$$\mathrm{Ker}\,f = \{x\in E \mid f(x) = 0\}.$$
The derivative map $D\colon\mathbb{R}[X]\to\mathbb{R}[X]$ from Example 2.8(3) has kernel the constant polynomials, so $\mathrm{Ker}\,D = \mathbb{R}$. If we consider the second derivative $D\circ D\colon\mathbb{R}[X]\to\mathbb{R}[X]$, then the kernel of $D\circ D$ consists of all polynomials of degree $\le 1$. The image of $D\colon\mathbb{R}[X]\to\mathbb{R}[X]$ is actually $\mathbb{R}[X]$ itself, because every polynomial $P(X) = a_0X^n + \cdots + a_{n-1}X + a_n$ of degree $n$ is the derivative of the polynomial $Q(X)$ of degree $n+1$ given by
$$Q(X) = a_0\frac{X^{n+1}}{n+1} + \cdots + a_{n-1}\frac{X^2}{2} + a_nX.$$
On the other hand, if we consider the restriction of $D$ to the vector space $\mathbb{R}[X]_n$ of polynomials of degree $\le n$, then the kernel of $D$ is still $\mathbb{R}$, but the image of $D$ is $\mathbb{R}[X]_{n-1}$, the vector space of polynomials of degree $\le n-1$.
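These facts about $D$ can be illustrated with polynomials stored as coefficient lists $[a_0, a_1, \dots, a_n]$ for $a_0 + a_1X + \cdots + a_nX^n$ (this representation is an assumption of the sketch, not from the text):

```python
# Polynomials as coefficient lists [a0, a1, ..., an].

def deriv(p):
    """The derivative map D on coefficient lists."""
    return [k * p[k] for k in range(1, len(p))] or [0.0]

def antideriv(p):
    """A preimage of p under D (constant of integration chosen as 0)."""
    return [0.0] + [p[k] / (k + 1) for k in range(len(p))]

p = [5.0, 0.0, 3.0]            # 5 + 3 X^2
assert deriv(p) == [0.0, 6.0]  # D(5 + 3 X^2) = 6 X

# D is surjective on R[X]: every polynomial is the derivative of another.
q = antideriv(p)
assert deriv(q) == p

# Constant polynomials are in the kernel of D.
assert deriv([7.0]) == [0.0]
```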
Proposition 2.17. Given a linear map $f\colon E\to F$, the set $\mathrm{Im}\,f$ is a subspace of $F$ and the set $\mathrm{Ker}\,f$ is a subspace of $E$. The linear map $f\colon E\to F$ is injective iff $\mathrm{Ker}\,f = (0)$ (where $(0)$ is the trivial subspace $\{0\}$).
Proof. Given any $x, y\in\mathrm{Im}\,f$, there are some $u, v\in E$ such that $x = f(u)$ and $y = f(v)$, and for all $\lambda, \mu\in\mathbb{R}$, we have
$$f(\lambda u + \mu v) = \lambda f(u) + \mu f(v) = \lambda x + \mu y,$$
and thus, $\lambda x + \mu y\in\mathrm{Im}\,f$, showing that $\mathrm{Im}\,f$ is a subspace of $F$.
Given any $x, y\in\mathrm{Ker}\,f$, we have $f(x) = 0$ and $f(y) = 0$, and thus,
$$f(\lambda x + \mu y) = \lambda f(x) + \mu f(y) = 0,$$
that is, $\lambda x + \mu y\in\mathrm{Ker}\,f$, showing that $\mathrm{Ker}\,f$ is a subspace of $E$.
First, assume that $\mathrm{Ker}\,f = (0)$. We need to prove that $f(x) = f(y)$ implies that $x = y$. However, if $f(x) = f(y)$, then $f(x) - f(y) = 0$, and by linearity of $f$ we get $f(x - y) = 0$. Because $\mathrm{Ker}\,f = (0)$, we must have $x - y = 0$, that is $x = y$, so $f$ is injective. Conversely, assume that $f$ is injective. If $x\in\mathrm{Ker}\,f$, that is $f(x) = 0$, since $f(0) = 0$ we have $f(x) = f(0)$, and by injectivity, $x = 0$, which proves that $\mathrm{Ker}\,f = (0)$. Therefore, $f$ is injective iff $\mathrm{Ker}\,f = (0)$.
Definition 2.22. Given a linear map $f\colon E\to F$, the rank $\mathrm{rk}(f)$ of $f$ is the dimension of the image $\mathrm{Im}\,f$ of $f$.
A fundamental property of bases in a vector space is that they allow the definition of
linear maps as unique homomorphic extensions, as shown in the following proposition.
Proposition 2.18. Given any two vector spaces $E$ and $F$, given any basis $(u_i)_{i\in I}$ of $E$, given any other family of vectors $(v_i)_{i\in I}$ in $F$, there is a unique linear map $f\colon E\to F$ such that $f(u_i) = v_i$ for all $i\in I$. Furthermore, $f$ is injective iff $(v_i)_{i\in I}$ is linearly independent, and $f$ is surjective iff $(v_i)_{i\in I}$ generates $F$.
Proof. If such a linear map $f\colon E\to F$ exists, since $(u_i)_{i\in I}$ is a basis of $E$, every vector $x\in E$ can be written uniquely as a linear combination
$$x = \sum_{i\in I} x_i u_i,$$
and by linearity we must have
$$f(x) = \sum_{i\in I} x_i f(u_i) = \sum_{i\in I} x_i v_i,$$
which shows that $f$ is unique if it exists; and the map defined by $f(x) = \sum_{i\in I} x_i v_i$ is easily verified to be linear, which proves existence.
Now assume that $f$ is injective. Let $(\lambda_i)_{i\in I}$ be any family of scalars such that $\sum_{i\in I}\lambda_i v_i = 0$. Since $v_i = f(u_i)$ for every $i\in I$, we have
$$f\Bigl(\sum_{i\in I}\lambda_i u_i\Bigr) = \sum_{i\in I}\lambda_i f(u_i) = \sum_{i\in I}\lambda_i v_i = 0.$$
Since $f$ is injective, $\sum_{i\in I}\lambda_i u_i = 0$, and since $(u_i)_{i\in I}$ is a basis, we have $\lambda_i = 0$ for all $i\in I$, which shows that $(v_i)_{i\in I}$ is linearly independent. Conversely, assume that $(v_i)_{i\in I}$ is linearly independent. Since $(u_i)_{i\in I}$ is a basis of $E$, every vector $x\in E$ is a linear combination $x = \sum_{i\in I}\lambda_i u_i$ of $(u_i)_{i\in I}$. If
$$f(x) = f\Bigl(\sum_{i\in I}\lambda_i u_i\Bigr) = 0,$$
then
$$\sum_{i\in I}\lambda_i v_i = \sum_{i\in I}\lambda_i f(u_i) = f\Bigl(\sum_{i\in I}\lambda_i u_i\Bigr) = 0,$$
and $\lambda_i = 0$ for all $i\in I$ because $(v_i)_{i\in I}$ is linearly independent, which means that $x = 0$. Therefore, $\mathrm{Ker}\,f = (0)$, which implies that $f$ is injective. The part where $f$ is surjective is left as a simple exercise.
Figure 2.11 provides an illustration of Proposition 2.18 when $E = \mathbb{R}^3$ and $F = \mathbb{R}^2$.
Figure 2.11: Given $u_1 = (1, 0, 0)$, $u_2 = (0, 1, 0)$, $u_3 = (0, 0, 1)$ and $v_1 = (1, 1)$, $v_2 = (-1, 1)$, $v_3 = (1, 0)$, define the unique linear map $f\colon\mathbb{R}^3\to\mathbb{R}^2$ by $f(u_1) = v_1$, $f(u_2) = v_2$, and $f(u_3) = v_3$. This map is surjective but not injective since $f(u_1 - u_2) = f(u_1) - f(u_2) = (1, 1) - (-1, 1) = (2, 0) = 2f(u_3) = f(2u_3)$.
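Proposition 2.18 also gives a concrete recipe: to define a linear map on $\mathbb{R}^n$, place the prescribed images of the standard basis as the columns of a matrix. A sketch using the data of Figure 2.11:

```python
import numpy as np

# Columns are the prescribed images v1, v2, v3 from Figure 2.11.
V = np.array([[1.0, -1.0, 1.0],
              [1.0,  1.0, 0.0]])

def f(x):
    return V @ x

u1, u2, u3 = np.eye(3)  # standard basis of R^3

# Not injective: u1 - u2 and 2*u3 are distinct vectors with the same image.
assert np.allclose(f(u1 - u2), f(2 * u3))
assert not np.allclose(u1 - u2, 2 * u3)

# Surjective: the columns of V span R^2.
assert np.linalg.matrix_rank(V) == 2
```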
By the second part of Proposition 2.18, an injective linear map $f\colon E\to F$ sends a basis $(u_i)_{i\in I}$ to a linearly independent family $(f(u_i))_{i\in I}$ of $F$, which is also a basis when $f$ is bijective. Also, when $E$ and $F$ have the same finite dimension $n$, $(u_i)_{i\in I}$ is a basis of $E$, and $f\colon E\to F$ is injective, then $(f(u_i))_{i\in I}$ is a basis of $F$ (by Proposition 2.11).
The following simple proposition is also useful.
Proposition 2.19. Given any two vector spaces $E$ and $F$, with $F$ nontrivial, given any family $(u_i)_{i\in I}$ of vectors in $E$, the following properties hold:
(1) The family $(u_i)_{i\in I}$ generates $E$ iff for every family of vectors $(v_i)_{i\in I}$ in $F$, there is at most one linear map $f\colon E\to F$ such that $f(u_i) = v_i$ for all $i\in I$.
(2) The family $(u_i)_{i\in I}$ is linearly independent iff for every family of vectors $(v_i)_{i\in I}$ in $F$, there is some linear map $f\colon E\to F$ such that $f(u_i) = v_i$ for all $i\in I$.
Proof. (1) If there is any linear map $f\colon E\to F$ such that $f(u_i) = v_i$ for all $i\in I$, since $(u_i)_{i\in I}$ generates $E$, every vector $x\in E$ can be written as some linear combination
$$x = \sum_{i\in I} x_i u_i,$$
and by linearity, $f(x) = \sum_{i\in I} x_i v_i$. This shows that $f$ is unique if it exists. Conversely, assume that $(u_i)_{i\in I}$ does not generate $E$. Since $F$ is nontrivial, there is some vector $y\in F$ such that $y\neq 0$. Since $(u_i)_{i\in I}$ does not generate $E$, there is some vector $w\in E$ that is not in the subspace generated by $(u_i)_{i\in I}$. By Theorem 2.14, there is a linearly independent subfamily $(u_i)_{i\in I_0}$ of $(u_i)_{i\in I}$ generating the same subspace. Since by hypothesis, $w\in E$ is not in the subspace generated by $(u_i)_{i\in I_0}$, by Lemma 2.9 and by Theorem 2.14 again, there is a basis $(e_j)_{j\in I_0\cup J}$ of $E$, such that $e_i = u_i$ for all $i\in I_0$, and $w = e_{j_0}$ for some $j_0\in J$. Letting $(v_i)_{i\in I}$ be the family in $F$ such that $v_i = 0$ for all $i\in I$, defining $f\colon E\to F$ to be the constant linear map with value 0, we have a linear map such that $f(u_i) = 0$ for all $i\in I$. By Proposition 2.18, there is a unique linear map $g\colon E\to F$ such that $g(w) = y$, and $g(e_j) = 0$ for all $j\in (I_0\cup J) - \{j_0\}$. By definition of the basis $(e_j)_{j\in I_0\cup J}$ of $E$, we have $g(u_i) = 0$ for all $i\in I$, and since $f\neq g$, this contradicts the fact that there is at most one such map. See Figure 2.12.
(2) If the family $(u_i)_{i\in I}$ is linearly independent, then by Theorem 2.14, $(u_i)_{i\in I}$ can be extended to a basis of $E$, and the conclusion follows by Proposition 2.18. Conversely, assume that $(u_i)_{i\in I}$ is linearly dependent. Then there is some family $(\lambda_i)_{i\in I}$ of scalars (not all zero) such that
$$\sum_{i\in I}\lambda_i u_i = 0.$$
By the assumption, for any nonzero vector $y\in F$, for every $i\in I$, there is some linear map $f_i\colon E\to F$, such that $f_i(u_i) = y$, and $f_i(u_j) = 0$, for $j\in I - \{i\}$. Then we would get
$$0 = f_i\Bigl(\sum_{j\in I}\lambda_j u_j\Bigr) = \sum_{j\in I}\lambda_j f_i(u_j) = \lambda_i y,$$
and since $y\neq 0$, this implies $\lambda_i = 0$ for every $i\in I$. Thus, $(u_i)_{i\in I}$ is linearly independent.
Figure 2.12: Let E = R3 and F = R2 . The vectors u1 = (1, 0, 0), u2 = (0, 1, 0) do not
generate R3 since both the zero map and the map g, where g(0, 0, 1) = (1, 0), send the peach
xy-plane to the origin.
(1) If $f$ has a left inverse $g$, that is, if $g$ is a linear map such that $g\circ f = \mathrm{id}$, then $f$ is an isomorphism and $f^{-1} = g$.
(2) If $f$ has a right inverse $h$, that is, if $h$ is a linear map such that $f\circ h = \mathrm{id}$, then $f$ is an isomorphism and $f^{-1} = h$.
Proof. (1) The equation $g\circ f = \mathrm{id}$ implies that $f$ is injective; this is a standard result about functions (if $f(x) = f(y)$, then $g(f(x)) = g(f(y))$, which implies that $x = y$ since $g\circ f = \mathrm{id}$). Let $(u_1,\dots,u_n)$ be any basis of $E$. By Proposition 2.18, since $f$ is injective, $(f(u_1),\dots,f(u_n))$ is linearly independent, and since $E$ has dimension $n$, it is a basis of $E$ (if $(f(u_1),\dots,f(u_n))$ doesn't span $E$, then it can be extended to a basis of dimension strictly greater than $n$, contradicting Theorem 2.14). Then $f$ is bijective, and by a previous observation its inverse is a linear map. We also have
$$g = g\circ\mathrm{id} = g\circ(f\circ f^{-1}) = (g\circ f)\circ f^{-1} = \mathrm{id}\circ f^{-1} = f^{-1}.$$
(2) The equation $f\circ h = \mathrm{id}$ implies that $f$ is surjective; this is a standard result about functions (for any $y\in E$, we have $f(h(y)) = y$). Let $(u_1,\dots,u_n)$ be any basis of $E$. By Proposition 2.18, since $f$ is surjective, $(f(u_1),\dots,f(u_n))$ spans $E$, and since $E$ has dimension $n$, it is a basis of $E$ (if $(f(u_1),\dots,f(u_n))$ is not linearly independent, then because it spans $E$, it contains a basis of dimension strictly smaller than $n$, contradicting Theorem 2.14). Then $f$ is bijective, and by a previous observation its inverse is a linear map. We also have
$$h = \mathrm{id}\circ h = (f^{-1}\circ f)\circ h = f^{-1}\circ(f\circ h) = f^{-1}\circ\mathrm{id} = f^{-1}.$$
The set $\mathrm{Hom}(E, F)$ is a vector space under the operations defined in Example 2.3, namely
$$(f + g)(x) = f(x) + g(x)$$
for all $x\in E$, and
$$(\lambda f)(x) = \lambda f(x)$$
for all $x\in E$.
When $E$ and $F$ have finite dimensions, the vector space $\mathrm{Hom}(E, F)$ also has finite dimension, as we shall see shortly.
$$(g_1 + g_2)\circ f = g_1\circ f + g_2\circ f;$$
$$g\circ(f_1 + f_2) = g\circ f_1 + g\circ f_2.$$
with $\lambda_i = f^*(u_i)\in K$ for every $i$, $1\le i\le n$. Thus, with respect to the basis $(u_1,\dots,u_n)$, the linear form $f^*$ is represented by the row vector
$$(\lambda_1\ \cdots\ \lambda_n),$$
we have
$$f^*(x) = \begin{pmatrix}\lambda_1 & \cdots & \lambda_n\end{pmatrix}\begin{pmatrix}x_1\\ \vdots\\ x_n\end{pmatrix},$$
a linear combination of the coordinates of $x$, and we can view the linear form $f^*$ as a linear equation. If we decide to use a column vector of coefficients
$$c = \begin{pmatrix}c_1\\ \vdots\\ c_n\end{pmatrix}$$
instead of a row vector, then the linear form $f^*$ is defined by
$$f^*(x) = c^\top x.$$
Observe that $c = \lambda^\top$. The above notation is often used in machine learning.
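The row-vector and column-vector views of a linear form agree, as a quick sketch shows (the coefficients $\lambda = (2, -1, 3)$ are made up for illustration):

```python
import numpy as np

# A linear form on R^3 with made-up coefficients lambda = (2, -1, 3).
lam = np.array([2.0, -1.0, 3.0])   # row of coefficients
c = lam.reshape(-1, 1)             # the same coefficients as a column c

x = np.array([1.0, 4.0, 2.0])

# f*(x) as a linear combination of the coordinates of x, i.e. c^T x.
fx = (c.T @ x).item()
assert np.isclose(fx, lam @ x)
print(fx)  # 2*1 - 1*4 + 3*2 = 4.0
```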
Example 2.10. Let $C([0,1])$ be the vector space of continuous functions $f\colon[0,1]\to\mathbb{R}$. The map $I\colon C([0,1])\to\mathbb{R}$ given by
$$I(f) = \int_0^1 f(x)\,dx\quad\text{for any } f\in C([0,1])$$
is a linear form.
Example 2.11. Consider the vector space $M_n(\mathbb{R})$ of real $n\times n$ matrices. Let $\mathrm{tr}\colon M_n(\mathbb{R})\to\mathbb{R}$ be the function given by
$$\mathrm{tr}(A) = a_{11} + a_{22} + \cdots + a_{nn},$$
called the trace of $A$. It is a linear form. Let $s\colon M_n(\mathbb{R})\to\mathbb{R}$ be the function given by
$$s(A) = \sum_{i,j=1}^{n} a_{ij},$$
the sum of all the entries of $A$. It is also a linear form.
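The linearity of both forms is easy to spot-check numerically (a sketch with random matrices, not from the text):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))
lam = 2.5

# Both tr and the all-entries sum s respect addition and scaling,
# i.e. they behave as linear forms on M_n(R).
assert np.isclose(np.trace(A + lam * B), np.trace(A) + lam * np.trace(B))
assert np.isclose((A + lam * B).sum(), A.sum() + lam * B.sum())
```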
Given a vector space $E$ and any basis $(u_i)_{i\in I}$ for $E$, we can associate to each $u_i$ a linear form $u_i^*\in E^*$, and the $u_i^*$ have some remarkable properties.
Definition 2.28. Given a vector space $E$ and any basis $(u_i)_{i\in I}$ for $E$, by Proposition 2.18, for every $i\in I$, there is a unique linear form $u_i^*$ such that
$$u_i^*(u_j) = \begin{cases}1 & \text{if } i = j\\ 0 & \text{if } i\neq j,\end{cases}$$
for every $j\in I$. The linear form $u_i^*$ is called the coordinate form of index $i$ w.r.t. the basis $(u_i)_{i\in I}$.
Remark: Given an index set $I$, authors often define the so-called "Kronecker symbol" $\delta_{ij}$ such that
$$\delta_{ij} = \begin{cases}1 & \text{if } i = j\\ 0 & \text{if } i\neq j,\end{cases}$$
for all $i, j\in I$. Then, $u_i^*(u_j) = \delta_{ij}$.
The reason for the terminology coordinate form is as follows: If $E$ has finite dimension and if $(u_1,\dots,u_n)$ is a basis of $E$, for any vector
$$v = \lambda_1 u_1 + \cdots + \lambda_n u_n,$$
we have
$$u_i^*(v) = u_i^*(\lambda_1 u_1 + \cdots + \lambda_n u_n) = \lambda_1 u_i^*(u_1) + \cdots + \lambda_i u_i^*(u_i) + \cdots + \lambda_n u_i^*(u_n) = \lambda_i,$$
since $u_i^*(u_j) = \delta_{ij}$. Therefore, $u_i^*$ is the linear function that returns the $i$th coordinate of a vector expressed over the basis $(u_1,\dots,u_n)$.
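In coordinates, when a basis of $\mathbb{R}^n$ is stored as the columns of a matrix $U$, the coordinate forms are simply the rows of $U^{-1}$; a sketch with a made-up basis:

```python
import numpy as np

# Basis of R^3 as the columns of U (a made-up, invertible example).
U = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])

# The coordinate forms u_i^* are the rows of U^{-1}:
# row i applied to u_j gives the Kronecker delta_ij.
U_inv = np.linalg.inv(U)
assert np.allclose(U_inv @ U, np.eye(3))

# u_i^*(v) returns the i-th coordinate of v over the basis.
coords = np.array([2.0, -1.0, 3.0])
v = U @ coords
assert np.allclose(U_inv @ v, coords)
```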
The following theorem shows that in finite dimension, every basis $(u_1,\dots,u_n)$ of a vector space $E$ yields a basis $(u_1^*,\dots,u_n^*)$ of the dual space $E^*$, called a dual basis.
Theorem 2.21. (Existence of dual bases) Let $E$ be a vector space of dimension $n$. The following property holds: For every basis $(u_1,\dots,u_n)$ of $E$, the family of coordinate forms $(u_1^*,\dots,u_n^*)$ is a basis of $E^*$ (called the dual basis of $(u_1,\dots,u_n)$).
Proof. If $v^*\in E^*$ is any linear form, consider the linear form
$$f^* = v^*(u_1)u_1^* + \cdots + v^*(u_n)u_n^*.$$
Observe that because $u_i^*(u_j) = \delta_{ij}$, we have $f^*(u_i) = v^*(u_i)$ for every $i$, $1\le i\le n$, so $f^*$ and $v^*$ agree on the basis $(u_1,\dots,u_n)$; hence $v^* = f^*$, a linear combination of $u_1^*,\dots,u_n^*$. Therefore, $(u_1^*,\dots,u_n^*)$ spans $E^*$. We claim that the covectors $u_1^*,\dots,u_n^*$ are linearly independent. If not, we have a nontrivial linear dependence
$$\lambda_1 u_1^* + \cdots + \lambda_n u_n^* = 0,$$
and if we apply the above linear form to each $u_i$, using a familiar computation, we get
$$0 = \lambda_i u_i^*(u_i) = \lambda_i,$$
proving that $u_1^*,\dots,u_n^*$ are indeed linearly independent. Therefore, $(u_1^*,\dots,u_n^*)$ is a basis of $E^*$.
In particular, Theorem 2.21 shows that a finite-dimensional vector space $E$ and its dual $E^*$ have the same dimension.
We explained just after Definition 2.27 that if the space $E$ is finite-dimensional and has a finite basis $(u_1,\dots,u_n)$, then a linear form $f^*\colon E\to K$ is represented by the row vector of coefficients
$$\bigl(f^*(u_1)\ \cdots\ f^*(u_n)\bigr). \tag{1}$$
The proof of Theorem 2.21 shows that over the dual basis $(u_1^*,\dots,u_n^*)$ of $E^*$, the linear form $f^*$ is represented by the same coefficients, but as the column vector
$$\begin{pmatrix}f^*(u_1)\\ \vdots\\ f^*(u_n)\end{pmatrix}, \tag{2}$$
2.9 Summary
The main concepts and results of this chapter are listed below:
• Families of vectors.
• Linear subspaces.
• Any two bases in a finitely generated vector space E have the same number of elements;
this is the dimension of E (Theorem 2.14).
• Hyperplanes.
• Every vector has a unique representation over a basis (in terms of its coordinates).
• Matrices.
• The vector space $M_{m,n}(K)$ of $m\times n$ matrices over the field $K$; the ring $M_n(K)$ of $n\times n$ matrices over the field $K$.
• The image and the kernel of a linear map are subspaces. A linear map is injective iff its kernel is the trivial space $(0)$ (Proposition 2.17).
• The unique homomorphic extension property of linear maps with respect to bases (Proposition 2.18).
• Coordinate forms.
2.10 Problems
Problem 2.1. Let $H$ be the set of $3\times 3$ upper triangular matrices given by
$$H = \left\{\begin{pmatrix}1 & a & b\\ 0 & 1 & c\\ 0 & 0 & 1\end{pmatrix}\ \middle|\ a, b, c\in\mathbb{R}\right\}.$$
(1) Prove that H with the binary operation of matrix multiplication is a group; find
explicitly the inverse of every matrix in H. Is H abelian (commutative)?
(2) Given two groups $G_1$ and $G_2$, recall that a homomorphism is a function $\varphi\colon G_1\to G_2$ such that
$$\varphi(ab) = \varphi(a)\varphi(b),\quad a, b\in G_1.$$
Prove that $\varphi(e_1) = e_2$ (where $e_i$ is the identity element of $G_i$) and that
$$\varphi(a^{-1}) = (\varphi(a))^{-1},\quad a\in G_1.$$
(3) Let $S^1$ be the unit circle, that is,
$$S^1 = \{e^{i\theta} = \cos\theta + i\sin\theta \mid 0\le\theta < 2\pi\},$$
and let $\varphi$ be the function given by
$$\varphi\begin{pmatrix}1 & a & b\\ 0 & 1 & c\\ 0 & 0 & 1\end{pmatrix} = (a, c, e^{ib}).$$
Prove that $\varphi$ is a surjective function onto $G = \mathbb{R}\times\mathbb{R}\times S^1$, and that if we define multiplication on this set by
$$(x_1, y_1, u_1)\cdot(x_2, y_2, u_2) = (x_1 + x_2,\ y_1 + y_2,\ e^{ix_1y_2}u_1u_2),$$
then $G$ is a group and $\varphi$ is a group homomorphism from $H$ onto $G$.
(4) The kernel of a homomorphism $\varphi\colon G_1\to G_2$ is defined as
$$\mathrm{Ker}(\varphi) = \{a\in G_1 \mid \varphi(a) = e_2\}.$$
Find explicitly the kernel of $\varphi$ and show that it is a subgroup of $H$.
Problem 2.2. For any $m\in\mathbb{Z}$ with $m > 0$, the subset $m\mathbb{Z} = \{mk \mid k\in\mathbb{Z}\}$ is an abelian subgroup of $\mathbb{Z}$. Check this.
(1) Give a group isomorphism (an invertible homomorphism) from $m\mathbb{Z}$ to $\mathbb{Z}$.
(2) Check that the inclusion map $i\colon m\mathbb{Z}\to\mathbb{Z}$ given by $i(mk) = mk$ is a group homomorphism. Prove that if $m\ge 2$ then there is no group homomorphism $p\colon\mathbb{Z}\to m\mathbb{Z}$ such that $p\circ i = \mathrm{id}$.
Remark: The above shows that abelian groups fail to have some of the properties of vector spaces. We will show later that a linear map satisfying the condition $p\circ i = \mathrm{id}$ always exists.
Prove that the columns of A1 are linearly independent. Find the coordinates of the vector
x = (6, 2, 7) over the basis consisting of the column vectors of A1 .
Problem 2.6. Let $A_2$ be the following matrix:
$$A_2 = \begin{pmatrix}
1 & 2 & 1 & 1\\
2 & 3 & 2 & 3\\
-1 & 0 & 1 & -1\\
-2 & -1 & 3 & 0
\end{pmatrix}.$$
Express the fourth column of $A_2$ as a linear combination of the first three columns of $A_2$. Is the vector $x = (7, 14, -1, 2)$ a linear combination of the columns of $A_2$?
Problem 2.7. Let $A_3$ be the following matrix:
$$A_3 = \begin{pmatrix}
1 & 1 & 1\\
1 & 1 & 2\\
1 & 2 & 3
\end{pmatrix}.$$
Prove that the columns of $A_3$ are linearly independent. Find the coordinates of the vector $x = (6, 9, 14)$ over the basis consisting of the column vectors of $A_3$.
Problem 2.8. Let $A_4$ be the following matrix:
$$A_4 = \begin{pmatrix}
1 & 2 & 1 & 1\\
2 & 3 & 2 & 3\\
-1 & 0 & 1 & -1\\
-2 & -1 & 4 & 0
\end{pmatrix}.$$
Prove that the columns of $A_4$ are linearly independent. Find the coordinates of the vector $x = (7, 14, -1, 2)$ over the basis consisting of the column vectors of $A_4$.
Problem 2.9. Consider the following Haar matrix
$$H = \begin{pmatrix}
1 & 1 & 1 & 0\\
1 & 1 & -1 & 0\\
1 & -1 & 0 & 1\\
1 & -1 & 0 & -1
\end{pmatrix}.$$
$$v_i = a_{i\,1}u_1 + \cdots + a_{i\,m}u_m,\quad 1\le i\le m,$$
and that the matrix $A = (a_{ij})$ is an upper-triangular matrix, which means that if $1\le j < i\le m$, then $a_{ij} = 0$. Prove that if $(u_1,\dots,u_m)$ are linearly independent and if all the diagonal entries of $A$ are nonzero, then $(v_1,\dots,v_m)$ are also linearly independent.
Hint. Use induction on $m$.
(2) Let $A = (a_{ij})$ be an upper-triangular matrix. Prove that if all the diagonal entries of $A$ are nonzero, then $A$ is invertible and the inverse $A^{-1}$ of $A$ is also upper-triangular.
Hint. Use induction on $m$.
Prove that if $A$ is invertible, then all the diagonal entries of $A$ are nonzero.
(3) Prove that if the families $(u_1,\dots,u_m)$ and $(v_1,\dots,v_m)$ are related as in (1), then $(u_1,\dots,u_m)$ are linearly independent iff $(v_1,\dots,v_m)$ are linearly independent.
Problem 2.12. In solving this problem, do not use determinants. Consider the $n\times n$ matrix
$$A = \begin{pmatrix}
1 & 2 & 0 & 0 & \cdots & 0 & 0\\
0 & 1 & 2 & 0 & \cdots & 0 & 0\\
0 & 0 & 1 & 2 & \cdots & 0 & 0\\
\vdots & \vdots & \ddots & \ddots & \ddots & \vdots & \vdots\\
0 & 0 & \cdots & 0 & 1 & 2 & 0\\
0 & 0 & \cdots & 0 & 0 & 1 & 2\\
0 & 0 & \cdots & 0 & 0 & 0 & 1
\end{pmatrix}.$$
(1) Find the solution $x = (x_1,\dots,x_n)$ of the linear system
$$Ax = b,$$
for
$$b = \begin{pmatrix}b_1\\ b_2\\ \vdots\\ b_n\end{pmatrix}.$$
(2) Prove that the matrix $A$ is invertible and find its inverse $A^{-1}$. Given that the number of atoms in the universe is estimated to be $10^{82}$, compare the size of the coefficients of the inverse of $A$ to $10^{82}$, if $n\ge 300$.
(3) Assume $b$ is perturbed by a small amount $\delta b$ (note that $\delta b$ is a vector). Find the new solution of the system
$$A(x + \delta x) = b + \delta b,$$
where $\delta x$ is also a vector. In the case where $b = (0,\dots,0,1)$ and $\delta b = (0,\dots,0,\epsilon)$, show that
$$|(\delta x)_1| = 2^{n-1}|\epsilon|$$
(where $(\delta x)_1$ is the first component of $\delta x$).
(4) Prove that $(A - I)^n = 0$.
Problem 2.14. (1) Let $A$ be an $n\times n$ matrix. If $A$ is invertible, prove that for any $x\in\mathbb{R}^n$, if $Ax = 0$, then $x = 0$.
(2) Let $A$ be an $m\times n$ matrix and let $B$ be an $n\times m$ matrix. Prove that $I_m - AB$ is invertible iff $I_n - BA$ is invertible.
Hint. If for all $x\in\mathbb{R}^n$, $Mx = 0$ implies that $x = 0$, then $M$ is invertible.
(3) Show that the $n$ diagonal $n\times n$ matrices $D_i$ defined such that the diagonal entries of $D_i$ are equal to the entries (from top down) of the $i$th column of $B$ form a basis of the space of $n\times n$ diagonal matrices (matrices with zeros everywhere except possibly on the diagonal). For example, when $n = 4$, $D_i$ is the diagonal matrix
$$D_i = \mathrm{diag}(b_{1\,i},\ b_{2\,i},\ b_{3\,i},\ b_{4\,i})$$
whose diagonal is the $i$th column of $B$.
Problem 2.16. Given any $m\times n$ matrix $A$ and any $n\times p$ matrix $B$, if we denote the columns of $A$ by $A_1,\dots,A_n$ and the rows of $B$ by $B_1,\dots,B_n$, prove that
$$AB = A_1B_1 + \cdots + A_nB_n.$$
Problem 2.17. Let $f\colon E\to F$ be a linear map which is also a bijection (it is injective and surjective). Prove that the inverse function $f^{-1}\colon F\to E$ is linear.
Problem 2.18. Given two vector spaces $E$ and $F$, let $(u_i)_{i\in I}$ be any basis of $E$ and let $(v_i)_{i\in I}$ be any family of vectors in $F$. Prove that the unique linear map $f\colon E\to F$ such that $f(u_i) = v_i$ for all $i\in I$ is surjective iff $(v_i)_{i\in I}$ spans $F$.
Problem 2.19. Let $f\colon E\to F$ be a linear map with $\dim(E) = n$ and $\dim(F) = m$. Prove that $f$ has rank 1 iff $f$ is represented by an $m\times n$ matrix of the form
$$A = uv^\top$$
with $u$ a nonzero column vector of dimension $m$ and $v$ a nonzero column vector of dimension $n$.
Problem 2.20. Find a nontrivial linear dependence among the linear forms
are linearly independent. Express the linear form $\varphi(x, y, z) = x + y + z$ as a linear combination of $\varphi_1, \varphi_2, \varphi_3$.