On Adding and Subtracting Eigenspaces With EVD and SVD

Peter Hall, David Marshall, Ralph Martin
Abstract
This paper provides two algorithms: one for adding eigenspaces, another for subtracting them, thus allowing for incremental updating and downdating of data models. Importantly, and unlike previous work, we keep an accurate track of the mean
of the data, which allows our methods to be used in classification applications.
The result of adding eigenspaces, each made from a set of data, is an approximation to that which would obtain were the sets of data taken together. Subtracting
eigenspaces yields a result approximating that which would obtain were a subset of
data used. Using our algorithms it is possible to perform arithmetic on eigenspaces
without reference to the original data. We illustrate the use of our algorithms in
three generic applications, including the dynamic construction of Gaussian mixture
models. In addition, we mention singular value decomposition as an alternative to
eigenvalue decomposition. We show that updating SVD models comes at the cost of
space resources, and argue that downdating SVD models is not possible in closed form.
Key words:
Eigenvalue decomposition, dynamic updating and downdating, Gaussian mixture
models, singular value decomposition.
Introduction
The subject of this paper is incremental eigenanalysis: we provide an algorithm for including new data into an eigenspace, and another for removing
data. An eigenspace comprises: the number of data points, their mean, the
eigenvectors, and the eigenvalues that result from the eigenvalue decomposition (EVD) of the data covariance matrix. Typically the eigenspace is deflated, which is to say that only significant eigenvectors and eigenvalues are
retained in the eigenspace. The inclusion of new data is sometimes called updating, while the removal of data is sometimes called downdating. Rather than
use data directly, we use eigenspace representations of the data, hence we add
or subtract eigenspaces. Our methods are presented in Section 2.
Singular value decomposition (SVD) is closely related to EVD, and is preferred
by some authors. We comment upon SVD methods in Section 4, where we
present a novel method for incrementally computing SVD, including a shift
of the mean. We also argue that removing data from SVD is not possible in
closed form.
We must make clear the difference between batch and incremental methods
for computing eigenspace models. A batch method computes an eigenmodel
using all observations simultaneously. An incremental method computes an
eigenspace model by successively updating an earlier model as new observations become available. In either case, the observations used to construct the
eigenspace model are the training observations; that is, they are assumed to
be instances from some class. This model may then be used to decide whether
further observations belong to the class.
Incremental eigenanalysis has been studied previously [14,7,13], but surprisingly these authors have either ignored the fact that a change in data changes
the mean, or else handled it in an ad hoc way. In contrast, our previous
work [10] allows for a change of mean, but only for the inclusion of
a single new datum. Here, our algorithms handle block update and downdate, so many observations can be included or removed in a single step. They
explicitly allow the mean to vary in a principled and accurate manner, and
this is important. Consider, for example, that functions such as the Mahalanobis distance, often used in classification applications, cannot be computed
without the mean; previous solutions cannot be used in this case.
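For concreteness, the squared Mahalanobis distance under a deflated eigenspace model (a standard definition, written here in the notation used later in this paper) is

d^2(x) = (x − μ)^T U Λ^{-1} U^T (x − μ),

so the mean μ is needed alongside the eigenvectors U and eigenvalues Λ; a model that tracks only U and Λ cannot evaluate it.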
Applications of incremental methods are wide ranging both within computer
vision and beyond. Focusing on computer vision, applications include: face
recognition [12], modelling variances in geometry [6], and the estimation of
motion parameters [4].
Our motivations for this work arose from several sources, one example being
the construction of classification models for many images, too many to fit
all into memory at once. Intuition, confirmed by experiment, suggests it is
better to construct the eigenspace model from all the images rather than a
subset of them, which is all that could be done if using a batch method; hence
the need for an incremental method (see Section 3). Another example is a database
of photographs for a security application in which images need to be added
and deleted each year, yet not all images can be kept in memory at once (see
Section 3). Our methods allow the database to be updated and downdated as required.

Adding and subtracting eigenspaces

We are given a collection X of N(X) observations, each an n-dimensional column vector, with mean μ(X). The EVD of the covariance of X yields eigenvectors and eigenvalues; after deflation only the p most significant are retained, giving the eigenspace model

Ω(X) = (μ(X), U_np(X), Λ_pp(X), N(X))   (1)

Similarly, a second collection Y of N(Y) observations gives the eigenspace model

Ω(Y) = (μ(Y), U_nq(Y), Λ_qq(Y), N(Y))   (2)

This collection is usually distinct from X, but such distinction is not a requirement. Notice that q eigenvectors and eigenvalues are kept in this model, whereas p are kept in Ω(X).

The problem for addition is to compute the eigenspace model of the combined data Z = [X, Y],

Ω(Z) = (μ(Z), U(Z), Λ(Z), N(Z))   (3)

Ω(Z) = Ω(X) ⊕ Ω(Y)   (4)
with reference to Ω(X) and Ω(Y) only; that is, to define the algorithm for the
⊕ operator. We assume the original data are not available. In general, the
number of eigenvectors and eigenvalues kept, r, differs from both p and q.
This implies that addition must account for a possible change in dimension of
the eigenspace.
The problem for subtraction is to compute Ω(X), where

Ω(X) = Ω(Z) ⊖ Ω(Y)   (5)
2.1 Addition
Incremental computation of N(Z) and μ(Z) is straightforward:

N(Z) = N(X) + N(Y)   (6)
μ(Z) = (N(X) μ(X) + N(Y) μ(Y)) / N(Z)   (7)
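For example (illustrative numbers of our own), with N(X) = 150, N(Y) = 50 and scalar means μ(X) = 1.0, μ(Y) = 3.0, Eq. (6) gives N(Z) = 200 and Eq. (7) gives μ(Z) = (150 × 1.0 + 50 × 3.0)/200 = 1.5.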
Computing eigenvectors and eigenvalues depends upon properties of the subspaces that the eigenvectors U(X), U(Y ), and U(Z) support; properties we
describe next.
Since U(Z) must support all data in both collections, X and Y, both U(X)
and U(Y) must be subspaces of U(Z). Generally, we might expect that these
subspaces intersect, in the sense that U(X)^T U(Y) ≠ 0. The null space of
each of U(X) and U(Y) may also contain some component of the other, that is
to say H = U(Y) − U(X)(U(X)^T U(Y)) ≠ 0. Both of these conditions are
illustrated in Figure 1. Furthermore, even if U(X) and U(Y) are each a basis
for the same subspace, U(Z) could be of larger dimension. This is because
some component, h say, of the vector joining the means, μ(X) − μ(Y), may be
in the null space of both subspaces simultaneously. For example, μ(X), U(X)
and μ(Y), U(Y) define a pair of planes parallel to the xy-plane, but separated
in the z direction, as in Figure 1.
Fig. 1. Subspaces (x1,x2) and (u1,u2) are embedded in (e1,e2,e3). Subspace (u1,u2) has components in (x1,x2), marked by dashed lines; it also has components in the null space of (x1,x2), marked by dotted lines.
The remaining quantities are obtained by solving an intermediate eigenproblem of size s × s, where s = p + t:

(N(X)/N(Z)) [ Λ(X)_pp   0_pt ; 0_tp   0_tt ]
  + (N(Y)/N(Z)) [ G Λ(Y)_qq G^T   G Λ(Y)_qq Γ^T ; Γ Λ(Y)_qq G^T   Γ Λ(Y)_qq Γ^T ]
  + (N(X)N(Y)/N(Z)^2) [ g_p g_p^T   g_p γ_t^T ; γ_t g_p^T   γ_t γ_t^T ]
  = R_ss Π_ss R_ss^T   (8)

in which Π_ss is diagonal and R_ss is orthonormal, and where
g_p = U(X)^T (μ(X) − μ(Y))   (9)
G_pq = U(X)^T U(Y)   (10)
H_nq = [U(Y) − U(X) G_pq]   (11)
h_n = (μ(X) − μ(Y)) − U(X) g_p   (12)
ν_nt = Orthobasis([H_nq, h_n])   (13)
Γ_tq = ν_nt^T U(Y)_nq   (14)
γ_t = ν_nt^T (μ(X) − μ(Y))   (15)
Here [·] is an operation that removes very small column vectors from a matrix, and
Orthobasis computes a set of mutually orthogonal, unit vectors that support its
argument; typically Gram-Schmidt orthogonalisation [8] is used to compute the
significant support vectors from [H, h]; these lie outside the eigenmodel
Ω(X). Note that while ν^T ν = I, ν ν^T ≠ I. Also, G is the projection of the
Ω(Y) eigenspace onto Ω(X) (the U vectors), while Γ is the projection of Ω(Y)
onto the space complementary to Ω(X) (the ν vectors). This complementary
space must be determined to compute the new eigenspace Ω(Z), which argues
in favour of adding and subtracting eigenspaces, rather than direct updating
or downdating of data blocks.
Given the above decomposition, we can complete our computation of Ω(Z):

Λ(Z)_ss = diag(Π_ss)   (16)
U_ns(Z) = [U_np ν_nt] R_ss   (17)
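To make the procedure concrete, the following Python/NumPy sketch (our own code and variable names, not taken from the paper) implements Eqs. (6)-(17); a QR factorisation with pruning of near-zero columns stands in for the Orthobasis/Gram-Schmidt step, and deflation of the result is left to the caller.

import numpy as np

def add_eigenspaces(mu_x, U_x, lam_x, N_x, mu_y, U_y, lam_y, N_y, tol=1e-10):
    # Merge two eigenspace models (mean, eigenvectors, eigenvalues, count).
    N_z = N_x + N_y                                  # Eq. (6)
    mu_z = (N_x * mu_x + N_y * mu_y) / N_z           # Eq. (7)

    d = mu_x - mu_y
    g = U_x.T @ d                                    # Eq. (9)
    G = U_x.T @ U_y                                  # Eq. (10)
    H = U_y - U_x @ G                                # Eq. (11)
    h = d - U_x @ g                                  # Eq. (12)

    # Eq. (13): orthonormal basis for the residuals outside span(U_x);
    # QR with pruning of near-zero columns stands in for Orthobasis.
    Q, R_qr = np.linalg.qr(np.column_stack([H, h]))
    nu = Q[:, np.abs(np.diag(R_qr)) > tol]

    Gam = nu.T @ U_y                                 # Eq. (14)
    gam = nu.T @ d                                   # Eq. (15)

    p, t = U_x.shape[1], nu.shape[1]
    A = np.zeros((p + t, p + t))
    A[:p, :p] = (N_x / N_z) * np.diag(lam_x)         # first term of Eq. (8)
    B = np.vstack([G, Gam])
    c = np.concatenate([g, gam])
    A += (N_y / N_z) * (B * lam_y) @ B.T             # second term of Eq. (8)
    A += (N_x * N_y / N_z ** 2) * np.outer(c, c)     # third term of Eq. (8)

    lam_z, R = np.linalg.eigh(A)                     # Eq. (8): A = R diag(lam_z) R^T
    order = np.argsort(lam_z)[::-1]
    lam_z, R = lam_z[order], R[:, order]             # Eq. (16)
    U_z = np.column_stack([U_x, nu]) @ R             # Eq. (17)
    return mu_z, U_z, lam_z, N_z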
2.2 Subtraction
The algorithm for subtraction is very similar to that for addition. First compute the number of data and their mean:

N(X) = N(Z) − N(Y)   (18)
μ(X) = (N(Z) μ(Z) − N(Y) μ(Y)) / N(X)   (19)
In this case U(Z) is a sufficient spanning set to rotate. To compute the rotation
we use the eigendecomposition

(N(Z)/N(X)) Λ(Z)_rr − (N(Y)/N(X)) G_rq Λ(Y)_qq G_rq^T − (N(Y)/N(Z)) g_r g_r^T = R_rr Λ(X)_rr R_rr^T   (20)

where G_rq = U(Z)_nr^T U(Y)_nq and g_r = U(Z)_nr^T (μ(Y) − μ(X)). The eigenvalues
we seek are the p non-zero elements on the diagonal of Λ(X)_rr. Thus we can
permute R_rr and Λ(X)_rr, and write without loss of generality:

R_rr Λ(X)_rr R_rr^T = [R_rp R_rt] [ Λ(X)_pp   0_pt ; 0_tp   0_tt ] [R_rp R_rt]^T   (21)
                    = R_rp Λ(X)_pp R_rp^T   (22)

The eigenvectors we seek are then U_np(X) = U_nr(Z) R_rp.
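A corresponding sketch for subtraction, again in our own notation and under the assumption that the model of Y was genuinely part of the model of Z, is:

import numpy as np

def subtract_eigenspaces(mu_z, U_z, lam_z, N_z, mu_y, U_y, lam_y, N_y, tol=1e-10):
    # Remove the eigenspace model of Y from that of Z, following Eqs. (18)-(22).
    N_x = N_z - N_y                                  # Eq. (18)
    mu_x = (N_z * mu_z - N_y * mu_y) / N_x           # Eq. (19)

    G = U_z.T @ U_y                                  # G_rq
    g = U_z.T @ (mu_y - mu_x)                        # g_r

    A = (N_z / N_x) * np.diag(lam_z)                 # Eq. (20), term by term
    A -= (N_y / N_x) * (G * lam_y) @ G.T
    A -= (N_y / N_z) * np.outer(g, g)

    lam_x, R = np.linalg.eigh(A)
    order = np.argsort(lam_x)[::-1]
    lam_x, R = lam_x[order], R[:, order]
    keep = lam_x > tol                               # Eqs. (21)-(22): keep non-zero eigenvalues
    return mu_x, U_z @ R[:, keep], lam_x[keep], N_x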
In terms of computational cost, it is more efficient to add a pair of existing eigenspaces than to compute their sum ab initio. Similar remarks apply to splitting: removing a few data points is a comparatively
efficient operation. The conclusion we reach is that addition and subtraction
of eigenspaces is no less efficient than batch methods, and in most cases is
performed much more efficiently.
We have compared the angular deviation of eigenvectors, the change in eigenvalues, the accuracy of data representation in the least-squares sense, and classification performance [9]. The incremental methods for addition generally compare
very well with batch methods, with discrepancies being at a minimum when the
two eigenspaces added are of about the same size; the exception is the discrepancy in eigenvalues, which shows a maximum of about one part in 10^5 at
that point. Reasons for this behaviour are the subject of future work; we
have not yet undertaken a rigorous analysis of errors.
The subtraction operator tends to instability as the number of points being
removed rises, since in this case N(X) → 0 and hence 1/N(X) → ∞. In the limit
of all points being removed, N(X) = 0, and an exception must be coded to
return a null eigenspace. Unfortunately, we have found prior scaling by
N(X) to be ineffective, and have concluded that, in practice, subtraction is
best used to remove a small fraction of the data points.
Applications
http://www.cam-orl.co.uk/facedatabase.html
Fig. 2. Weight of evidence measures, plotted against person index: year 2 batch (left), and year 2 incremental (right).
We notice that both models produce some ambiguous cases, with weights
between 0 and 1, and that the incrementally computed eigenspace gives rise
to more of these cases than the eigenmodel computed via batch methods. This
result is in line with our earlier comments regarding the relative inaccuracy
of subtraction. Even so, only those people in the database scored 1, while
everyone outside scored less than 1, and hence classification is still possible.
Given our observations, above, regarding previous measures when subtracting
eigenspaces, we conclude that additive incremental eigenanalysis is safe for
classification metrics, but that subtractive incremental eigenanalysis needs a
greater degree of caution.
Fig. 3. Sample images of each toy used as source data in our dynamic GMM application.
Thus, including the top-level eigenspace, each set of toy photographs was represented with nineteen eigenspace models. To merge the GMMs for the pair
of toys we first added together the two top-most eigenspaces to make
a complete eigenspace for all 144 photographs. Next we transformed each of
the GMM clusters into this space, thus bringing all thirty-six Gaussian clusters
(eighteen from each individual hierarchy) into the same (large) eigenspace
covering the ensemble of data. We then merged eigenspaces (Gaussian components), using a very simple merging criterion based on reducing the volume
of hyperellipses, which is explained below. Hence, we were able to reduce the
total number of Gaussians in the mixture to 22. These clusters tend to model
different parts of the cylindrical trajectories of the original data projected into
the large eigenspace. Examples of cluster centres are shown in Figure 4: the
two models can clearly be seen in different positions. In addition, we found a
few clusters occupying the space in between the two toys, an example of
which is seen in Figure 4.
The criterion uses the volume of the hyperellipse at s standard deviations of an M-dimensional Gaussian with covariance A:

V = s^M |A|^{1/2} π^{M/2} / Γ(M/2 + 1)   (23)
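For reference, a one-line helper (ours, assuming the covariance is represented by its non-zero eigenvalues, so that |A| is their product) evaluates Eq. (23):

import math
import numpy as np

def hyperellipse_volume(eigvals, s=1.0):
    # Volume of the hyperellipse at s standard deviations of a Gaussian whose
    # covariance has the given non-zero eigenvalues, as in Eq. (23).
    M = len(eigvals)
    return (s ** M) * math.sqrt(float(np.prod(eigvals))) * math.pi ** (M / 2) / math.gamma(M / 2 + 1)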
Singular value decomposition

This paper has focussed on block updating and block downdating of eigenspace
models, based on eigenvalue decomposition (EVD). However, EVD can suffer from conditioning problems (the condition number being the ratio of the
largest to the smallest eigenvalue). Singular value decomposition (SVD) tends to
be more stable (and hence accurate) because singular values are proportional
to the square root of eigenvalues, which mitigates conditioning problems. Thus
we have also investigated block updating and block downdating based on singular value decomposition, which we briefly outline next.
We are given a set of N observations, each an n-dimensional column vector.
(Note that sets here can contain repeated elements.) These observations can
be represented by an n × N matrix, X. The mean of these observations is the
vector μ. The SVD for the origin-centred data set is (X − μ1)/√N = U Σ V^T,
where 1 is a row vector of N 1s. The left singular vectors are the columns of U, and
are identical to the eigenvectors in the EVD of the data covariance. The right singular vectors
V are related to the coordinates of the data when transformed into the U basis
(which is the eigenspace): V = ((X − μ1)^T U Σ^{-1}) / √N. The singular values,
Σ, give the lengths of the semi-axes along each U vector and specify the size of a
hyperellipse at unit Mahalanobis distance. The singular values are related to
the eigenvalues in the EVD of the data covariance by Σ = Λ^{1/2}. As for eigenmodels, the system
can be deflated by discarding left and right singular vectors that are associated
with small singular values: (X − μ1)/√N ≈ U_np Σ_pp V_Np^T. So, for data X we
specify an SVD model as the tuple (μ(X), U(X), Σ(X), V(X), N(X)).
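As a concrete illustration of this convention (our own helper, not code from the paper), an SVD model can be built from raw data as follows:

import numpy as np

def build_svd_model(X):
    # Build an SVD model (mean, U, Sigma, V, N) for an n x N data matrix X,
    # using the convention (X - mu 1)/sqrt(N) = U diag(Sigma) V^T, so that
    # Sigma**2 equals the eigenvalues of the data covariance (1/N convention).
    n, N = X.shape
    mu = X.mean(axis=1)
    A = (X - mu[:, None]) / np.sqrt(N)
    U, Sigma, Vt = np.linalg.svd(A, full_matrices=False)
    return mu, U, Sigma, Vt.T, N   # deflation, if wanted, truncates all three factors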
The block updating problem for data sets X and Y is to compute the SVD
for their union, conveniently written Z = [X, Y], given only the two SVD models. This is done in three stages. First, the number N(Z) and mean
μ(Z) are computed as for EVD updates. An orthonormal basis set ν, which
spans any subspace of U(Y) not in U(X), is also computed in a way similar
to that for EVD, but it need not include the difference between the means;
that is accounted for elsewhere in the SVD formulation. Second, the following
singular value decomposition is made:
U′ Σ′ V′^T =
  [ √N(X) Σ(X) V(X)^T                √N(Y) U(X)^T U(Y) Σ(Y) V(Y)^T ;
    0                                √N(Y) ν^T U(Y) Σ(Y) V(Y)^T ]
  + [ U(X)^T (μ(X) − μ(Z)) 1_N(X)    U(X)^T (μ(Y) − μ(Z)) 1_N(Y) ;
      ν^T (μ(X) − μ(Z)) 1_N(X)       ν^T (μ(Y) − μ(Z)) 1_N(Y) ]   (24)
Here the second term accounts for the difference in means. Finally, the left
and right singular vectors and the singular values are computed, and deflated as
desired:

U(Z) = [U(X) ν] U′   (25)
Σ(Z) = Σ′ / √N(Z)   (26)
V(Z) = V′   (27)
We note that the right singular vectors are given directly, a shift of mean is
accounted for, and the problem is of size (p + q) × (N(X) + N(Y)). Contrasting
our solution with others [3]: they add a single new point at a time, do not shift
the mean, and require post-manipulation of the right singular vectors,
without offering any efficiency gain in terms of problem size.
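A sketch of this block update in Python/NumPy (our own code; it assumes the part of the mean shift lying outside the combined left basis is negligible, as in the reconstruction of Eq. (24) above) is:

import numpy as np

def add_svd_models(mu_x, U_x, S_x, V_x, N_x, mu_y, U_y, S_y, V_y, N_y, tol=1e-10):
    # Block SVD update in the spirit of Eqs. (24)-(27); models satisfy
    # (data - mean*1)/sqrt(N) ~= U @ diag(S) @ V.T.
    N_z = N_x + N_y
    mu_z = (N_x * mu_x + N_y * mu_y) / N_z

    # Orthonormal basis nu for the part of U_y outside span(U_x).
    H = U_y - U_x @ (U_x.T @ U_y)
    Q, R_qr = np.linalg.qr(H)
    nu = Q[:, np.abs(np.diag(R_qr)) > tol]
    B = np.column_stack([U_x, nu])                   # combined left basis

    # First term of Eq. (24): the centred data of X and Y expressed in B.
    data_x = np.sqrt(N_x) * (B.T @ U_x) @ (S_x[:, None] * V_x.T)
    data_y = np.sqrt(N_y) * (B.T @ U_y) @ (S_y[:, None] * V_y.T)
    # Second term of Eq. (24): shifts from the old means to the new mean.
    shift_x = (B.T @ (mu_x - mu_z))[:, None] * np.ones((1, N_x))
    shift_y = (B.T @ (mu_y - mu_z))[:, None] * np.ones((1, N_y))

    M = np.hstack([data_x + shift_x, data_y + shift_y])
    Up, Sp, Vpt = np.linalg.svd(M, full_matrices=False)
    return mu_z, B @ Up, Sp / np.sqrt(N_z), Vpt.T, N_z   # Eqs. (25)-(27)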
Downdate of SVD models means removing points in data set Z which are also
in data set Y ; in set-theoretic terms we assume Y is a subset of Z and want to
compute the SVD for X = Z \ Y . This is not straightforward and difficulties
arise in two areas, even if we neglect a change in mean. The first difficulty
comes from the (simplest) form of the problem, which is [ABC^T, DEF^T] =
GHJ^T, where X = ABC^T, Y = DEF^T, and Z = GHJ^T. We must obtain A,
B, and C. By multiplying each side by its own transpose we obtain A B^2 A^T +
D E^2 D^T = G H^2 G^T, which gives us an EVD problem from which we can
compute A and B. Alternatively, we can use the relationship between EVD
and SVD to compute A, B, and the mean shift using the EVD downdating
methods described above. However, we note that either way the SVD downdate
cannot be achieved directly.
The second difficulty arises when we note that the ordering of right singular
vectors depends upon the ordering of data points in the matrix being decomposed. However, the left singular vectors and singular values are invariant to
permutation of the data. To see this we suppose P is a permutation matrix (obtained by randomly permuting the identity matrix, so that P P^T = P^T P = I),
and note that given Z = GHJ^T, then ZP = GHJ^T P = GH(P^T J)^T. Therefore, in order to compute the right singular vectors C while downdating, we
must have access to some matrix P which picks out data elements in Z (or,
equivalently, corresponding elements in J). Unfortunately no such information
exists within the SVD model, and consequently computing C in a closed-form
manner seems impossible. The only solution seems to be a resort to search
using data elements in J and F (for these specify data points in Z and Y
respectively). If search is the only solution, then we may simply downdate Z
by building up X incrementally as elements in Z \ Y are found, which is
unsatisfactory in our opinion.
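The following NumPy snippet (ours, using random data) illustrates this invariance numerically:

import numpy as np

# Permuting the columns of Z leaves the singular values and (up to sign) the
# left singular vectors unchanged, but permutes the right singular vectors.
rng = np.random.default_rng(0)
Z = rng.normal(size=(5, 8))
P = np.eye(8)[rng.permutation(8)]          # random permutation matrix, P P^T = I

G, H, Jt = np.linalg.svd(Z, full_matrices=False)
Gp, Hp, Jpt = np.linalg.svd(Z @ P, full_matrices=False)

print(np.allclose(H, Hp))                               # singular values: identical
print(np.allclose(np.abs(G), np.abs(Gp)))               # left vectors: identical up to sign
print(np.allclose(np.abs(Jpt), np.abs(Jt @ P)))         # right vectors: permuted by P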
Thus we conclude that updating SVD models is possible, at the expense of
keeping the right singular vectors, and that downdating SVD models is not possible in closed form. However, experimental evidence [10] suggests that SVD
updates are likely to be more accurate than EVD updates.
Conclusion
We have omitted an experimental comparison between the EVD and SVD formulations; we note, though, that working with the covariance matrix squares the condition number. We have also omitted comparisons with other
incremental methods, because they deal with adding one new data point (but,
again, see our earlier work [10]). The important conclusion from that work is
that updating the mean is crucial for classification results [10].
We would expect our methods to find much wider applicability than those we
have already mentioned in this paper: updating image motion parameters [4],
and selecting salient views [3] are two applications that exist already for incremental methods. We have experimented with image segmentation, building
models of three-dimensional blood vessels, and texture classification. We believe that dynamic Gaussian mixture models provide a very interesting future
path, for they enable useful representations [5,11], and all their attendant
properties to be brought into a dynamic framework.
References
[1] James R. Bunch and Christopher P. Nielsen. Updating the singular value
decomposition. Numerische Mathematik, 31:111–129, 1978.
[2] James R. Bunch, Christopher P. Nielsen, and Danny C. Sorenson. Rank-one
modification of the symmetric eigenproblem. Numerische Mathematik, 31:31–48, 1978.
[3] S. Chandrasekaran, B.S. Manjunath, Y.F. Wang, J. Winkler, and H. Zhang. An
eigenspace update algorithm for image analysis. Graphical Models and Image
Processing, 59(5):321–332, September 1997.
[4] S. Chaudhuri, S. Sharma, and S. Chatterjee. Recursive estimation of motion
parameters. Computer Vision and Image Understanding, 64(3):434–442,
November 1996.
[5] T.F. Cootes and C.J. Taylor. A mixture model for representing shape variations.
In Proc. British Machine Vision Conference, pages 110–119, 1997.
[6] T.F. Cootes, C.J. Taylor, D.H. Cooper, and J. Graham. Training models of
shape from sets of examples. In Proc. British Machine Vision Conference,
pages 9–18, 1992.
[7] Ronald D. DeGroat and Richard Roberts. Efficient, numerically stabilized rank-one eigenstructure updating. IEEE Transactions on Acoustics, Speech, and
Signal Processing, 38(2):301–316, February 1990.
[8] Gene H. Golub and Charles F. Van Loan. Matrix Computations. Johns Hopkins
University Press, 1983.
[9] Peter Hall, David Marshall, and Ralph Martin. Merging and splitting
eigenspaces. IEEE Transactions on Pattern Analysis and Machine Intelligence.
(to appear)
[10] Peter Hall, David Marshall, and Ralph Martin. Incrementally computing
eigenspace models. In Proc. British Machine Vision Conference, pages 286–295, Southampton, 1998.
[11] Tony Heap and David Hogg. Improving specificity in PDMs using a hierarchical
approach. In Proc. British Machine Vision Conference, pages 80–89, 1997.
[12] Baback Moghaddam and Alex Pentland. Probabilistic visual learning for
object representation. IEEE Transactions on Pattern Analysis and Machine
Intelligence, 19(7):696–710, July 1997.
[13] H. Murakami and B.V.K.V. Kumar. Efficient calculation of primary images from
a set of images. IEEE Transactions on Pattern Analysis and Machine Intelligence,
4(5):511–515, September 1982.