On Adding and Subtracting Eigenspaces With EVD and SVD

Peter Hall, David Marshall, Ralph Martin
Abstract
This paper provides two algorithms: one for adding eigenspaces, another for subtracting them, thus allowing for incremental updating and downdating of data models. Importantly, and unlike previous work, we keep an accurate track of the mean
of the data, which allows our methods to be used in classification applications.
The result of adding eigenspaces, each made from a set of data, is an approximation to that which would obtain were the sets of data taken together. Subtracting
eigenspaces yields a result approximating that which would obtain were a subset of
data used. Using our algorithms it is possible to perform arithmetic on eigenspaces
without reference to the original data. We illustrate the use of our algorithms in
three generic applications, including the dynamic construction of Gaussian mixture
models. In addition, we mention singular value decomposition as an alternative to
eigenvalue decomposition. We show that updating SVD models comes at the cost of
space resources, and argue that downdating SVD models is not possible in closed form.
Key words:
Eigenvalue decomposition, dynamic updating and downdating, Gaussian mixture
models, singular value decomposition.
Introduction
The subject of this paper is incremental eigenanalysis: we provide an algorithm for including new data into an eigenspace, and another for removing
data. An eigenspace comprises: the number of data points, their mean, the
eigenvectors, and the eigenvalues that result from the eigenvalue decomposition (EVD) of the data covariance matrix. Typically the eigenspace is deflated, which is to say that only significant eigenvectors and eigenvalues are
retained in the eigenspace. The inclusion of new data is sometimes called updating, while the removal of data is sometimes called downdating. Rather than
use data directly, we use eigenspace representations of the data, hence we add
or subtract eigenspaces. Our methods are presented in Section 2.
Singular value decomposition (SVD) is closely related to EVD, and is preferred
by some authors. We comment upon SVD methods in Section 4, where we
present a novel method for incrementally computing SVD, including a shift
of the mean. We also argue that removing data from SVD is not possible in
closed form.
We must make clear the difference between batch and incremental methods
for computing eigenspace models. A batch method computes an eigenmodel
using all observations simultaneously. An incremental method computes an
eigenspace model by successively updating an earlier model as new observations become available. In either case, the observations used to construct the
eigenspace model are the training observations; that is, they are assumed to
be instances from some class. This model may then be used to decide whether
further observations belong to the class.
Incremental eigenanalysis has been studied previously [14,7,13], but surprisingly these authors have either ignored the fact that a change in data changes
the mean, or else handled it in an ad hoc way. In contrast, our previous
work [10] allows for a change of mean, but only for the inclusion of
a single new datum. Here, our algorithms handle block update and downdate, so many observations can be included or removed in a single step. They
explicitly allow the mean to vary in a principled and accurate manner, and
this is important. Consider, for example, that functions such as the Mahalanobis distance, often used in classification applications, cannot be computed
without the mean; previous solutions cannot be used in this case.
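For concreteness, the squared Mahalanobis distance under a deflated eigenspace model (a standard definition, written here in the notation used later in this paper) is

d^2(x) = (x − μ)^T U Λ^{-1} U^T (x − μ),

so the mean μ is needed alongside the eigenvectors U and eigenvalues Λ; a model that tracks only U and Λ cannot evaluate it.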
Applications of incremental methods are wide ranging both within computer
vision and beyond. Focusing on computer vision, applications include: face
recognition [12], modelling variances in geometry [6], and the estimation of
motion parameters [4].
Our motivations for this work arose from several sources, one example being
the construction of classification models for many images, too many to fit
all into memory at once. Intuition, confirmed by experiment, suggests it is
better to construct the eigenspace model from all the images rather than a
subset of them, which is all that could be done if using a batch method; hence
the need for an incremental method (see Section 3). Another example is a database
of photographs for a security application in which images need to be added
and deleted each year, yet not all images can be kept in memory at once (see
Section 3). Our methods allow the database to be updated and downdated as required.

Adding and subtracting eigenspaces

We are given a collection X of N(X) observations, each an n-dimensional column vector, with mean μ(X). The EVD of the covariance of X yields eigenvectors and eigenvalues; after deflation only the p most significant are retained, giving the eigenspace model

Ω(X) = (μ(X), U_np(X), Λ_pp(X), N(X))   (1)

Similarly, a second collection Y of N(Y) observations gives the eigenspace model

Ω(Y) = (μ(Y), U_nq(Y), Λ_qq(Y), N(Y))   (2)

This collection is usually distinct from X, but such distinction is not a requirement. Notice that q eigenvectors and eigenvalues are kept in this model, whereas p are kept in Ω(X).

The problem for addition is to compute the eigenspace model of the combined data Z = [X, Y],

Ω(Z) = (μ(Z), U(Z), Λ(Z), N(Z))   (3)

Ω(Z) = Ω(X) ⊕ Ω(Y)   (4)
with reference to Ω(X) and Ω(Y) only; that is, to define the algorithm for the
⊕ operator. We assume the original data are not available. In general, the
number of eigenvectors and eigenvalues kept, r, differs from both p and q.
This implies that addition must account for a possible change in dimension of
the eigenspace.
The problem for subtraction is to compute Ω(X), where

Ω(X) = Ω(Z) ⊖ Ω(Y)   (5)
2.1 Addition
Incremental computation of N(Z) and μ(Z) is straightforward:

N(Z) = N(X) + N(Y)   (6)
μ(Z) = (N(X) μ(X) + N(Y) μ(Y)) / N(Z)   (7)
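For example (illustrative numbers of our own), with N(X) = 150, N(Y) = 50 and scalar means μ(X) = 1.0, μ(Y) = 3.0, Eq. (6) gives N(Z) = 200 and Eq. (7) gives μ(Z) = (150 × 1.0 + 50 × 3.0)/200 = 1.5.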
Computing eigenvectors and eigenvalues depends upon properties of the subspaces that the eigenvectors U(X), U(Y ), and U(Z) support; properties we
describe next.
Since U(Z) must support all data in both collections, X and Y, both U(X)
and U(Y) must be subspaces of U(Z). Generally, we might expect that these
subspaces intersect, in the sense that U(X)^T U(Y) ≠ 0. The null space of
each of U(X) and U(Y) may also contain some component of the other, that is
to say H = U(Y) − U(X)(U(X)^T U(Y)) ≠ 0. Both of these conditions are
illustrated in Figure 1. Furthermore, even if U(X) and U(Y) are each a basis
for the same subspace, U(Z) could be of larger dimension. This is because
some component, h say, of the vector joining the means, μ(X) − μ(Y), may be
in the null space of both subspaces simultaneously. For example, μ(X), U(X)
and μ(Y), U(Y) define a pair of planes parallel to the xy-plane, but separated
in the z direction, as in Figure 1.
Fig. 1. Subspaces (x1,x2) and (u1,u2) are embedded in (e1,e2,e3). Subspace (u1,u2) has components in (x1,x2), marked by dashed lines; it also has components in the null space of (x1,x2), marked by dotted lines.
The remaining quantities are obtained by solving an intermediate eigenproblem of size s × s, where s = p + t:

(N(X)/N(Z)) [ Λ(X)_pp   0_pt ; 0_tp   0_tt ]
  + (N(Y)/N(Z)) [ G Λ(Y)_qq G^T   G Λ(Y)_qq Γ^T ; Γ Λ(Y)_qq G^T   Γ Λ(Y)_qq Γ^T ]
  + (N(X)N(Y)/N(Z)^2) [ g_p g_p^T   g_p γ_t^T ; γ_t g_p^T   γ_t γ_t^T ]
  = R_ss Π_ss R_ss^T   (8)

in which Π_ss is diagonal and R_ss is orthonormal, and where
g_p = U(X)^T (μ(X) − μ(Y))   (9)
G_pq = U(X)^T U(Y)   (10)
H_nq = [U(Y) − U(X) G_pq]   (11)
h_n = (μ(X) − μ(Y)) − U(X) g_p   (12)
ν_nt = Orthobasis([H_nq, h_n])   (13)
Γ_tq = ν_nt^T U(Y)_nq   (14)
γ_t = ν_nt^T (μ(X) − μ(Y))   (15)
Here [·] is an operation that removes very small column vectors from a matrix, and
Orthobasis computes a set of mutually orthogonal, unit vectors that support its
argument; typically Gram-Schmidt orthogonalisation [8] is used to compute the
significant support vectors from [H, h]; these lie outside the eigenmodel
Ω(X). Note that while ν^T ν = I, ν ν^T ≠ I. Also, G is the projection of the
Ω(Y) eigenspace onto Ω(X) (the U vectors), while Γ is the projection of Ω(Y)
onto the space complementary to Ω(X) (the ν vectors). This complementary
space must be determined to compute the new eigenspace Ω(Z), which argues
in favour of adding and subtracting eigenspaces, rather than direct updating
or downdating of data blocks.
Given the above decomposition, we can complete our computation of Ω(Z):

Λ(Z)_ss = diag(Π_ss)   (16)
U_ns(Z) = [U_np ν_nt] R_ss   (17)
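To make the procedure concrete, the following Python/NumPy sketch (our own code and variable names, not taken from the paper) implements Eqs. (6)-(17); a QR factorisation with pruning of near-zero columns stands in for the Orthobasis/Gram-Schmidt step, and deflation of the result is left to the caller.

import numpy as np

def add_eigenspaces(mu_x, U_x, lam_x, N_x, mu_y, U_y, lam_y, N_y, tol=1e-10):
    # Merge two eigenspace models (mean, eigenvectors, eigenvalues, count).
    N_z = N_x + N_y                                  # Eq. (6)
    mu_z = (N_x * mu_x + N_y * mu_y) / N_z           # Eq. (7)

    d = mu_x - mu_y
    g = U_x.T @ d                                    # Eq. (9)
    G = U_x.T @ U_y                                  # Eq. (10)
    H = U_y - U_x @ G                                # Eq. (11)
    h = d - U_x @ g                                  # Eq. (12)

    # Eq. (13): orthonormal basis for the residuals outside span(U_x);
    # QR with pruning of near-zero columns stands in for Orthobasis.
    Q, R_qr = np.linalg.qr(np.column_stack([H, h]))
    nu = Q[:, np.abs(np.diag(R_qr)) > tol]

    Gam = nu.T @ U_y                                 # Eq. (14)
    gam = nu.T @ d                                   # Eq. (15)

    p, t = U_x.shape[1], nu.shape[1]
    A = np.zeros((p + t, p + t))
    A[:p, :p] = (N_x / N_z) * np.diag(lam_x)         # first term of Eq. (8)
    B = np.vstack([G, Gam])
    c = np.concatenate([g, gam])
    A += (N_y / N_z) * (B * lam_y) @ B.T             # second term of Eq. (8)
    A += (N_x * N_y / N_z ** 2) * np.outer(c, c)     # third term of Eq. (8)

    lam_z, R = np.linalg.eigh(A)                     # Eq. (8): A = R diag(lam_z) R^T
    order = np.argsort(lam_z)[::-1]
    lam_z, R = lam_z[order], R[:, order]             # Eq. (16)
    U_z = np.column_stack([U_x, nu]) @ R             # Eq. (17)
    return mu_z, U_z, lam_z, N_z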
2.2 Subtraction
The algorithm for subtraction is very similar to that for addition. First compute the number of data and their mean:

N(X) = N(Z) − N(Y)   (18)
μ(X) = (N(Z) μ(Z) − N(Y) μ(Y)) / N(X)   (19)
In this case U(Z) is a sufficient spanning set to rotate. To compute the rotation
we use the eigendecomposition

(N(Z)/N(X)) Λ(Z)_rr − (N(Y)/N(X)) G_rq Λ(Y)_qq G_rq^T − (N(Y)/N(Z)) g_r g_r^T = R_rr Λ(X)_rr R_rr^T   (20)

where G_rq = U(Z)_nr^T U(Y)_nq and g_r = U(Z)_nr^T (μ(Y) − μ(X)). The eigenvalues
we seek are the p non-zero elements on the diagonal of Λ(X)_rr. Thus we can
permute R_rr and Λ(X)_rr, and write without loss of generality:

R_rr Λ(X)_rr R_rr^T = [R_rp R_rt] [ Λ(X)_pp   0_pt ; 0_tp   0_tt ] [R_rp R_rt]^T   (21)
                    = R_rp Λ(X)_pp R_rp^T   (22)

The eigenvectors we seek are then U_np(X) = U_nr(Z) R_rp.
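A corresponding sketch for subtraction, again in our own notation and under the assumption that the model of Y was genuinely part of the model of Z, is:

import numpy as np

def subtract_eigenspaces(mu_z, U_z, lam_z, N_z, mu_y, U_y, lam_y, N_y, tol=1e-10):
    # Remove the eigenspace model of Y from that of Z, following Eqs. (18)-(22).
    N_x = N_z - N_y                                  # Eq. (18)
    mu_x = (N_z * mu_z - N_y * mu_y) / N_x           # Eq. (19)

    G = U_z.T @ U_y                                  # G_rq
    g = U_z.T @ (mu_y - mu_x)                        # g_r

    A = (N_z / N_x) * np.diag(lam_z)                 # Eq. (20), term by term
    A -= (N_y / N_x) * (G * lam_y) @ G.T
    A -= (N_y / N_z) * np.outer(g, g)

    lam_x, R = np.linalg.eigh(A)
    order = np.argsort(lam_x)[::-1]
    lam_x, R = lam_x[order], R[:, order]
    keep = lam_x > tol                               # Eqs. (21)-(22): keep non-zero eigenvalues
    return mu_x, U_z @ R[:, keep], lam_x[keep], N_x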
In terms of computational cost, it is more efficient to add a pair of existing eigenspaces than to compute their sum ab initio. Similar remarks apply to splitting: removing a few data points is a comparatively
efficient operation. The conclusion we reach is that addition and subtraction
of eigenspaces is no less efficient than batch methods, and in most cases is
performed much more efficiently.
We have compared the angular deviation of eigenvectors, the change in eigenvalues, the accuracy of data representation in the least-squares sense, and classification performance [9]. The incremental methods for addition generally compare
very well with batch methods, with discrepancies being at a minimum when the
two eigenspaces added are of about the same size; the exception is the discrepancy in eigenvalues, which shows a maximum of about one part in 10^5 at
that point. Reasons for this behaviour are the subject of future work; we
have not yet undertaken a rigorous analysis of errors.
The subtraction operator tends to instability as the number of points being
removed rises, since in this case N(X) → 0 and hence 1/N(X) → ∞. In the limit
of all points being removed, N(X) = 0, and an exception must be coded to
return a null eigenspace. Unfortunately, we have found prior scaling by
N(X) to be ineffective, and have concluded that, in practice, subtraction is
best used to remove a small fraction of the data points.
Applications
http://www.cam-orl.co.uk/facedatabase.html
Fig. 2. Weight of evidence measures, plotted against person index: year 2 batch (left), and year 2 incremental (right).
We notice that both models produce some ambiguous cases, with weights
between 0 and 1, and that the incrementally computed eigenspace gives rise
to more of these cases than the eigenmodel computed via batch methods. This
result is in line with our earlier comments regarding the relative inaccuracy
of subtraction. Even so, only those people in the database scored 1, while
everyone outside scored less than 1, and hence classification is still possible.
Given our observations, above, regarding previous measures when subtracting
eigenspaces, we conclude that additive incremental eigenanalysis is safe for
classification metrics, but that subtractive incremental eigenanalysis needs a
greater degree of caution.
Fig. 3. Sample images of each toy used as source data in our dynamic GMM application.
Thus, including the top-level eigenspace, each set of toy photographs was represented with nineteen eigenspace models. To merge the GMMs for the pair
of toys we first added together the two top-most eigenspaces to make
a complete eigenspace for all 144 photographs. Next we transformed each of
the GMM clusters into this space, thus bringing all thirty-six Gaussian clusters
(eighteen from each individual hierarchy) into the same (large) eigenspace
covering the ensemble of data. We then merged eigenspaces (Gaussian components), using a very simple merging criterion based on reducing the volume
of hyperellipses, which is explained below. Hence, we were able to reduce the
total number of Gaussians in the mixture to 22. These clusters tend to model
different parts of the cylindrical trajectories of the original data projected into
the large eigenspace. Examples of cluster centres are shown in Figure 4: the
two models can clearly be seen in different positions. In addition, we found a
few clusters occupying the space in between the two toys, an example of
which is seen in Figure 4.
The criterion uses the volume of the hyperellipse at s standard deviations of an M-dimensional Gaussian with covariance A:

V = s^M |A|^{1/2} π^{M/2} / Γ(M/2 + 1)   (23)
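For reference, a one-line helper (ours, assuming the covariance is represented by its non-zero eigenvalues, so that |A| is their product) evaluates Eq. (23):

import math
import numpy as np

def hyperellipse_volume(eigvals, s=1.0):
    # Volume of the hyperellipse at s standard deviations of a Gaussian whose
    # covariance has the given non-zero eigenvalues, as in Eq. (23).
    M = len(eigvals)
    return (s ** M) * math.sqrt(float(np.prod(eigvals))) * math.pi ** (M / 2) / math.gamma(M / 2 + 1)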
Singular value decomposition

This paper has focussed on block updating and block downdating of eigenspace
models, based on eigenvalue decomposition (EVD). However, EVD can suffer from conditioning problems (the condition number being the ratio of the
largest to the smallest eigenvalue). Singular value decomposition (SVD) tends to
be more stable (and hence accurate) because singular values are proportional
to the square root of eigenvalues, which mitigates conditioning problems. Thus
we have also investigated block updating and block downdating based on singular value decomposition, which we briefly outline next.
We are given a set of N observations, each an n-dimensional column vector.
(Note that sets here can contain repeated elements.) These observations can
be represented by an n × N matrix, X. The mean of these observations is the
vector μ. The SVD for the origin-centred data set is (X − μ1)/√N = U Σ V^T,
where 1 is a row vector of N 1s. The left singular vectors are the columns of U, and
are identical to the eigenvectors in the EVD of the data covariance. The right singular vectors
V are related to the coordinates of the data when transformed into the U basis
(which is the eigenspace): V = ((X − μ1)^T U Σ^{-1}) / √N. The singular values,
Σ, give the lengths of the semi-axes along each U vector and specify the size of a
hyperellipse at unit Mahalanobis distance. The singular values are related to
the eigenvalues in the EVD of the data covariance by Σ = Λ^{1/2}. As for eigenmodels, the system
can be deflated by discarding left and right singular vectors that are associated
with small singular values: (X − μ1)/√N ≈ U_np Σ_pp V_Np^T. So, for data X we
specify an SVD model as the tuple (μ(X), U(X), Σ(X), V(X), N(X)).
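As a concrete illustration of this convention (our own helper, not code from the paper), an SVD model can be built from raw data as follows:

import numpy as np

def build_svd_model(X):
    # Build an SVD model (mean, U, Sigma, V, N) for an n x N data matrix X,
    # using the convention (X - mu 1)/sqrt(N) = U diag(Sigma) V^T, so that
    # Sigma**2 equals the eigenvalues of the data covariance (1/N convention).
    n, N = X.shape
    mu = X.mean(axis=1)
    A = (X - mu[:, None]) / np.sqrt(N)
    U, Sigma, Vt = np.linalg.svd(A, full_matrices=False)
    return mu, U, Sigma, Vt.T, N   # deflation, if wanted, truncates all three factors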
The block updating problem for data sets X and Y is to compute the SVD
for their union, conveniently written Z = [X, Y], given only the two SVD models. This is done in three stages. First, the number N(Z) and mean
μ(Z) are computed as for EVD updates. An orthonormal basis set ν, which
spans any subspace of U(Y) not in U(X), is also computed in a way similar
to that for EVD, but it need not include the difference between the means;
that is accounted for elsewhere in the SVD formulation. Second, the following
singular value decomposition is made:
U′ Σ′ V′^T =
  [ √N(X) Σ(X) V(X)^T                √N(Y) U(X)^T U(Y) Σ(Y) V(Y)^T ;
    0                                √N(Y) ν^T U(Y) Σ(Y) V(Y)^T ]
  + [ U(X)^T (μ(X) − μ(Z)) 1_N(X)    U(X)^T (μ(Y) − μ(Z)) 1_N(Y) ;
      ν^T (μ(X) − μ(Z)) 1_N(X)       ν^T (μ(Y) − μ(Z)) 1_N(Y) ]   (24)
Here the second term accounts for the difference in means. Finally, the left
and right singular vectors and the singular values are computed, and deflated as
desired:

U(Z) = [U(X) ν] U′   (25)
Σ(Z) = Σ′ / √N(Z)   (26)
V(Z) = V′   (27)
We note that the right singular vectors are given directly, a shift of mean is
accounted for, and the problem is of size (p + q) × (N(X) + N(Y)). Contrasting
our solution with others [3]: they add a single new point at a time, do not shift
the mean, and require post-manipulation of the right singular vectors,
without offering any efficiency gain in terms of problem size.
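A sketch of this block update in Python/NumPy (our own code; it assumes the part of the mean shift lying outside the combined left basis is negligible, as in the reconstruction of Eq. (24) above) is:

import numpy as np

def add_svd_models(mu_x, U_x, S_x, V_x, N_x, mu_y, U_y, S_y, V_y, N_y, tol=1e-10):
    # Block SVD update in the spirit of Eqs. (24)-(27); models satisfy
    # (data - mean*1)/sqrt(N) ~= U @ diag(S) @ V.T.
    N_z = N_x + N_y
    mu_z = (N_x * mu_x + N_y * mu_y) / N_z

    # Orthonormal basis nu for the part of U_y outside span(U_x).
    H = U_y - U_x @ (U_x.T @ U_y)
    Q, R_qr = np.linalg.qr(H)
    nu = Q[:, np.abs(np.diag(R_qr)) > tol]
    B = np.column_stack([U_x, nu])                   # combined left basis

    # First term of Eq. (24): the centred data of X and Y expressed in B.
    data_x = np.sqrt(N_x) * (B.T @ U_x) @ (S_x[:, None] * V_x.T)
    data_y = np.sqrt(N_y) * (B.T @ U_y) @ (S_y[:, None] * V_y.T)
    # Second term of Eq. (24): shifts from the old means to the new mean.
    shift_x = (B.T @ (mu_x - mu_z))[:, None] * np.ones((1, N_x))
    shift_y = (B.T @ (mu_y - mu_z))[:, None] * np.ones((1, N_y))

    M = np.hstack([data_x + shift_x, data_y + shift_y])
    Up, Sp, Vpt = np.linalg.svd(M, full_matrices=False)
    return mu_z, B @ Up, Sp / np.sqrt(N_z), Vpt.T, N_z   # Eqs. (25)-(27)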
Downdate of SVD models means removing points in data set Z which are also
in data set Y ; in set-theoretic terms we assume Y is a subset of Z and want to
compute the SVD for X = Z \ Y . This is not straightforward and difficulties
arise in two areas, even if we neglect a change in mean. The first difficulty
comes from the (simplest) form of the problem, which is [ABC^T, DEF^T] =
GHJ^T, where X = ABC^T, Y = DEF^T, and Z = GHJ^T. We must obtain A,
B, and C. By multiplying each side by its own transpose we obtain A B^2 A^T +
D E^2 D^T = G H^2 G^T, which gives us an EVD problem from which we can
compute A and B. Alternatively, we can use the relationship between EVD
and SVD to compute A, B, and the mean shift using the EVD downdating
methods described above. However, we note that either way the SVD downdate
cannot be achieved directly.
The second difficulty arises when we note that the ordering of right singular
vectors depends upon the ordering of data points in the matrix being decomposed. However, the left singular vectors and singular values are invariant to
permutation of the data. To see this we suppose P is a permutation matrix (obtained by randomly permuting the identity matrix, so that P P^T = P^T P = I),
and note that given Z = GHJ^T, then ZP = GHJ^T P = GH(P^T J)^T. Therefore, in order to compute the right singular vectors C while downdating, we
must have access to some matrix P which picks out data elements in Z (or,
equivalently, corresponding elements in J). Unfortunately no such information
exists within the SVD model, and consequently computing C in a closed-form
manner seems impossible. The only solution seems to be a resort to search
using data elements in J and F (for these specify data points in Z and Y
respectively). If search is the only solution, then we may simply downdate Z
by building up X incrementally as elements in Z \ Y are found, which is
unsatisfactory in our opinion.
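The following NumPy snippet (ours, using random data) illustrates this invariance numerically:

import numpy as np

# Permuting the columns of Z leaves the singular values and (up to sign) the
# left singular vectors unchanged, but permutes the right singular vectors.
rng = np.random.default_rng(0)
Z = rng.normal(size=(5, 8))
P = np.eye(8)[rng.permutation(8)]          # random permutation matrix, P P^T = I

G, H, Jt = np.linalg.svd(Z, full_matrices=False)
Gp, Hp, Jpt = np.linalg.svd(Z @ P, full_matrices=False)

print(np.allclose(H, Hp))                               # singular values: identical
print(np.allclose(np.abs(G), np.abs(Gp)))               # left vectors: identical up to sign
print(np.allclose(np.abs(Jpt), np.abs(Jt @ P)))         # right vectors: permuted by P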
Thus we conclude that updating SVD models is possible, at the expense of
keeping the right singular vectors, and that downdating SVD models is not possible in closed form. However, experimental evidence [10] suggests that SVD
updates are likely to be more accurate than EVD updates.
Conclusion
We have omitted an experimental comparison between the EVD and SVD formulations; we note, though, that working with the covariance matrix squares the condition number. We have also omitted comparisons with other
incremental methods, because they deal with adding one new data point (but,
again, see our earlier work [10]). The important conclusion from that work is
that updating the mean is crucial for classification results [10].
We would expect our methods to find much wider applicability than those we
have already mentioned in this paper: updating image motion parameters [4],
and selecting salient views [3] are two applications that exist already for incremental methods. We have experimented with image segmentation, building
models of three-dimensional blood vessels, and texture classification. We believe that dynamic Gaussian mixture models provide a very interesting future
path, for they enable useful representations [5,11], and all their attendant
properties to be brought into a dynamic framework.
References
[1] James R. Bunch and Christopher P. Nielsen. Updating the singular value
decomposition. Numerische Mathematik, 31:111–129, 1978.
[2] James R. Bunch, Christopher P. Nielsen, and Danny C. Sorenson. Rank-one
modification of the symmetric eigenproblem. Numerische Mathematik, 31:31–48, 1978.
[3] S. Chandrasekaran, B.S. Manjunath, Y.F. Wang, J. Winkler, and H. Zhang. An
eigenspace update algorithm for image analysis. Graphical Models and Image
Processing, 59(5):321–332, September 1997.
[4] S. Chaudhuri, S. Sharma, and S. Chatterjee. Recursive estimation of motion
parameters. Computer Vision and Image Understanding, 64(3):434–442,
November 1996.
[5] T.F. Cootes and C.J. Taylor. A mixture model for representing shape variations.
In Proc. British Machine Vision Conference, pages 110–119, 1997.
[6] T.F. Cootes, C.J. Taylor, D.H. Cooper, and J. Graham. Training models of
shape from sets of examples. In Proc. British Machine Vision Conference,
pages 9–18, 1992.
[7] Ronald D. DeGroat and Richard Roberts. Efficient, numerically stabilized rank-one eigenstructure updating. IEEE Transactions on Acoustics, Speech, and
Signal Processing, 38(2):301–316, February 1990.
[8] Gene H. Golub and Charles F. Van Loan. Matrix Computations. Johns Hopkins
University Press, 1983.
[9] Peter Hall, David Marshall, and Ralph Martin. Merging and splitting
eigenspaces. IEEE Transactions on Pattern Analysis and Machine Intelligence.
(to appear)
[10] Peter Hall, David Marshall, and Ralph Martin. Incrementally computing
eigenspace models. In Proc. British Machine Vision Conference, pages 286–295, Southampton, 1998.
[11] Tony Heap and David Hogg. Improving specificity in PDMs using a hierarchical
approach. In Proc. British Machine Vision Conference, pages 80–89, 1997.
[12] Baback Moghaddam and Alex Pentland. Probabilistic visual learning for
object representation. IEEE Transactions on Pattern Analysis and Machine
Intelligence, 19(7):696–710, July 1997.
[13] H. Murakami and B.V.K.V. Kumar. Efficient calculation of primary images from
a set of images. IEEE Transactions on Pattern Analysis and Machine Intelligence,
4(5):511–515, September 1982.