PROST: Parallel Robust Online Simple Tracking


Jakob Santner Christian Leistner Amir Saffari Thomas Pock Horst Bischof
Institute for Computer Graphics and Vision, Graz University of Technology
{santner,leistner,saffari,pock,bischof}@icg.tugraz.at

Abstract

Tracking-by-detection is increasingly popular in order to tackle the visual tracking problem. Existing adaptive methods suffer from the drifting problem, since they rely on self-updates of an on-line learning method. In contrast to previous work that tackled this problem by employing semi-supervised or multiple-instance learning, we show that augmenting an on-line learning method with complementary tracking approaches can lead to more stable results. In particular, we use a simple template model as a non-adaptive and thus stable component, a novel optical-flow-based mean-shift tracker as highly adaptive element and an on-line random forest as moderately adaptive appearance-based learner. We combine these three trackers in a cascade. All of our components run on GPUs or similar multi-core systems, which allows for real-time performance. We show the superiority of our system over current state-of-the-art tracking methods in several experiments on publicly available data.

1. Introduction

Visual object tracking is one of the cardinal problems of computer vision. Although tracking finds many practical applications ranging from robotics, surveillance, augmented reality to human-computer interaction, the state-of-the-art is still far from achieving results comparable to human performance. Trackers have to deal with several difficulties such as background clutter, fast appearance and illumination changes and occlusions. Many different tracking methods have been proposed, from global template-based trackers [2], shape-based methods, probabilistic models using mean-shift [7] or particle filtering [13] to local key-point based trackers [14] or flow-based trackers [19].

Recently, tracking methods based on detection systems have become very popular. These tracking-by-detection systems usually perform one-shot learning of object detectors for the target object at the first frame. Surrounding patches are taken as negative samples [2]. These systems are fast and yield good performance since the classification task is simple: Discriminate the target object from its surrounding background. In order to allow for fast appearance changes, recent works use online learners that perform updating on the target object based on the tracking result (self-updating) [9].

The problem with these approaches is that the self-updating process may easily cause drifting in case of wrong updates. Even worse, the tracking-by-detection approach also suffers from the fact that usually online counterparts of supervised learning algorithms are used, which are not designed for handling ambiguity of class labels: Despite the fact that boosting is known to be highly susceptible to label noise, it is widely used in self-learning based tracking methods. This severe problem of adaptive tracking-by-detection methods can also be explained by the stability-plasticity dilemma [11]: If the classifier is trained only with the first frame, it is less error-prone to occlusions and can virtually not drift. However, it is not adaptive at all and cannot follow an object undergoing appearance and viewpoint changes. On the other hand, online classifiers that perform self-learning on their own confidence maps are highly adaptive but easily drift in the case of wrong updates.

Grabner et al. [10] alleviated this dilemma by formulating tracking-by-detection as a one-shot semi-supervised learning problem using online boosting. Supervised updates are only performed at the first frame and all subsequent patches are exploited as unlabeled data with the help of a non-adaptive prior classifier. Although this method has been shown to be less susceptible to drifting and simultaneously more adaptive than an offline learner, it turned out that such an approach is still not adaptive enough to handle fast appearance changes [3].
Another cause of drifting for online learners that perform self-updating is label jitter. The problem of label jitter arises if the bounding boxes of an object are not perfectly aligned with the target, although it is detected correctly. If label jitter occurs repeatedly over a tracking sequence, the tracker will most likely start to lose the target object. For offline detectors, Viola et al. [22] showed that multiple-instance learning can easily handle such ambiguities of bounding boxes. Therefore, Babenko et al. [3] recently used online multiple instance learning to reduce the effect of label jitter during tracking. In their work, the main idea is to take patches lying most likely on the target object as instances for a positive bag and instances further away as negatives. This approach currently yields the best tracking-by-detection results and can be considered as the state-of-the-art.

Figure 1. The response characteristics of an online tracker can be defined as the number of frames it needs to adapt to appearance changes. Our complementary algorithms have been chosen from the opposite ends of this spectrum.

This work was supported by the Austrian Research Promotion Agency (FFG) within the projects VM-GPU (813396) and Outlier (820923) as well as the Austrian Science Fund (FWF) under the doctoral program Confluence of Vision and Graphics W1209. We also greatly acknowledge Nvidia for their valuable support.
In this paper, we revisit the stability-plasticity dilemma [11] of online tracking-by-detection methods. In contrast to recent works (e.g. [10, 3]), we do not tackle the problem by applying another learning method, but by combining several complementary trackers operating at different timescales. Recently, Stenger et al. [21] investigated different combinations of tracking methods. Given a particular tracking scenario, they tried to learn which methods are useful and how they can be combined to yield good results. Our work differs from their approach in that our method does not require offline pre-training of possible combinations.

We show that augmenting a simple online learner with its two extreme counterparts in terms of adaptivity can lead to much better results. In particular, our approach is based on the fact that different tracking approaches lie on different scales of the adaptivity spectrum (Figure 1): At one end are trackers that are totally non-adaptive, such as template-based trackers. At the other end are highly adaptive methods such as optical-flow-based trackers. Tracking-by-detection systems are somewhere in between, depending on their learning method and adaptivity rate.

We propose a system called PROST¹ (Parallel Robust Online Simple Tracking), consisting of three different trackers that are able to cover the entire adaptivity spectrum. We use basic normalized cross correlation template matching to cover the non-adaptive end. Additionally, we introduce a novel highly adaptive optical-flow-based mean-shift tracker. In between, our system consists of an online random forest [17] as adaptive appearance-based classifier. In contrast to previous methods, our system is especially designed to alleviate the drifting problem of appearance based trackers while still being highly adaptive. The core parts have been selected to be easily parallelized and are implemented on the GPU in order to allow for real-time performance.

The adaptivity rate of online trackers can be adjusted by parameter tuning in order to fit to a specific dataset. A particular advantage of our system is that it is able to perform well on unseen sequences without the need of being adjusted beforehand. Throughout all experiments in this paper, no parameter adjustment has been done: all results use the identical algorithm and settings.

In Sections 2 and 3, we present our approach and give a detailed overview of its individual parts. In Section 4, we compare our approach to other state-of-the-art methods on benchmark tracking data sets and on our own recorded sequences. Finally, in Section 5, we give some conclusions and ideas for future work.

¹ Prost is the German word for cheers.

2. Tracking Components

Our goal is to allow an online tracker to be adaptive to fast appearance changes without being too prone to drifting. In other words, we would like to increase its stability and plasticity at the same time. Therefore, we add complementary algorithms with different adaptivity rate $\alpha$, where $\alpha$ denotes the number of frames a tracker needs to fully adapt to appearance changes.

We make the following observations: (i) Object detection algorithms do not adapt to appearance changes, yielding an infinite adaptivity rate $\alpha = \infty$. (ii) Frame-to-frame trackers adapt to changing object appearance at every frame, thus having $\alpha = 1$. (iii) Online trackers usually can be adjusted to have a certain adaptivity rate $1 \le \alpha \le \infty$.

In order to study the most complementary algorithms for an online system, we selected one tracking method from each far end of the adaptivity graph (see Figure 1): An optical flow based tracker and a simple template matching method. All parts are described in more detail below. Note that there are more sophisticated methods performing better than the chosen ones. However, the goal is to demonstrate that an online tracker can be substantially improved by smartly combining it with even simple methods.

2.1. Template Correlation

The static part in our system is based on normalized cross-correlation (NCC). We simply use the object which is marked in the first frame by a rectangle as template and match it in every forthcoming frame. The tracking rectangle is moved to the peak of the correlation confidence map. NCC does not adapt to any changes but brightness, which renders it useless when the object appearance changes permanently.
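As an illustration of the template-correlation step, the following minimal sketch computes NCC by exhaustive search in pure Python. It is not the OpenCV routine mentioned in Section 4.1; the function names `ncc` and `match_template`, the list-of-lists image format and the toy data are assumptions made for this example only. The second call applies a gain and offset to the image to show that such a brightness change leaves the correlation peak where it was.

```python
import math

def ncc(patch, template):
    """Normalized cross-correlation of two equally sized 2-D lists."""
    a = [v for row in patch for v in row]
    b = [v for row in template for v in row]
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [v - mean_a for v in a]
    db = [v - mean_b for v in b]
    denom = math.sqrt(sum(v * v for v in da) * sum(v * v for v in db))
    if denom == 0.0:
        return 0.0  # flat patch or template: correlation undefined, use 0
    return sum(x * y for x, y in zip(da, db)) / denom

def match_template(image, template):
    """Return ((row, col), score) of the best NCC match (exhaustive search)."""
    th, tw = len(template), len(template[0])
    best_pos, best_score = None, -2.0
    for r in range(len(image) - th + 1):
        for c in range(len(image[0]) - tw + 1):
            patch = [row[c:c + tw] for row in image[r:r + th]]
            score = ncc(patch, template)
            if score > best_score:
                best_pos, best_score = (r, c), score
    return best_pos, best_score

# Toy data (hypothetical): the template sits at row 1, column 1.
image = [
    [0, 0, 0, 0, 0],
    [0, 1, 2, 0, 0],
    [0, 3, 9, 0, 0],
    [0, 0, 0, 0, 0],
]
template = [[1, 2], [3, 9]]
brighter = [[2 * v + 10 for v in row] for row in image]  # gain and offset

pos, score = match_template(image, template)
pos_b, score_b = match_template(brighter, template)
```

The exhaustive double loop is chosen for readability; a practical tracker would restrict the search to a window around the previous position or use an optimized library routine.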
2.2. Mean Shift Optical Flow

The estimation of optical flow is one of the essential problems in low-level computer vision. It basically deals with computing the motion between consecutive frames of a sequence. In [12], Horn and Schunck estimated optical flow by minimizing an energy functional of the form

$$\min_{u} \left\{ \int_\Omega |\nabla u_1|^2 + |\nabla u_2|^2 \, d\Omega + \lambda \int_\Omega \left( I_1(x + u(x)) - I_0(x) \right)^2 d\Omega \right\} \tag{1}$$

with $u = (u_1, u_2)^T$, consisting of two terms: A regularization term $\int_\Omega |\nabla u_1|^2 + |\nabla u_2|^2 \, d\Omega$ smoothing the flow field and a data term $\int_\Omega (I_1(x + u(x)) - I_0(x))^2 \, d\Omega$. $\lambda$ is a parameter steering between data term and regularization term, $I_0$ and $I_1$ represent the sequential frames and $u$ is the two-dimensional flow field. This model uses quadratic penalizers and therefore is not suited for estimating flow fields with sharp discontinuities. Using an L1 norm on the data term and Total Variation (TV) regularization [20] leads to the following energy:

$$\min_{u} \left\{ \int_\Omega |\nabla u_1| + |\nabla u_2| \, d\Omega + \lambda \int_\Omega \left| I_1(x + u(x)) - I_0(x) \right| d\Omega \right\} \tag{2}$$

Zach et al. [24] achieved realtime performance by minimizing this energy on the GPU. The TV regularization favors sharp discontinuities, but also leads to a so-called staircase effect, where the flow field exhibits piecewise constant levels. In recent work, Werlberger et al. [23] replaced the TV norm by a Huber norm to tackle this problem: Below a certain threshold the penalty is quadratic, leading to smooth flow fields for small displacements. Above that threshold, the penalty becomes linear, allowing for sharp discontinuities. With the additional incorporation of a diffusion tensor for anisotropic regularization, their method (Huber-L1) is currently one of the most accurate optical flow algorithms according to the Middlebury evaluation website [4]. Figure 2 shows the difference between the method of Werlberger et al. and the algorithm of Horn and Schunck.

In order to use the dense flow field as input to a tracker, we estimate the object's translation from the flow vectors. We use a mean-shift procedure in the two-dimensional translation space, taking into account every flow vector within our tracking rectangle. In contrast to averaging the displacement vectors, mean shift handles occlusions more robustly. For simplicity, we estimate only translation of our object throughout this work; however, note that other motion models incorporating e.g., rotation, scale, affine motion, etc. could be estimated from the flow field.

Figure 2. Optical flow estimation using the algorithm of Horn and Schunck [12] (a,b) and Werlberger et al. [23] (c,d). These flow fields are printed in color-coded representation, where hue encodes direction and intensity encodes magnitude of the flow vectors. As can easily be seen, (c) and (d) are less noisy than (a) and (b) while preserving sharp motion boundaries.

This mean shift optical flow tracker (FLOW) is fast and accurately adapts to appearance changes. However, it may lose the object in presence of large occlusions, fast illumination changes and vast 3D-motion such as out-of-plane rotations. Furthermore, once it fails, it is not able to recover.

2.3. Online Random Forest

Complementary to FLOW and NCC, we employ an adaptive appearance-based tracker based on online random forests (ORF). Random Forests [6] are ensembles of $N$ recursively trained decision trees in form of $f(x): \mathcal{X} \to \mathcal{Y}$. For a forest $F = \{f_1, \dots, f_N\}$, a decision is made by simply taking the maximum over all individual probabilities of the trees for a class $k$ with $C(x) = \arg\max_{k \in \mathcal{Y}} \frac{1}{N} \sum_{n=1}^{N} p_n(k|x)$, where $p_n(k|x)$ is the estimated density of class labels of the leaf of the $n$-th tree. In order to decrease the correlation of the trees, each tree is provided with a slightly different subset of training data by subsampling with replacement from the entire training set, a.k.a. bagging. During training, each split node randomly selects binary tests from the feature vector and selects the best according to an impurity measurement. The information gain after node splitting is usually measured with

$$\Delta H = -\frac{|I_l|}{|I_l| + |I_r|} H(I_l) - \frac{|I_r|}{|I_l| + |I_r|} H(I_r), \tag{3}$$

where $I_l$ and $I_r$ are the left and right subsets of the training data. $H(I) = -\sum_{i=1}^{K} p_i^j \log(p_i^j)$ is the entropy of the classes in the node and $p_i^j$ is the label density of class $i$ in node $j$. The recursive training continues until a maximum depth is reached or no further information gain is possible.
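The split criterion of Eq. (3) can be illustrated with a small sketch. It assumes the natural logarithm, a single scalar feature and a handful of candidate thresholds; the function names `entropy`, `split_score` and `best_threshold` and the toy data are hypothetical and not taken from the authors' code.

```python
import math
from collections import Counter

def entropy(labels):
    """H(I) = -sum_i p_i log(p_i) over the class labels in one node."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())

def split_score(left, right):
    """Delta H of Eq. (3): negative size-weighted entropy of both children."""
    n = len(left) + len(right)
    return -(len(left) / n) * entropy(left) - (len(right) / n) * entropy(right)

def best_threshold(values, labels, candidates):
    """Pick the candidate threshold whose split maximizes Delta H."""
    def score(t):
        left = [y for x, y in zip(values, labels) if x < t]
        right = [y for x, y in zip(values, labels) if x >= t]
        return split_score(left, right)
    return max(candidates, key=score)

# Toy data (hypothetical): one feature value and one class label per sample.
values = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 1, 1, 1]
t = best_threshold(values, labels, [0.25, 0.5, 0.75])
```

Because Eq. (3) omits the entropy of the parent node, the score is at most zero and reaches zero exactly when both children are pure, which is why the middle threshold wins in this toy example.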
Random Forests have several advantages that make them particularly interesting for computer vision applications, i.e., they are fast in both training and evaluation and yield state-of-the-art classification results while being less noise-sensitive compared to other classifiers (e.g., boosting). Additionally, RFs are inherently multi-class and allow, due to their parallel structure, for multi-core and GPU [18] implementations.

Recently, Saffari et al. [17] proposed an online version of RFs which allows them to be used as online classifiers in tracking-by-detection systems. Since recursive training of decision trees is hard to do in online learning, they propose a tree-growing procedure similar to evolving-trees [15]. The algorithm starts with trees consisting only of root nodes and randomly selected node tests $f_i$ and thresholds $\theta_i$. Each node estimates an impurity measure based on the Gini index ($G_i = \sum_{i=1}^{K} p_i^j (1 - p_i^j)$) online, where $p_i^j$ is the label density of class $i$ in node $j$. Then, after each online update the possible information gain $\Delta G$ during a potential node split is measured. If $\Delta G$ exceeds a given threshold, the node becomes a split node, i.e., is not updated any more and generates two child leaf nodes. The growing proceeds until a maximum depth is reached. Even when the tree has grown to its full size, all leaf nodes are further updated online.

The method is simple to implement and has been shown to converge fast to its offline counterpart. Additionally, Saffari et al. [17] showed that the classifier is faster and more noise-robust compared to boosting, which makes it an ideal candidate for our tracking system.

3. Tracker Combination

A tracker has to incorporate two conflicting properties: It has to (i) adapt to fast object appearance changes while (ii) being able to recover in case of drifting. In other words, we need a highly adaptive tracker that is corrected by system components that are more inertial. Therefore, we combine the three different tracking approaches discussed before in a simple fall-back cascade (see also Figure 3): In order to allow for fast changes, FLOW forms the main tracker. This implies that FLOW can also easily lose the target, hence, it can be overruled by ORF. NCC is employed to prevent ORF from making too many wrong updates. Our cascade can be summarized with the following simple rules:

1. FLOW is overruled by ORF if they are (i) not overlapping and (ii) ORF has a confidence above a given threshold.

2. ORF is updated only if it overlaps with NCC or FLOW.

Figure 3. Highly-flexible parts of our system take care of tracking, while the conservative parts correct the flexible ones when they have drifted away.

4. Experiments

During the experiments, we compare our algorithm to current state-of-the-art methods on publicly available datasets. We also created several new challenging video sequences, which are available on our website together with ground truth annotation and results². The major conclusion from the experiments is that our algorithm is more adaptive and stable at the same time compared to other tracking-by-detection systems. Please note that we always use the same parameters throughout the experiments in this section.

² www.gpu4vision.org

4.1. Implementation

For FLOW, we employ the GPU-based implementation of Werlberger et al. [23], which is available online. NCC is based on the cvMatchTemplate() function implemented in the OpenCV library; ORF is based on the code of Saffari et al. [17], which is also publicly available. We achieve real-time performance with our system; however, NCC and especially ORF could benefit largely from being implemented on the GPU.

4.2. Quality Score

To evaluate the performance of their tracker, Babenko et al. [3] use a score representing the mean center location error in pixels. This is not a good choice, as the ground truth rectangles are fixed in size and axis-aligned whereas the sequences exhibit scale and rotational changes. Furthermore, their score does not take into account the different size of the objects in different sequences.

To overcome these problems, we additionally use a score based on the PASCAL challenge [8] object detection score: Given the detected bounding box $ROI_D$ and the ground truth bounding box $ROI_{GT}$, the overlap score evaluates as

$$\text{score} = \frac{\text{area}(ROI_D \cap ROI_{GT})}{\text{area}(ROI_D \cup ROI_{GT})}.$$
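The overlap score can be sketched for axis-aligned rectangles as follows. The box representation as `(x0, y0, x1, y1)` corner tuples and the function names `area` and `overlap_score` are assumptions for this illustration, not part of the original implementation.

```python
def area(box):
    """Area of an axis-aligned box (x0, y0, x1, y1); empty boxes give 0."""
    x0, y0, x1, y1 = box
    return max(0.0, x1 - x0) * max(0.0, y1 - y0)

def overlap_score(det, gt):
    """area(det intersect gt) / area(det union gt) for two boxes."""
    inter_box = (
        max(det[0], gt[0]), max(det[1], gt[1]),
        min(det[2], gt[2]), min(det[3], gt[3]),
    )
    inter = area(inter_box)
    union = area(det) + area(gt) - inter
    return inter / union if union > 0 else 0.0

# Two 10x10 boxes shifted by half their width: intersection 50, union 150.
half_shift = overlap_score((0, 0, 10, 10), (5, 0, 15, 10))
identical = overlap_score((0, 0, 10, 10), (0, 0, 10, 10))
disjoint = overlap_score((0, 0, 10, 10), (20, 20, 30, 30))
```

Unlike a center-distance error, this ratio is normalized by object size, so sequences with objects of different sizes become comparable.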
By interpreting a frame as true positive when this score exceeds 0.5, we can give a percentage of correctly tracked frames for each sequence.

4.3. Sequences

Throughout the experiments, we use ten challenging sequences (Table 1) featuring e.g. moving cameras, cluttered background, occlusions, 3-D motion or illumination changes. The video data, ground truth and results of other methods for the first six sequences have been taken from Babenko et al. [3]. The other four videos (see Figure 7) have been created and annotated by ourselves.

Sequence        Frames  Main Challenges
Girl [5]        502     3D-motion, moving camera
David [16]      462     moving camera, varying illumination
Sylvester [16]  1344    3D-motion, varying illumination
Faceocc1 [1]    886     moving camera, occlusions
Faceocc2 [3]    812     occlusions, heavy appearance change
Tiger1 [3]      354     fast motion, heavy appearance change
Board           698     3D-motion
Box             1161    fast 3D-motion, occlusions
Lemming         1336    heavy scale changes, motion blur
Liquor          1741    motion blur, occlusions

Table 1. The tracking sequences used in our experiments. The last four videos are available together with ground-truth annotations on our website.

4.4. Performance of the building blocks

In the first experiment, we investigate the behavior of our three building blocks separately on two sequences, Sylvester and David. The average pixel error is given in Figure 4.

• NCC works well when the appearance of the object is close to the learned template. In the sequence David, this holds for the first 100 frames, then the object has changed such that NCC is not able to distinguish it from background. For Sylvester, NCC also works well on the initial frames and, although losing the object later, it is able to find it again several times throughout the sequence.

• ORF clearly indicates the fundamental problem of online tracking algorithms on these two sequences: With identical parameters, it is stable enough for Sylvester but loses David completely after 150 frames.

• FLOW tracks the object correctly for the first 150 frames on David and 400 frames of Sylvester, but then starts to drift away, accumulating errors from frame to frame. In David, it gets back to the object by chance around frame 200, but then loses it again at frame 400. In general, the more frames tracked, the less accurate FLOW gets.

With these experiments we show that the different algorithms can complement one another. FLOW is a good high-dynamic tracker, but needs correction from time to time to get rid of accumulating errors. ORF could do that, but needs a supervisor preventing it from doing too many wrong updates. NCC is not suited to track on a per-frame basis but gives strong cues when the object reappears similarly to the initial template.

Figure 4. Separate evaluation of the building blocks of our system.

4.5. Benchmarks

4.5.1 Standard Datasets

In this experiment, we would like to benchmark our approach on the following sequences: Girl, David Indoor, Sylvester, Occluded Face, Occluded Face 2 and Tiger 1. Recently, Babenko et al. [3] showed superior results comparing their method (MILTrack) to AdaBoost of Grabner et al. [9] and FragTrack of Adam et al. [1]. We benchmark against their results; details on the parametrization of the different algorithms are given in their paper. For their own method, they provide results for five different runs: As their algorithm depends on random features, the performance varies between subsequent runs. This difference is most of the time substantial; however, we give our own algorithm a handicap by comparing our results to their best run in each sequence according to the PASCAL score.

Table 2 and Figure 5 depict the results based on the mean pixel error: In 3 of 6 sequences, our method yields the best scores, in the other sequences it is the second best. Table 3 depicts the percentage of frames tracked correctly over all six sequences based on the PASCAL score: Our algorithm's correct frames average to 83.8%, followed by MILTrack (80.0%), FragTrack (59.8%) and AdaBoost (41.0%).
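The percentage of correctly tracked frames and the reported average can be reproduced with a few lines. This is a sketch under assumptions: the function name `percent_correct` and the per-frame scores are made up for illustration, while the six per-sequence values are the PROST column of Table 3.

```python
def percent_correct(scores, threshold=0.5):
    """Percentage of frames whose overlap score exceeds the threshold."""
    if not scores:
        return 0.0
    hits = sum(1 for s in scores if s > threshold)
    return 100.0 * hits / len(scores)

# Hypothetical per-frame overlap scores, for illustration only.
frame_scores = [0.9, 0.7, 0.4, 0.55, 0.1]
example = percent_correct(frame_scores)  # 3 of 5 frames count as correct

# PROST column of Table 3: Girl, David, Sylvester, Faceocc, Faceocc2, Tiger1.
prost_table3 = [89, 80, 73, 100, 82, 79]
prost_mean = round(sum(prost_table3) / len(prost_table3), 1)
```

The unweighted mean over the six sequences gives 83.8, which matches the average stated in the text; each sequence counts equally regardless of its length.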
Figure 5. Tracking results for standard tracking sequences.

Sequence   Adaboost  FragTrack  MILTrack  PROST
Girl       43.3      26.5       31.6      19.0
David      51.0      46.0       15.6      15.3
Sylvester  32.9      11.2       9.4       10.6
Faceocc    49.0      6.5        18.4      7.0
Faceocc2   19.6      45.1       14.3      17.2
Tiger1     17.8      39.6       8.4       7.2

Table 2. Mean distance of the tracking rectangle to annotated ground truth, in pixels.

Sequence   Adaboost  FragTrack  MILTrack  PROST
Girl       24        70         70        89
David      23        47         70        80
Sylvester  51        74         74        73
Faceocc    35        100        93        100
Faceocc2   75        48         96        82
Tiger1     38        20         77        79

Table 3. Percentage of frames tracked correctly.

4.5.2 PROST dataset

To further demonstrate the capabilities of our system, we compare on newly created sequences. Besides our own method (parametrized identically to the previous experiments), we benchmark the following algorithms:

• ORF with 100 trees of maximum depth 5 and a search region factor of 1.0. Similar to MILTrack [3], we use Haar-like features. This is exactly the online part of our tracker, thus this experiment directly shows the benefit of the complementary methods.

• FragTrack [1] with 16 bins and a search window half size of 25 to cope with the larger frame size.

• MILTrack [3], as provided on their webpage, with search window size increased to 50. Similar to the previous experiment, we use the best out of 5 differently initialized runs.

The average pixel error for each method is given in Table 4, the PASCAL based score in Table 5. Our approach yields the best score in three sequences, tracking correctly an average of 79.5% over all four sequences, followed by FragTrack (65.3%), MILTrack (48.5%) and ORF (27.3%). Looking at the pixel error graph in Figure 6 directly shows the benefits of our combined system over the online tracker ORF it is based on:

• ORF loses the object in every sequence after at least 400 frames. With the high-dynamic optical flow tracker increasing plasticity, our system loses the object far less often.

• When ORF has lost the track, it performs wrong updates until eventually totally drifting away from the object. This happens in the sequences board, box and lemming. In liquor, it is able to recover the object three times. Although far less often, our system also loses the track several times, but is, except for the last frames of liquor, always able to recover the object.

Sequence  MILTrack  ORF    FragTrack  PROST
Board     51.2      154.5  90.1       37.0
Box       104.6     145.4  57.4       12.1
Lemming   14.9      166.3  82.8       25.4
Liquor    165.1     67.3   30.7       21.6

Table 4. Mean distance error to the ground truth.
Figure 6. Tracking results for the PROST dataset.

Sequence  MILTrack  ORF   FragTrack  PROST
Board     67.9      10.0  67.9       75.0
Box       24.5      28.3  61.4       91.4
Lemming   83.6      17.2  54.9       70.5
Liquor    20.6      53.6  79.9       83.7

Table 5. Percentage of frames tracked correctly.

5. Conclusion

In this paper, we addressed the robustness and adaptivity of on-line appearance-based tracking. In order to increase the stability and plasticity of an on-line tracker at the same time, we proposed to combine it with both a static and a highly dynamic element. In particular, we combined an on-line random forest with a simple correlation-based template tracker and a novel optical-flow-based mean shift tracker as most adaptive part. The three elements are combined in a cascade.

In the experimental part, we compared our method with state-of-the-art appearance-based methods on both tracking benchmark data sets and on our own recorded sequences. With constant parameter settings, we demonstrated superior performance in sequences that demand more conservative tracking behavior as well as in sequences with rapid appearance changes.

Our approach suggests several extensions: First, we used simple methods in our combined tracker. As each individual part of our system can be exchanged easily, employing more powerful trackers could increase the performance of the overall system. Second, the tracker is currently restricted to axis-aligned fixed-size rectangles. One can increase the power of the system by extending it to handle rotation, scale change or affine motion and by giving pixel-wise segmentations of the object.

References

[1] A. Adam, E. Rivlin, and I. Shimshoni. Robust fragments-based tracking using the integral histogram. In CVPR, 2006.
[2] S. Avidan. Ensemble tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(2):261-271, 2007.
[3] B. Babenko, M.-H. Yang, and S. Belongie. Visual Tracking with Online Multiple Instance Learning. In CVPR, 2009.
[4] S. Baker, D. Scharstein, J. Lewis, S. Roth, M. Black, and R. Szeliski. A database and evaluation methodology for optical flow. In ICCV, 2007. http://vision.middlebury.edu/flow/.
[5] S. Birchfield. Elliptical head tracking using intensity gradients and color histograms. In CVPR, 1998.
[6] L. Breiman. Random forests. Machine Learning, 45:5-32, 2001.
[7] D. Comaniciu, V. Ramesh, and P. Meer. Real-time tracking of non-rigid objects using mean shift. In CVPR, 2000.
[8] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman. The Pascal Visual Object Classes (VOC) Challenge. Int. J. Comput. Vision, 88(2):303-308, 2009.
[9] H. Grabner, M. Grabner, and H. Bischof. Real-time tracking via on-line boosting. In Proceedings British Machine Vision Conference, 2006.
[10] H. Grabner, C. Leistner, and H. Bischof. Semi-supervised on-line boosting for robust tracking. In ECCV, 2008.
[11] S. Grossberg. Competitive learning: From interactive activation to adaptive resonance. Neural networks and natural intelligence, pages 213-250, 1998.
[12] B. K. P. Horn and B. G. Schunck. Determining optical flow. Artificial Intelligence, 17, pages 185-203, 1981.
[13] Y. Li, H. Ai, T. Yamashita, S. Lao, and M. Kawade. Tracking in low frame rate video: A cascade particle filter with discriminative observers of different lifespans. In CVPR, 2007.
[14] M. Ozuysal, P. Fua, and V. Lepetit. Fast keypoint recognition in ten lines of code. In CVPR, 2007.
[15] J. Pakkanen, J. Iivarinen, and E. Oja. The evolving tree - a novel self-organizing network for data analysis. Neural Process. Lett., 20(3):199-211, 2004.
[16] D. A. Ross, J. Lim, R.-S. Lin, and M.-H. Yang. Incremental learning for robust visual tracking. Int. J. Comput. Vision, 77(1-3):125-141, 2008.
[17] A. Saffari, C. Leistner, J. Santner, M. Godec, and H. Bischof. On-line random forests. In 3rd IEEE ICCV Workshop on On-line Comput. Vision, 2009.
[18] T. Sharp. Implementing decision trees and forests on a GPU. In ECCV, 2008.
[19] J. Shi and C. Tomasi. Good features to track. In CVPR, 1994.
[20] D. Shulman and J.-Y. Herve. Regularization of discontinuous flow fields. In Proceedings Workshop on Visual Motion, 1989.
[21] B. Stenger, T. Woodley, and R. Cipolla. Learning to track with multiple observers. In CVPR, 2009.
[22] P. Viola, J. Platt, and C. Zhang. Multiple instance boosting for object detection. In Advances in Neural Information Processing Systems, volume 18, pages 1417-1424. MIT Press, 2006.
[23] M. Werlberger, W. Trobin, T. Pock, A. Wedel, D. Cremers, and H. Bischof. Anisotropic Huber-L1 optical flow. In Proc. of the British Machine Vision Conf., 2009.
[24] C. Zach, T. Pock, and H. Bischof. A duality based approach for realtime TV-L1 optical flow. In Pattern Recognition (Proc. DAGM), 2007.

Figure 7. Exemplar frames of the PROST dataset (Board: frames 5, 485, 570; Box: frames 5, 455, 900; Lemming: frames 5, 325, 975; Liquor: frames 5, 730, 1285); the rectangle represents the ground truth.
