David Rouse, Adam Watkins, David Porter, John Harer, Paul Bendich, Nate Strawn, Elizabeth Munch, Jonathan DeSena, Jesse Clarke, Jeff Gilbert, Sang Chin, Andrew Newman
This paper introduces a method to integrate target behavior into the multiple hypothesis tracker (MHT) likelihood ratio. In particular, a periodic track appraisal based on behavior is introduced that uses elementary topological data analysis coupled with basic machine learning techniques. The track appraisal adjusts the traditional kinematic data association likelihood (i.e., track score) using an established formulation for classification-aided data association. The proposed method is tested and demonstrated on synthetic vehicular data representing an urban traffic scene generated by the Simulation of Urban Mobility package. The vehicles in the scene exhibit different driving behaviors. The proposed method distinguishes those behaviors and shows improved data association decisions relative to a conventional, kinematic MHT.
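As a rough sketch of the score-adjustment idea (not the paper's exact formulation), the snippet below treats the conventional MHT track score as a kinematic log-likelihood ratio and adds a behavior term formed from a classifier's posterior for the hypothesized behavior class against its prior; the function name and the two probabilities are illustrative assumptions.

```python
import numpy as np

def adjusted_track_score(kinematic_log_lr, p_behavior_posterior, p_behavior_prior):
    """Hypothetical classification-aided adjustment of an MHT track score.

    kinematic_log_lr     -- conventional kinematic log-likelihood-ratio track score
    p_behavior_posterior -- classifier's probability that the track's feature
                            history matches the hypothesized behavior class
    p_behavior_prior     -- prior probability of that behavior class
    """
    eps = 1e-12  # guard against log(0)
    behavior_log_lr = np.log((p_behavior_posterior + eps) / (p_behavior_prior + eps))
    # Classification-aided data association adds the behavior (feature) term
    # to the kinematic term in log-likelihood-ratio form.
    return kinematic_log_lr + behavior_log_lr

# Example: a track whose behavior features strongly match a class
# (posterior 0.9 against a prior of 0.5) receives a score boost.
print(adjusted_track_score(12.3, 0.9, 0.5))
```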
KEYWORDS: Image quality, Image analysis, Electronic imaging, Distance measurement, Human vision and color perception, Current controlled current source, Binary data, Databases, Performance modeling
Utility estimators predict the usefulness or utility of a distorted natural image when used as a surrogate for a
reference image. They differ from quality estimators in that they should provide accurate estimates even when
images are severely visibly distorted relative to the original yet remain sufficient for the task. Our group has
previously proposed the Natural Image Contour Evaluation (NICE) utility estimator. NICE estimates perceived
utility by comparing morphologically dilated binary edge maps of the reference and distorted images using the
Hamming distance.
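A minimal sketch of this comparison is given below, assuming grayscale images, a Sobel-based edge detector, and a small dilation radius; the paper's specific edge detector, threshold, and dilation element are not reproduced here.

```python
import numpy as np
from scipy import ndimage

def binary_edge_map(image, threshold=0.1):
    # Illustrative edge detector: Sobel gradient magnitude thresholded at a
    # fraction of its maximum (the published NICE detector may differ).
    gx = ndimage.sobel(image, axis=0, mode="reflect")
    gy = ndimage.sobel(image, axis=1, mode="reflect")
    magnitude = np.hypot(gx, gy)
    return magnitude > threshold * magnitude.max()

def nice_score(reference, distorted, dilation_radius=2, threshold=0.1):
    """Contour comparison in the spirit of NICE: morphologically dilate the
    binary edge maps of both images, then take a normalized Hamming distance."""
    struct = ndimage.iterate_structure(
        ndimage.generate_binary_structure(2, 1), dilation_radius)
    ref_edges = ndimage.binary_dilation(binary_edge_map(reference, threshold), struct)
    dst_edges = ndimage.binary_dilation(binary_edge_map(distorted, threshold), struct)
    # Hamming distance: count of pixels where the dilated edge maps disagree,
    # normalized by image size so scores are comparable across resolutions.
    return np.count_nonzero(ref_edges != dst_edges) / ref_edges.size
```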
This paper investigates perceptually inspired approaches to evaluating the degradation of image contours in
natural images for utility estimation. First, the distance transform is evaluated as an alternative to the Hamming
distance measure in NICE. Second, we introduce the image contour fidelity (ICF) computational model that is
compatible with any block-based quality estimator. The ICF model pools fidelity degradations across image
blocks, weighting each block by its local contour strength, and thereby allows quality estimators to be
repurposed as utility estimators.
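The block pooling in ICF can be sketched as below, assuming grayscale images, 16x16 blocks, and a Sobel-based contour-strength measure; the block size, contour measure, and pooling rule used in the paper are not reproduced, and block-wise PSNR merely stands in for an arbitrary block-based fidelity estimator.

```python
import numpy as np
from scipy import ndimage

def block_contour_strength(reference_block):
    # Illustrative contour-strength measure: mean Sobel gradient magnitude
    # of the reference block.
    gx = ndimage.sobel(reference_block, axis=0, mode="reflect")
    gy = ndimage.sobel(reference_block, axis=1, mode="reflect")
    return np.hypot(gx, gy).mean()

def block_psnr(ref_blk, dst_blk):
    # Simple block-based fidelity estimator (assumes 8-bit pixel values).
    mse = np.mean((ref_blk.astype(float) - dst_blk.astype(float)) ** 2)
    return 10.0 * np.log10(255.0 ** 2 / (mse + 1e-12))

def icf_pool(reference, distorted, block_quality=block_psnr, block_size=16):
    """Pool block-wise scores from a block-based quality estimator, weighting
    each block by the local contour strength of the reference image."""
    rows, cols = reference.shape
    scores, weights = [], []
    for r in range(0, rows - block_size + 1, block_size):
        for c in range(0, cols - block_size + 1, block_size):
            ref_blk = reference[r:r + block_size, c:c + block_size].astype(float)
            dst_blk = distorted[r:r + block_size, c:c + block_size].astype(float)
            scores.append(block_quality(ref_blk, dst_blk))
            weights.append(block_contour_strength(ref_blk))
    weights = np.asarray(weights)
    weights = weights / (weights.sum() + 1e-12)
    return float(np.dot(weights, np.asarray(scores)))
```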
The performances of these approaches were evaluated on the CU-Nantes and CU-ObserverCentric databases,
which provide perceived utility scores for a collection of distorted images. While the distance transform provides
an improvement over the Hamming distance, the ICF model shows greater promise. The performances of common
fidelity estimators for utility estimation are substantially improved when they are used in the ICF computational
model. This suggests that the utility estimation problem can be recast as a problem of fidelity estimation on
image contours.
The merit of an objective quality estimator for either still images or video is gauged by its ability to accurately
estimate the perceived quality scores of a collection of stimuli. Encounters with radically different distortion types
that arise in novel media representations require that researchers collect perceived quality scores representative
of these new distortions to confidently evaluate a candidate objective quality estimator. Two common methods
used to collect perceived quality scores are absolute categorical rating (ACR) [1] and subjective assessment
methodology for video quality (SAMVIQ) [2, 3].
The choice of a particular test method affects the accuracy and reliability of the data collected. An awareness
of the potential benefits and/or costs attributed to the ACR and SAMVIQ test methods can guide researchers
to choose the more suitable method for a particular application. This paper investigates the tradeoffs of these
two subjective testing methods using three different subjective databases that have scores corresponding to each
method. The subjective databases contain either still images or video sequences.
This paper has the following organization: Section 2 summarizes the two test methods compared in this
paper, ACR and SAMVIQ. Section 3 summarizes the content of the three subjective databases used to evaluate
the two test methods. An analysis of the ACR and SAMVIQ test methods is presented in Section 4. Section 5 concludes this paper.
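Such comparisons typically start from per-stimulus mean opinion scores and their confidence intervals; the sketch below shows that standard computation and is not necessarily the analysis presented in Section 4.

```python
import numpy as np

def mos_with_ci(ratings, z=1.96):
    """Mean opinion score (MOS) for one stimulus and a normal-approximation
    95% confidence interval, from the raw ratings of multiple observers."""
    ratings = np.asarray(ratings, dtype=float)
    mos = ratings.mean()
    half_width = z * ratings.std(ddof=1) / np.sqrt(ratings.size)
    return mos, (mos - half_width, mos + half_width)

# Example with 5-point ACR ratings; SAMVIQ scores, typically collected on a
# continuous 0-100 scale, would need rescaling before direct comparison.
print(mos_with_ci([4, 5, 3, 4, 4]))
```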
Present quality assessment (QA) algorithms aim to generate scores for natural images consistent with subjective
scores for the quality assessment task. For the quality assessment task, human observers evaluate a natural
image based on its perceptual resemblance to a reference. Natural images communicate useful information to
humans, and this paper investigates the utility assessment task, where human observers evaluate the usefulness of
a natural image as a surrogate for a reference. Current QA algorithms implicitly assess utility insofar as an image
that exhibits strong perceptual resemblance to a reference is also of high utility. However, a perceived quality
score is not a proxy for a perceived utility score: a decrease in perceived quality may not affect the perceived
utility. Two experiments are conducted to investigate the relationship between the quality assessment and utility
assessment tasks. The results from these experiments provide evidence that any algorithm optimized to predict
perceived quality scores cannot immediately predict perceived utility scores. Several QA algorithms are evaluated
in terms of their ability to predict subjective scores for the quality and utility assessment tasks. Among the QA
algorithms evaluated, the visual information fidelity (VIF) criterion, which is frequently reported to provide the
highest correlation with perceived quality, predicted both perceived quality and utility scores reasonably well. The
consistent performance of VIF on both tasks raised suspicions in light of the evidence from the psychophysical
experiments. A thorough analysis of VIF revealed that it artificially emphasizes evaluations at finer image scales
(i.e., higher spatial frequencies) over those at coarser image scales (i.e., lower spatial frequencies). A modified
implementation of VIF, denoted VIF*, is presented that provides statistically significant improvement over VIF
for the quality assessment task and statistically worse performance for the utility assessment task. A novel utility
assessment algorithm, referred to as the natural image contour evaluation (NICE), is introduced that conducts a
comparison of the contours of a test image to those of a reference image across multiple image scales to score the
test image. NICE demonstrates a viable departure from traditional QA algorithms that incorporate energy-based
approaches and is capable of predicting perceived utility scores.
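The multi-scale contour comparison in NICE can be sketched as follows, reusing a single-scale contour comparison such as the nice_score sketch given earlier; the number of scales, the dyadic downsampling, and the plain averaging across scales are assumptions rather than the published pooling rule.

```python
import numpy as np
from scipy import ndimage

def multiscale_contour_score(reference, distorted, single_scale_score, num_scales=3):
    """Apply a single-scale contour comparison (e.g., the nice_score sketch
    above) at several dyadic scales and average the per-scale scores."""
    ref = np.asarray(reference, dtype=float)
    dst = np.asarray(distorted, dtype=float)
    scores = []
    for _ in range(num_scales):
        scores.append(single_scale_score(ref, dst))
        # Low-pass filter and downsample by two for the next coarser scale.
        ref = ndimage.zoom(ndimage.gaussian_filter(ref, 1.0), 0.5)
        dst = ndimage.zoom(ndimage.gaussian_filter(dst, 1.0), 0.5)
    return float(np.mean(scores))
```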
Natural images are meaningful to humans: the physical world exhibits statistical regularities that permit the
human visual system (HVS) to infer useful interpretations. These regularities communicate the visual structure
of the physical world and govern the statistics of images (image structure). A signal processing framework is
sought to analyze image characteristics for a relationship with human interpretation. This work investigates
the first step toward an objective visual information evaluation: predicting the recognition threshold of different
image representations. Given an image sequence whose images begin as unrecognizable and are gradually refined
to include more information according to some measure, the recognition threshold corresponds to the first image
in the sequence at which an observer accurately identifies the content. Sequences are produced using two
types of image representations: signal-based and visual structure preserving. Signal-based representations add
information as dictated by conventional mathematical characterizations of images based on models of low-level
HVS processing and use basis functions as the basic image components. Visual structure preserving representations
add information attributed to visual structure and attempt to mimic higher-level HVS
processing by considering the scene's objects as the basic image components. An experiment is conducted to
identify the recognition threshold image. Several full-reference perceptual quality assessment algorithms are
evaluated in terms of their ability to predict the recognition threshold of different image representations. The
cross-correlation component of a modified version of the multi-scale structural similarity (MS-SSIM) metric,
denoted MS-SSIM*, exhibits a better overall correlation with the signal-based and visual structure preserving
representations' average recognition thresholds than the standard MS-SSIM cross-correlation component. These
findings underscore the significance of visual structure in recognition and advocate a multi-scale image structure
analysis for a rudimentary evaluation of visual information.
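For reference, the cross-correlation (structure) component mentioned here is, in the standard SSIM formulation, s(x, y) = (sigma_xy + C3) / (sigma_x * sigma_y + C3) computed over local windows. The sketch below evaluates this term across dyadic scales; the Gaussian window, the plain averaging across scales, and the stabilizing constants are the standard choices and do not reproduce the MS-SSIM* modification studied in the paper.

```python
import numpy as np
from scipy import ndimage

def structure_term(x, y, c3=(0.03 * 255) ** 2 / 2, sigma=1.5):
    """SSIM cross-correlation (structure) term with Gaussian local statistics:
    s = (sigma_xy + C3) / (sigma_x * sigma_y + C3)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mu_x = ndimage.gaussian_filter(x, sigma)
    mu_y = ndimage.gaussian_filter(y, sigma)
    sigma_x = np.sqrt(np.maximum(ndimage.gaussian_filter(x * x, sigma) - mu_x ** 2, 0))
    sigma_y = np.sqrt(np.maximum(ndimage.gaussian_filter(y * y, sigma) - mu_y ** 2, 0))
    sigma_xy = ndimage.gaussian_filter(x * y, sigma) - mu_x * mu_y
    return (sigma_xy + c3) / (sigma_x * sigma_y + c3)

def multiscale_structure_score(x, y, num_scales=5):
    # Average the structure term over dyadic scales (the published MS-SSIM
    # weights scales with per-scale exponents; a plain mean is used here).
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    scores = []
    for _ in range(num_scales):
        scores.append(structure_term(x, y).mean())
        x = ndimage.zoom(ndimage.gaussian_filter(x, 1.0), 0.5)
        y = ndimage.zoom(ndimage.gaussian_filter(y, 1.0), 0.5)
    return float(np.mean(scores))
```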
Modern algorithms that process images to be viewed by humans analyze the images strictly as signals, where
processing is typically limited to the pixel and frequency domains. The continuum of visual processing by the
human visual system (HVS) from signal analysis to cognition indicates that the signal-processing based model of
the HVS could be extended to include some higher-level, structural processing. An experiment was conducted to
study the relative importance of higher-level, structural representations and lower-level, signal-based representations
of natural images in a cognitive task. Structural representations preserve the overall image organization
necessary to recognize the image content and discard the finer details of objects such as textures. Signal-based
representations (i.e., digital photographs) decompose an image in terms of its frequency, orientation, and contrast.
Participants viewed sequences of images from either structural or signal-based representations, where subsequent
images in the sequence reveal additional detail or visual information from the source image. When the content
was recognizable, participants were instructed to provide a description of that image in the sequence. The
descriptions were subjectively evaluated to identify a participant's recognition threshold for a particular image
representation. The results from this experiment suggest that signal-based representations possess meaning to
human observers when the proportion of high frequency content, which conveys shape information, exceeds a
seemingly fixed proportion. Additional comparisons among the representations chosen for this experiment provide
insight toward quantifying their significance in cognition and developing a rudimentary measure of visual
entropy.
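As an illustration of the "proportion of high frequency content" invoked in this interpretation, the sketch below measures the fraction of spectral energy above a radial frequency cutoff in a grayscale image; the cutoff and the energy measure are assumptions and do not reproduce the experiment's definition.

```python
import numpy as np

def high_frequency_proportion(image, cutoff=0.25):
    """Fraction of spectral energy above a radial frequency cutoff, with the
    cutoff expressed as a fraction of the Nyquist frequency (0.5 cycles/pixel)."""
    image = np.asarray(image, dtype=float)
    spectrum = np.abs(np.fft.fftshift(np.fft.fft2(image))) ** 2
    rows, cols = image.shape
    u = (np.arange(rows) - rows // 2) / rows
    v = (np.arange(cols) - cols // 2) / cols
    radius = np.sqrt(u[:, None] ** 2 + v[None, :] ** 2)  # cycles per pixel
    high_energy = spectrum[radius > cutoff * 0.5].sum()
    return float(high_energy / (spectrum.sum() + 1e-12))
```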