The Conference on Human Vision and Electronic Imaging had its origins as three sessions in the 1988 SPIE/SPSE Symposium on Electronic Imaging Devices and Systems. These sessions brought together visual psychophysicists and imaging scientists and engineers to explore the relevance of human vision to the design of imaging systems. In the early years of the conference, the focus was on display technology and low-level image coding and rendering. The scope of the conference has grown with the evolution of electronic imaging technology, and the conference today includes papers on visualization, machine vision, digital image libraries, and art. Over the years, the conference has become more focused on truly integrating perception and engineering. We have been proud to see how our community has applied knowledge of perceptual systems to create novel engineering designs, and how knowledge of engineering challenges has led to the identification of novel directions for vision research. This paper will examine the progress of this multidisciplinary field as seen through the lens of this conference, and will speculate on where we are headed.
Over the past several years we have investigated viewer response to temporal fluctuations in the quality of digital television pictures, which occur when video is coded at relatively low bit rates. Three phenomena of interest have been identified: (1) a forgiveness effect, (2) a recency effect, and (3) a negative-peak effect (duration neglect); these are described and discussed in the paper. In collaboration with our partners in the European projects MOSAIC and TAPESTRIES, we have developed a three-stage method of measuring time-variant quality, which has been accepted by the ITU-R. The first stage is a Single Stimulus Continuous Quality Evaluation (SSCQE) of instantaneous quality; the second is a calibration stage linking SSCQE with the conventional DSCQS method; and the third is a numerical procedure for relating continuous and overall quality. Some of the factors we have identified as being important in producing good overall quality judgements have relevance to the design of optimal coding strategies for digital television.
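As an illustration of the kind of numerical procedure the third stage implies, here is a minimal sketch of recency-weighted pooling of a continuous quality trace into a single overall score. This is a construction for illustration only, not the ITU-R procedure; the half-life and sample rate are assumed values.

```python
import numpy as np

def pool_sscqe(scores, half_life_s=20.0, sample_rate_hz=2.0):
    """Pool a continuous SSCQE quality trace into one overall score using
    exponential recency weighting: recent samples count more (assumed form).

    scores: 1-D array of instantaneous quality ratings (e.g., 0-100).
    half_life_s: time after which a sample's weight halves (assumed value).
    """
    scores = np.asarray(scores, dtype=float)
    n = len(scores)
    # Age of each sample, measured backward from the end of the sequence.
    age_s = (n - 1 - np.arange(n)) / sample_rate_hz
    weights = 0.5 ** (age_s / half_life_s)
    return np.sum(weights * scores) / np.sum(weights)

# Example: quality dips in the middle but recovers; recency weighting pulls
# the overall score toward the (better) final portion of the sequence.
trace = np.concatenate([np.full(120, 70.0), np.full(60, 30.0), np.full(120, 75.0)])
print(pool_sscqe(trace))  # noticeably higher than the plain mean of 64
```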
Human vision plays an essential role, either implicitly or explicitly, in image rendering, especially digital halftoning. In this paper, we review the progress that has been made in the past 16 years in understanding the visual characteristics of halftone texture and in the use of human visual system models as an intrinsic part of the halftoning algorithm. We examine the development of the direct binary search halftoning algorithm as a case study in the challenges of achieving satisfactory computational performance with a model-based halftoning algorithm.
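To make the case study concrete, the following is a brute-force sketch of the direct-binary-search idea: greedily toggle dots whenever doing so reduces the error as seen through a low-pass HVS model. The Gaussian eye filter and all parameters are illustrative; production DBS implementations update the error incrementally via the filter's autocorrelation rather than refiltering, which is exactly the computational challenge mentioned above.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def perceived_error(halftone, gray, sigma):
    # Squared error after low-pass filtering with a Gaussian eye model.
    e = gaussian_filter(halftone - gray, sigma)
    return np.sum(e * e)

def dbs_halftone(gray, sigma=1.2, max_passes=5):
    """Toy direct binary search: start from a thresholded image and greedily
    toggle single pixels whenever the toggle lowers the HVS-filtered error."""
    h = (gray > 0.5).astype(float)
    cost = perceived_error(h, gray, sigma)
    for _ in range(max_passes):
        improved = False
        for idx in np.ndindex(gray.shape):
            h[idx] = 1.0 - h[idx]              # trial toggle
            trial = perceived_error(h, gray, sigma)
            if trial < cost:
                cost, improved = trial, True   # keep the toggle
            else:
                h[idx] = 1.0 - h[idx]          # revert
        if not improved:
            break
    return h

# 32x32 gray ramp, kept tiny because this brute-force search is very slow.
gray = np.tile(np.linspace(0.2, 0.8, 32), (32, 1))
print(dbs_halftone(gray).mean())  # mean dot density approximates mean gray
```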
James Clerk Maxwell demonstrated the first color photograph in a lecture to the Royal Society of Great Britain in 1861. He used this demonstration to illustrate Thomas Young's idea that human vision uses three kinds of light sensors. This demonstration led to a great variety of color photographic systems using both additive and subtractive color. Today, we have photographic, video, digital still, and scanning image-capture devices. We have electrophotographic, ink jet, thermal, and holographic hard-copy systems, as well as cathode ray tube, liquid-crystal display, and other light-emitting color devices. The major effort today is to get control of all these technologies so that the user can, without effort, move a color digital image from one technology to another without changing the appearance of the image. The strategy of choice is to use colorimetry to calibrate each device. If all prints and displays reproduced the same colorimetric values at every pixel, the images, regardless of the display, would appear identical. The problem is that prints and displays have very different color gamuts. A more satisfactory solution is needed. In my view, the future emphasis of color will be on models of human vision that calculate the color appearance, rather than the color match. All the technologies listed above work one pixel at a time. The response at every pixel is dependent on the input at that pixel, regardless of whether the imaging system is chemical, photonic or electrical. Humans are different. The color they see at a pixel is controlled by that pixel and all the other pixels in the field of view. Human color vision is a spatial calculation involving the whole image. In the future, we will see more models that compute the color appearance from spatial information and write color sensations on media, rather than attempting to write the quanta catch of visual receptors.
Basic vision science research has reached the point that many investigators are now designing quantitative models of human visual function in areas such as pattern discrimination, motion detection, optical flow, color discrimination, adaptation and stereopsis. These models have practical significance in their application to image compression technologies and as tools for evaluating image quality. We have been working on a vision modeling environment, called Mindseye, that is designed to simplify the implementation and testing of general-purpose spatio-temporal models of human vision. Mindseye is an evolving general-purpose vision-modeling environment that embodies the general structures of the visual system and provides a set of modular tools within a flexible platform tailored to the needs of researchers. The environment employs a user-friendly graphics interface with on-line documentation that describes the functionality of the individual modules. Mindseye, while functional, is still research in progress. We are seeking input from the image compression and evaluation community as well as from the vision science community as to the potential utility of Mindseye, and how it might be enhanced to meet future needs.
Previously, a computational model of human color vision was described which simulates the main retinal and cortical processes involved in color perception and which makes predictions about responses to spatiochromatic stimuli. The emphasis from early on was on ensuring validation of the model as it developed, but its growing complexity, combined with considerations of linking it to a multiscale contrast model, made the development increasingly cumbersome. The model was therefore completely rewritten as a set of Khoros (Khoral Research Inc) utilities which provide user-friendly access to the model and its components via the visual programming interface. This paper describes the details of the Khoros implementation and presents examples of the quantitative predictions made by the model for different simulated psychophysical experiments, including increment threshold, grating sensitivity and grating masking. Current areas of activity include examining different gain processes at different stages of the model and their implications as possible components of color constancy mechanisms, and the impact of different types of cortical demultiplexing processes on the predictions made by the model.
In this paper we analyze the properties of a repeated isotropic center-surround inhibition which includes simple nonlinearities like half-wave rectification and saturation. Our simulation results show that such operations, here implemented as iterated nonlinear differences and ratios of Gaussians (INDOG and INROG), lead to endstopping. The benefits of the approach are twofold. Firstly, the INDOG can be used to design simple endstopped operators, e.g., corner detectors. Secondly, the results can explain how endstopping might arise in a neural network with purely isotropic characteristics. The iteration can be implemented as cascades by feeding the output of one NDOG to the next NDOG stage. Alternatively, the INDOG mechanism can be activated in a feedback loop. In the latter case, the resulting spatio-temporal response properties are not separable, and the response becomes spatially endstopped if the input is transient. Finally, we show that ON- and OFF-type INDOG outputs can be integrated spatially to result in quasi-topological image features like open versus closed and the number of components.
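A minimal sketch of the cascade variant in Python, with assumed Gaussian scales and a simple clipping saturation; the feedback variant and the INROG ratio form are not shown.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def ndog(img, sigma_c=1.0, sigma_s=2.0):
    """One NDOG stage: isotropic center-surround inhibition (difference of
    Gaussians) followed by half-wave rectification."""
    dog = gaussian_filter(img, sigma_c) - gaussian_filter(img, sigma_s)
    return np.maximum(dog, 0.0)          # half-wave rectification

def indog(img, stages=2, saturation=1.0):
    """Iterated NDOG: feed the rectified output of one stage into the next,
    with a clipping saturation bounding the response at each stage."""
    out = img.astype(float)
    for _ in range(stages):
        out = np.minimum(ndog(out), saturation)
    return out

# A bright bar: a single DoG responds all along the bar, whereas the
# iterated operator should respond more strongly near the bar's end points,
# i.e., it behaves as an endstopped operator.
img = np.zeros((64, 64))
img[30:34, 10:54] = 1.0
response = indog(img, stages=2)
ends = response[:, 8:14].sum() + response[:, 50:56].sum()
middle = response[:, 28:36].sum()
print(ends, middle)
```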
Here we demonstrate a method for constructing stimulus classification images. These images provide information regarding the stimulus aspects the observer uses to segregate images into discrete response categories. Data are first collected on a discrimination task containing low contrast noise. The noises are then averaged separately for the stimulus-response categories. These averages are then summed with appropriate signs to obtain an overall classification image. We determine stimulus classification images for a vernier acuity task to visualize the stimulus features used to make these precise position discriminations. The resulting images reject the idea that the discrimination is performed by the single best discriminating cortical unit. The classification images show one Gabor-like filter for each line, contradicting the near-ideal assumption of image discrimination models, which predict no contribution from the fixed vernier line.
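A sketch of the construction, with a simulated template-matching observer standing in for real data; the sign convention follows the description above, and all simulation parameters are illustrative.

```python
import numpy as np

def classification_image(noises, stimuli, responses):
    """Build a classification image from trial noise fields.

    noises:    (n_trials, h, w) noise added on each trial
    stimuli:   (n_trials,) presented stimulus labels, 0 or 1
    responses: (n_trials,) observer responses, 0 or 1

    Average the noise separately for the four stimulus-response categories,
    then sum with signs so noise pushing toward response 1 appears positive.
    """
    noises = np.asarray(noises, float)
    stimuli, responses = np.asarray(stimuli), np.asarray(responses)

    def mean_noise(s, r):
        sel = (stimuli == s) & (responses == r)
        return noises[sel].mean(axis=0)

    return (mean_noise(0, 1) + mean_noise(1, 1)
            - mean_noise(0, 0) - mean_noise(1, 0))

# Simulated observer that decides via a hidden template: the recovered
# classification image should correlate strongly with that template.
rng = np.random.default_rng(0)
h = w = 16
template = np.zeros((h, w)); template[:, w // 2] = 1.0
n_trials = 20000
stim = rng.integers(0, 2, n_trials)
noise = rng.normal(0, 1, (n_trials, h, w))
evidence = (noise * template).sum(axis=(1, 2)) + 2.0 * (stim - 0.5)
resp = (evidence > 0).astype(int)
ci = classification_image(noise, stim, resp)
print(np.corrcoef(ci.ravel(), template.ravel())[0, 1])  # strongly positive
```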
For evaluating and improving image compression algorithms, there is a great need for an image fidelity metric that measures the perceptual difference between images. Image discrimination models, which model the human visual system, have been suggested as such metrics. The models found in the literature vary considerably in features, complexity and performance. The suitability of a certain model will depend on the application. Trying to find a model suitable for a particular application is difficult, since most models are reported for different applications and supported by different data. Furthermore, tuning a particular model to a new application and environment is not a straightforward exercise. In this paper, we have brought together some well-known image discrimination models and compared their performance against one set of psychophysical data, with image fidelity as the intended task. The data were collected for three types of distortion: blocking, blurring and ringing. A comparison of the results from the different models showed that models using cross-channel masking gave the best overall results. However, the differences in performance between models were small, and performance varied across the types of distortion, but all models were better than a traditional metric (peak signal-to-noise ratio).
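For reference, the traditional baseline metric that all the compared models outperformed can be computed as follows; this is the standard definition, shown for an 8-bit image.

```python
import numpy as np

def psnr(reference, test, peak=255.0):
    """Peak signal-to-noise ratio in dB: 10*log10(peak^2 / MSE)."""
    err = np.mean((np.asarray(reference, float) - np.asarray(test, float)) ** 2)
    if err == 0:
        return float('inf')
    return 10.0 * np.log10(peak ** 2 / err)

# Example: unit-variance noise added to an 8-bit image gives roughly 48 dB.
rng = np.random.default_rng(0)
ref = rng.random((64, 64)) * 255
print(psnr(ref, ref + rng.standard_normal(ref.shape)))
```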
In recent years a number of different vision models have been proposed to assist in the evaluation of image quality. However, there have been few attempts to independently evaluate these models and to make comparisons between them. In this paper we first summarize the work that has been done in image quality modeling. We then select two of the leading image quality models, the Daly Visible Differences Predictor and the Sarnoff Visual Discrimination Model, for further study. We begin by describing our implementations of the two models, which were done from the published papers. We next discuss the similarities and the differences between the two models. The paper ends with a summary of the important advantages of each approach. The comparison of these two models is presented in the context of our research interests, which are image quality evaluation for both computer imaging and computer graphics tasks. The paper includes illustrations drawn from these two areas.
In recent years a number of image fidelity measures have been developed. These measures are designed to predict a person's ability to perceive differences between two nearly identical images. Successful image fidelity measures allow digital imaging developers to replace difficult and time-consuming subjective evaluations with automated evaluations. Although a number of image fidelity measures have been developed, no method for evaluating and comparing the accuracy of these measures has been commonly accepted. In this paper we describe a new method for evaluating image fidelity measures. The method involves comparing spatially localized ratings from a human subject with distortion maps generated by an image fidelity measure.
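One way such a comparison might be coded, as a sketch only: pool the distortion map over the same regions the subject rated, then rank-correlate the two sets of values. The grid layout and the use of Spearman correlation are assumptions for illustration, not the authors' protocol.

```python
import numpy as np
from scipy.stats import spearmanr

def map_agreement(subject_ratings, distortion_map, grid=(4, 4)):
    """Rank correlation between per-region human ratings and a fidelity
    measure's distortion map pooled over the same regions."""
    h, w = distortion_map.shape
    gh, gw = grid
    pooled = [distortion_map[i*h//gh:(i+1)*h//gh, j*w//gw:(j+1)*w//gw].mean()
              for i in range(gh) for j in range(gw)]
    rho, _ = spearmanr(np.ravel(subject_ratings), pooled)
    return rho

# Stand-in data just to exercise the function.
ratings = np.random.rand(4, 4)     # one subjective rating per region
dmap = np.random.rand(128, 128)    # distortion map from some fidelity measure
print(map_agreement(ratings, dmap))
```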
We present a simplified dual-channel discrimination model with spatio-temporal filters to represent the visual system's contrast sensitivity, and masking based on local spatio-temporal contrast energy. The contrast sensitivity filter parameters of the model were based on previous work. The masking and global sensitivity parameters are calibrated to masking data using brief grating target signals masked by a 700 msec grating with the same spatial parameters.
The processing and representation of motion information is addressed from an integrated perspective comprising low-level signal processing properties as well as higher-level cognitive aspects. For the low-level processing of motion information we argue that a fundamental requirement is the existence of a spatio-temporal memory. Its key feature, the provision of an orthogonal relation between external time and its internal representation, is achieved by a mapping of temporal structure into a locally distributed activity distribution accessible in parallel by higher-level processing stages. This leads to a reinterpretation of the classical concept of `iconic memory' and resolves inconsistencies concerning ultra-short-time processing and visual masking. The spatio-temporal memory is further investigated by experiments on the perception of spatio-temporal patterns. Results on the direction discrimination of motion paths provide evidence that information about direction and location are not processed and represented independently of each other. This suggests a unified representation on an early level, in the sense that motion information is internally available in the form of a spatio-temporal compound. For the higher-level representation we have developed a formal framework for the qualitative description of courses of motion that may occur with moving objects.
The advent of widespread distribution of digital video creates a need for automated methods for evaluating the visual quality of digital video. This is particularly so since most digital video is compressed using lossy methods, which involve the controlled introduction of potentially visible artifacts. Compounding the problem is the bursty nature of digital video, which requires adaptive bit allocation based on visual quality metrics, and the economic need to reduce bit-rate to the lowest level that yields acceptable quality. In previous work, we have developed visual quality metrics for evaluating, controlling, and optimizing the quality of compressed still images. These metrics incorporate simplified models of human visual sensitivity to spatial and chromatic visual signals. Here I describe a new video quality metric that is an extension of these still image metrics into the time domain. Like the still image metrics, it is based on the Discrete Cosine Transform. An effort has been made to minimize the amount of memory and computation required by the metric, so that it might be applied in the widest range of applications. To calibrate the basic sensitivity of this metric to spatial and temporal signals we have made measurements of visual thresholds for temporally varying samples of DCT quantization noise.
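Since the metric is DCT-based, a sketch of the core per-frame operation may help: block DCTs of reference and test frames, coefficient differences weighted by visibility, then pooled. The weighting matrix below is a placeholder, not the calibrated sensitivities described in the paper, and the temporal filtering stage is omitted.

```python
import numpy as np
from scipy.fft import dctn

def block_dct_error(ref, test, block=8):
    """Pool visibility-weighted DCT coefficient differences over 8x8 blocks
    of a reference and a test frame (weights are illustrative)."""
    h, w = ref.shape
    # Toy visibility weights: low frequencies weighted more than high.
    u = np.arange(block)
    weight = 1.0 / (1.0 + 0.25 * (u[:, None] + u[None, :]))
    total, n_blocks = 0.0, 0
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            d = dctn(test[y:y+block, x:x+block], norm='ortho') \
              - dctn(ref[y:y+block, x:x+block], norm='ortho')
            total += np.sum((weight * d) ** 2)
            n_blocks += 1
    return np.sqrt(total / max(n_blocks, 1))

# Smoke test: small noise on a random frame gives a small pooled error.
rng = np.random.default_rng(1)
ref = rng.random((64, 64))
print(block_dct_error(ref, ref + 0.02 * rng.standard_normal((64, 64))))
```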
We present a technique for controlling the adaptive quantization process in an MPEG encoder, which improves upon the commonly used TM5 rate controller. The method combines both a spatial masking model and a technique for automatically determining the visually important areas in a scene. The spatial masking model has been designed with consideration of the structure of compressed natural images. It takes into account the different levels of distortion that are tolerable by viewers in different parts of a picture by segmenting the scene into flat, edge, and textured regions and quantizing these regions differently. The visually important scene areas are calculated using Importance Maps. These maps are generated by combining factors known to influence human visual attention and eye movements. Finer quantization is assigned to visually important regions, while areas classified as being of low visual importance are more harshly quantized. Results indicate a subjective improvement in picture quality, in comparison to the TM5 method. Less ringing occurs at edges, and the visually important areas of a picture are more accurately coded. This is particularly noticeable at low bit rates. The technique is computationally efficient and flexible, and can easily be extended to specific applications.
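A sketch of the final assignment step, assuming a per-macroblock importance map in [0, 1] and a simple linear mapping onto MPEG quantizer scales; the authors' actual mapping rule is not reproduced here.

```python
import numpy as np

def quantizer_scales(importance, q_min=4, q_max=31):
    """Map a per-macroblock importance map (1 = most important) to MPEG
    quantizer scales: important blocks get finer quantization (lower q).
    The linear mapping and the q range endpoints are assumptions."""
    importance = np.clip(np.asarray(importance, float), 0.0, 1.0)
    q = q_max - importance * (q_max - q_min)
    return np.clip(np.rint(q), q_min, q_max).astype(int)

# Example: a region marked important gets q near 4; background gets q near 31.
imp = np.zeros((9, 11)); imp[3:6, 4:7] = 1.0
print(quantizer_scales(imp))
```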
This paper presents an objective perceptual distortion measure quantifying the visibility of edge-like blocking artifacts in coded image sequences resulting from popular transform coding techniques. The prime motivation for this work is the awareness that properties of the human visual system should be central to the design and evaluation of image coding algorithms. The perceptual metric is the output of a visual model incorporating both the spatial and temporal characteristics of the visual system. Parameters of the model are based on results from a number of visual experiments in which sensitivities to simulated blocking artifacts were measured under various spatio-temporal background conditions. The visual model takes a pair of original and distorted sequences as inputs. Distortions are calculated along the vertical and horizontal directions. Visibility dependencies on spatial, temporal and motion activities of the background are incorporated using linear filtering and motion estimation. Pixel-based distortions are combined over local spatial and temporal regions to generate an overall distortion measure for each orientation. The final model output is the sum of the vertical and horizontal distortion measures. The model was applied to coded image sequences and the resulting distortion measures were compared to outcomes of subjective ranking tests. Results indicate that the perceptual distortion measure agrees well with human evaluation.
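For intuition, a bare-bones spatial indicator of edge-like blocking artifacts is sketched below: the mean luminance step across 8x8 block boundaries relative to steps elsewhere. The paper's metric goes well beyond this, modeling temporal and motion-dependent visibility and operating on original/distorted sequence pairs.

```python
import numpy as np

def blockiness(frame, block=8):
    """Ratio of mean absolute neighbor steps on block boundaries to those
    in block interiors; values well above 1 suggest visible blocking."""
    f = np.asarray(frame, float)
    dh = np.abs(np.diff(f, axis=1))        # horizontal neighbor steps
    dv = np.abs(np.diff(f, axis=0))        # vertical neighbor steps
    cols = np.arange(dh.shape[1])
    rows = np.arange(dv.shape[0])
    on_h = (cols % block) == block - 1     # steps crossing a column boundary
    on_v = (rows % block) == block - 1     # steps crossing a row boundary
    boundary = dh[:, on_h].mean() + dv[on_v, :].mean()
    interior = dh[:, ~on_h].mean() + dv[~on_v, :].mean()
    return boundary / max(interior, 1e-12)

# A blocky image scores far above a smooth ramp (which scores about 1).
blocky = np.kron(np.random.rand(8, 8), np.ones((8, 8)))
smooth = np.tile(np.linspace(0, 1, 64), (64, 1))
print(blockiness(blocky), blockiness(smooth))
```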
The conventional synchronous model of digital video, in which video is reconstructed synchronously at the decoder on a frame-by-frame basis, assumes its transport is delay-jitter-free. This assumption is inappropriate for modern integrated service packet networks such as the Internet, where network delay jitter varies widely. Furthermore, multiframe buffering is not a viable solution in interactive applications such as video conferencing. We have proposed a `delay cognizant' model of video coding (DCVC) that segments an incoming video into two video flows with different delay attributes. The DCVC decoder operates in an asynchronous reconstruction mode that attempts to maintain image quality in the presence of network delay jitter. Our goal is to maximize the allowable delay of one flow relative to that of the other with minimal effect on image quality, since an increase in the delay offset reflects more tolerance to transmission delay jitter. Subjective quality evaluations indicate that, for highly compressed sequences, differences in the video quality of reconstructed sequences with large delay offsets, as compared with zero delay offset, are small. Moreover, in some cases asynchronously reconstructed video sequences look better than in the zero delay case. DCVC is a promising solution to transport delay jitter in low-bandwidth video conferencing with minimal impact on video quality.
A model describing the envelope of visual spatial sensitivity as a function of retinal velocity has been revisited. This spatiovelocity CSF model has been extended to include the effects of three types of eye movements: natural drift, smooth pursuit, and saccades. Together, these extend the model from retinal velocities to image-plane velocities in the context of natural viewing conditions. It is more straightforward to incorporate the effects of eye movements in a spatiovelocity CSF than in a spatiotemporal CSF generated from counterphase flicker stimuli. However, once the eye movements are incorporated, the spatiovelocity CSF can be rotated into a spatiotemporal CSF that is valid for unconstrained natural viewing conditions. The resulting visual model will be analyzed with respect to some of the key HDTV and computer graphics display formats.
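A sketch of the two ingredients in the spirit of this model: an eye-movement correction mapping image-plane velocity to retinal velocity, and a Kelly-style spatiovelocity CSF. All constants below are illustrative stand-ins, not the paper's calibrated parameters.

```python
import numpy as np

def retinal_velocity(v_image, pursuit_gain=0.82, v_drift=0.15, v_max=80.0):
    """Image-plane velocity (deg/s) -> retinal velocity, assuming the eye
    tracks moving content with imperfect smooth pursuit, never slower than
    natural drift nor faster than a tracking limit (assumed constants)."""
    v_eye = np.minimum(pursuit_gain * v_image + v_drift, v_max)
    return np.abs(v_image - v_eye)

def csf(rho, v_ret, c0=1.14, c1=0.67, c2=1.7, s1=6.1, s2=7.3, p1=45.9):
    """Kelly-style spatiovelocity contrast sensitivity; rho in cy/deg,
    v_ret in deg/s. Constants are placeholders for illustration."""
    v = np.maximum(v_ret, 0.1)             # keep the log term finite
    k = s1 + s2 * np.abs(np.log10(c2 * v / 3.0)) ** 3
    rho_max = p1 / (c2 * v + 2.0)
    return k * c0 * c2 * v * (2 * np.pi * c1 * rho) ** 2 \
             * np.exp(-4 * np.pi * c1 * rho / rho_max)

# Sensitivity to a 4 cy/deg grating moving at 10 deg/s, with and without
# eye tracking: pursuit sharply reduces the effective retinal velocity.
print(csf(4.0, retinal_velocity(10.0)), csf(4.0, 10.0))
```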
We derive a visual image quality metric from a model of human visual processing that takes as its input an original image and a compressed or otherwise altered version of that image. The model has multiple channels tuned to spatial frequency, orientation and color. Channel sensitivities are scaled to match a bandpass achromatic spatial frequency contrast sensitivity function (CSF) and lowpass chromatic CSFs. The model has a contrast gain control with parameters based on the results of human psychophysical experiments on pattern masking and contrast induction. These experiments have shown that contrast gain control within the visual system is selective for spatial frequency, orientation and color. The model accommodates this result by placing a contrast gain control within each channel and by letting each channel's gain control be influenced selectively by contrasts within all channels. A simple extension to this model provides predictions of color image quality.
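A sketch of the divisive form such a cross-channel contrast gain control typically takes; the exponents, saturation constant, and uniform weighting here are illustrative, not the calibrated values of the model described above.

```python
import numpy as np

def gain_controlled_response(contrasts, p=2.4, q=2.0, sigma=0.05, weights=None):
    """Divisive contrast gain control across channels: each channel's
    excitatory response is divided by a weighted sum of all channels'
    contrast energies plus a saturation constant."""
    c = np.asarray(contrasts, float)            # one contrast per channel
    if weights is None:
        weights = np.ones((len(c), len(c)))     # every channel masks every other
    excitation = np.abs(c) ** p
    inhibition = sigma ** q + weights @ (np.abs(c) ** q)
    return excitation / inhibition

# A 0.2-contrast target channel alone vs. alongside a 0.5-contrast masker:
alone = gain_controlled_response([0.2, 0.0])[0]
masked = gain_controlled_response([0.2, 0.5])[0]
print(alone, masked)   # masked response is smaller: cross-channel masking
```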
In order to achieve color image coding based on human visual system features, we have been interested in the design of a perceptually based quantizer. The cardinal directions Ach, Cr1 and Cr2, derived by Krauskopf from habituation experiments and validated in our lab with spatial masking experiments, have been used to characterize color images. The achromatic component, already considered in a previous study, will not be considered here. The same methodology has been applied to the two chromatic components to specify the decision thresholds and the reconstruction levels which ensure that the degradations induced will be lower than their visibility thresholds. Two observers have been used for each of the two components. From the values obtained for the Cr1 component, one should notice that the decision thresholds and reconstruction levels follow a linear law, even at higher levels. However, for the Cr2 component the values seem to follow a monotonically increasing function. To determine if these behaviors are frequency dependent, further experiments have been conducted with stimulus frequencies varying from 1 cy/deg to 4 cy/deg. The measured values show no significant variations. Finally, instead of sinusoidal stimuli, filtered textures have been used to take spatio-frequential interactions into account. The same laws (linear for Cr1 and monotonically increasing for Cr2) have been observed, even if a variation in the quantization intervals is reported.
A new metric for the assessment of color image coding quality is presented in this paper. Two models of chromatic and achromatic error visibility have been investigated, incorporating many aspects of human vision and color perception. The achromatic model accounts for both retinal and cortical phenomena such as visual sensitivity to spatial contrast and orientation. The chromatic metric is based on a multi-channel model of human color vision that is parameterized for video coding applications using psychophysical experiments, assuming that perception of color quantization errors can be assimilated to perception of supra-threshold local color differences. The final metric is a merging of the chromatic model and the achromatic model, which accounts for phenomena such as masking. The metric is tested on 6 real images at 5 quality levels using subjective assessments. The high correlation between objective and subjective scores shows that the described metric accurately rates the rendition of important features of the image such as color contours and textures.
A digital color image quality metric is proposed in this work based on the characteristics of the human visual system. Chromatic coordinates are transformed from spectral cone absorption responses to the opponent-color space. The sensitivity thresholds along each of the color space axes are measured, and visual masking models are provided and parameterized. Multiple contrasts are computed from the wavelet coefficients at their corresponding resolutions. The new objective error measure is defined as the aggregate contrast mismatch between the original and compressed images. Experimental results are given to show its consistency with human viewing experience and subjective rankings.
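A sketch of an aggregate contrast-mismatch measure over wavelet subbands, using PyWavelets; the wavelet choice, level weighting, and pooling rule are assumptions for illustration, not the paper's parameterization.

```python
import numpy as np
import pywt

def contrast_mismatch(ref, test, wavelet='haar', levels=3):
    """Compare wavelet detail coefficients of two images level by level and
    pool the absolute differences, weighting coarse levels more."""
    cr = pywt.wavedec2(ref, wavelet, level=levels)
    ct = pywt.wavedec2(test, wavelet, level=levels)
    total = 0.0
    # cr[1:] holds detail-band tuples ordered from coarsest to finest.
    for lvl, (dr, dt) in enumerate(zip(cr[1:], ct[1:]), start=1):
        w = 1.0 / lvl                  # illustrative level weighting
        for band_r, band_t in zip(dr, dt):
            total += w * np.mean(np.abs(band_r - band_t))
    return total

# Smoke test: identical images score 0; a noisy copy scores higher.
rng = np.random.default_rng(0)
ref = rng.random((64, 64))
noisy = ref + 0.1 * rng.standard_normal((64, 64))
print(contrast_mismatch(ref, ref), contrast_mismatch(ref, noisy))
```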
Designs of imaging systems, image processing algorithms, etc., usually take for granted that methods for assessing perceived image quality produce unbiased estimates of the viewers' quality impression. Quality judgments, however, are affected by the judgment strategies induced by the experimental procedures. In this paper the results of two experiments are presented, illustrating the influence judgment strategies can have on quality judgments. The first experiment concerns contextual effects due to the composition of the stimulus set. Subjects assessed the sharpness of two differently composed sets of blurred versions of one static image. The sharpness judgments for the blurred images present in both stimulus sets were found to depend on the composition of the set. The magnitude of this effect was determined by the scaling technique employed. In the second experiment, subjects assessed either the overall quality or the overall impairment of JPEG-coded images containing two main artifacts. Two scaling methods were used: single stimulus scaling and comparison scaling. The results indicate a systematic difference between the quality and impairment judgments that could be interpreted as an instruction-based difference in the weighting of the two artifacts. Again, some influence of scaling technique was observed. The results of both experiments underscore the important role judgment strategies play in the psychophysical evaluation of image quality. Ignoring this influence on quality judgments may lead to invalid conclusions about the viewers' impression of image quality. Accordingly, knowledge about judgment strategies is indispensable in designing and improving evaluation techniques.
We examined figural after-effects in natural images by using as adapt and test stimuli images of human faces, for which small changes in configuration are highly discriminable. Observers either matched a face to a memorized face or rated faces as either `normal' or `distorted', before or after viewing a distorted image of the same face. Prior adaptation strongly biases face recognition: after viewing the distorted image, the original face appears distorted in a direction opposite to the adapting distortion. However, no after-effects are observed when either the adapting image or the test image is inverted, indicating that the adaptation is not to the distortion gradient in the image (which is the same for upright and inverted images), but depends instead on the specific configuration of the stimulus. We further show that the figural after-effects for face images are highly asymmetric, since adapting to the original face has little effect on the perception of a distorted face. This asymmetry suggests that adaptation may play an important normalizing role in face perception. Our results suggest that in normal viewing figural after-effects may play a prominent role in form perception, and could provide a novel method for probing the mechanisms underlying human face perception.
What requirements do people place on optimal color reproduction of real-life scenes? We suggest that when people look at images containing familiar categories of objects, two primary factors shape their subjective impression of how optimally colors are reproduced: perceived naturalness and perceived colorfulness of the images. To test the model, subjects were asked to evaluate the perceived `naturalness', `colorfulness' and `quality' of images of real-life scenes. The judgments were related to statistical parameters of the color point distribution over the images in CIE 1976 (L*u*v*) color space. It was found that the perceptually optimal color reproduction can be derived from these statistics within the framework of our model. We also specify the naturalness, colorfulness and quality indices describing the observers' judgments.
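A sketch of the kind of distribution statistic involved, assuming per-pixel u*, v* values are available; the specific indices defined in the paper are not reproduced here.

```python
import numpy as np

def colorfulness_index(u_star, v_star):
    """Illustrative colorfulness statistic over CIE 1976 u*, v* pixel
    values: mean chroma plus the spread of chroma across the image."""
    chroma = np.hypot(np.asarray(u_star, float), np.asarray(v_star, float))
    return chroma.mean() + chroma.std()

# A desaturated image scores lower than the same image with chroma doubled.
rng = np.random.default_rng(2)
u, v = rng.normal(0, 15, (2, 128, 128))
print(colorfulness_index(u, v), colorfulness_index(2 * u, 2 * v))
```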
Within the area of broadcasting and entertainment, stereoscopic displays are used to heighten the viewer's sense of excitement and quality. To evaluate these subjective experiences, an appreciation-oriented approach seems to be appropriate. Within this framework, this paper reports on two experiments in which we investigated the influence of image disparity, convergence distance and focal length on the subjective assessment of depth, naturalness, quality and eye-strain. Twelve observers with normal or corrected-to-normal vision and good stereopsis viewed a fully randomized presentation of stereoscopic still images that varied systematically in image disparity, convergence distance and focal length. In the first experiment observers were asked to rate, in separate counterbalanced sessions, their impression of depth, naturalness of depth and quality of depth. In the second experiment observers were asked to rate the eye-strain they experienced on a five-point rating scale. Results indicate that observers prefer a stereoscopic presentation of images over a monoscopic presentation. A clear optimum for quality and naturalness judgments was found at 4 cm image disparity, which was also rated by observers as the stereoscopic condition that produced the least eye-strain. Extreme image disparities were found to be annoying, producing low quality and naturalness ratings accordingly. Although there was a strong linear relationship between naturalness and quality (a correlation of r = 0.96), a small but systematic shift could be observed. This quality-naturalness shift is discussed in relation to similar, yet more pronounced, findings in the color domain.
Electronic Imaging Based on Retinal Processing and Eye Movements
Foveated imaging exploits the fact that the spatial resolution of the human visual system decreases dramatically away from the point of gaze. Because of this fact, large bandwidth savings are obtained by matching the resolution of the transmitted image to the fall-off in resolution of the human visual system. We have developed a foveated multiresolution pyramid video coder/decoder which runs in real-time on a general purpose computer (i.e., a Pentium with the Windows 95/NT OS). The current system uses a foveated multiresolution pyramid to code each image into 5 or 6 regions of varying resolution. The user-controlled foveation point is obtained from a pointing device (e.g., a mouse or an eyetracker). Spatial edge artifacts between the regions created by the foveation are eliminated by raised-cosine blending across levels of the pyramid, and by `foveation point interpolation' within levels of the pyramid. Each level of the pyramid is then motion compensated, multiresolution pyramid coded, and thresholded/quantized based upon human contrast sensitivity as a function of spatial frequency and retinal eccentricity. The final lossless coding includes zero-tree coding. Optimal use of foveated imaging requires eye tracking; however, there are many useful applications which do not require eye tracking.
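A sketch of how a foveation point might be mapped to pyramid resolution levels. The cortical-magnification fall-off constant (e2) and the pixels-per-degree figure are assumptions for illustration, not parameters of the coder described above.

```python
import numpy as np

def eccentricity_map(shape, gaze_xy, pixels_per_degree=30.0):
    """Eccentricity (degrees of visual angle) of each pixel from gaze."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    return np.hypot(xs - gaze_xy[0], ys - gaze_xy[1]) / pixels_per_degree

def pyramid_level_map(shape, gaze_xy, e2=2.3, levels=6):
    """Assign each pixel to a resolution level of a foveated pyramid:
    acceptable resolution falls off roughly as e2 / (e2 + eccentricity),
    a common model of visual resolution fall-off (e2 is an assumption)."""
    ecc = eccentricity_map(shape, gaze_xy)
    relative_res = e2 / (e2 + ecc)         # 1 at the fovea, falls off smoothly
    level = np.log2(1.0 / relative_res)    # one level per halving of resolution
    return np.clip(level.astype(int), 0, levels - 1)

# 480x640 frame with gaze at the center: level 0 near fixation, coarser
# levels toward the periphery.
lv = pyramid_level_map((480, 640), gaze_xy=(320, 240))
print(np.bincount(lv.ravel()))
```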
In this work we demonstrate the relationship existing between two important issues in vision: multi-scale local spectrum analysis, and log-polar foveatization. We show that, when applying a continuous set of self-similar (rotated and scaled) band-pass filters to estimate the local spectrum at a given point of attention in the image, the inverse Fourier transform of this local spectrum is a log-polar foveated version of the original image at that position. Both the local spectrum and its associated foveated image can be obtained through log-polar warping of the spectral/spatial domain, followed by conventional shift-invariant low-pass filtering and the corresponding inverse warping. Furthermore, the low-pass filters in the warped space and frequency domains are mirror versions of each other. Thus, filters with mirror symmetry under the log-polar warping are self-dual, and make the foveatization process commute with the Fourier transform. Nevertheless, in order to implement a fovea that can be easily moved across the image, it is preferable to use a fixed bank of steerable filters, instead of applying log-polar warpings with different centers. Using low-pass scalable filters we have implemented a real-time moving fovea. We believe that a dual finite spatial/spectral local representation of images could be a very powerful tool for many visual tasks, which could benefit from explicit representation in space and spatial frequency, as well as from the rotation and scale invariance naturally achieved in both domains.
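A minimal log-polar foveation sketch, assuming simple bilinear resampling rather than the filter-bank formulation of the paper; it illustrates why scale and rotation become shifts in the warped representation.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def logpolar_fovea(img, center, n_rings=64, n_wedges=96, r_min=1.0):
    """Log-polar resampling around a fixation point: rings are spaced
    exponentially in radius, so resolution is highest at the center and
    falls off toward the periphery."""
    cy, cx = center
    r_max = min(img.shape) / 2.0
    # Exponentially spaced radii and uniformly spaced angles.
    radii = r_min * (r_max / r_min) ** (np.arange(n_rings) / (n_rings - 1))
    thetas = np.linspace(0, 2 * np.pi, n_wedges, endpoint=False)
    rr, tt = np.meshgrid(radii, thetas, indexing='ij')
    ys = cy + rr * np.sin(tt)
    xs = cx + rr * np.cos(tt)
    return map_coordinates(img, [ys, xs], order=1, mode='nearest')

# Moving the fovea is just a different `center`; scaling the input shifts
# the representation along the ring axis, rotation along the wedge axis.
img = np.random.rand(256, 256)
print(logpolar_fovea(img, center=(128, 128)).shape)   # (64, 96)
```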
Experimental results are presented evaluating two wavelet- based gaze-contingent video resolution degradation methods under three foveal Region of Interest (ROI) placement strategies. ROI placement is described by the introduction of a novel visualization of viewers' scanpaths, termed Volumes of Interest (VOIs). VOIs represent foveal loci of gaze in 3D space-time. Three ROI placement strategies, ideal, preattentive, and aggregate, are used to determine the location of an unprocessed, dynamic spatial resolution region corresponding to the projected dimension of the fovea. Results indicate imperceptible degradation effects of both ideal and preattentive strategies under a visual tracking paradigm.
Image representation based on nonuniform sampling and the development of foveated sensors have been active research fields in recent years. In this paper we propose a nonuniform sampling image representation method based on an improved log-polar transform and apply it to knowledge-based active pattern recognition. The novelty of our method lies in three aspects. First, compared with other nonuniform representation methods, our method provides a flexible structure between pure nonuniform sampling and the classical uniform sampling representation, and imitates the focus characteristic of human vision. The size of the area of interest, which is uniformly sampled at the highest resolution, can be adjusted arbitrarily according to knowledge of the vision task and objects. Second, we propose a knowledge-based method to decide `where to look next' based on the fovea-periphery structure. By introducing the concept of knowledge grain, knowledge of objects is organized hierarchically, from coarse to fine. We use fine-grain knowledge to do accurate pattern recognition in the fovea area and coarse-grain knowledge to locate fixation point candidates in the periphery. Third, we give a general paradigm for knowledge-based active pattern recognition. A nonuniform sampling transform is first applied to the input image to obtain the fovea-periphery structure. Then different grains of knowledge are used to solve the problems of `what it is' and `where it is' in the fovea and periphery. The above procedure is repeated until no more fixation points can be found or the goal of the vision task has been reached. Experimental results in this paper demonstrate the validity of our approach.
Designers of helmet- and head-mounted displays (HMDs) often assume that monocular devices improve operator performance relative to binocular devices by increasing field of view and by allowing two tasks to be performed simultaneously (one by each eye). To test these assumptions, we implemented a modified useful field of view (UFOV) paradigm in which subjects localized a peripheral target along five meridians within a semicircular region while simultaneously performing a central task. The tasks were either presented to the same eye or to different eyes (simulating a monocular HMD). Because previous research has established age-related changes in the UFOV, the present study investigated the performance of middle-aged observers and compared it to results obtained from young observers in an earlier study. In general, middle-aged observers made more peripheral target localization errors than young observers, indicating an overall constriction of the UFOV. The dependence of localization performance on viewing condition, peripheral distractors and central task, however, was the same for both age groups. Most notably, there was no difference in performance as a function of viewing condition. Thus, these findings do not support the assumption that dividing attention between the two eyes allows dual tasks to be performed more efficiently than when attention is divided within the visual field of one eye.
Color for the Internet: Perceptual and Engineering Issues
A uniformly adopted color standards infrastructure has a dramatic impact on any color imaging industry and technology. This presentation begins by framing the current color standards situation in a historical context. A series of similar-appearing infrastructure adoptions in color publishing during the last fifty years are reviewed and compared to the current events. This historical review is followed by brief technical, business and marketing reviews of two of the more popular recent color standards proposals, sRGB and ICC, along with their operating system implementations in the Microsoft and Apple operating systems. The paper concludes with a summary of Hewlett-Packard Company's and Microsoft's proposed future direction.
Gamma characterizes the reproduction of tone scale in an imaging system. Gamma summarizes, in a single numerical parameter, the nonlinear relationship between code value--in an 8-bit system, from 0 through 255--and physical intensity. Nearly all image coding systems are nonlinear, and so involve values of gamma different from unity. Owing to poor understanding of tone scale reproduction, and to misconceptions about nonlinear coding, gamma has acquired a terrible reputation in computer graphics and image processing. In addition, the world-wide web suffers from poor reproduction of grayscale and color images, due to poor handling of nonlinear image coding. This paper aims to make gamma respectable again.
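As a concrete reminder of what the single parameter buys, here is the pure power-law coding in a sketch (real standards such as sRGB add a linear segment near black, omitted here):

```python
import numpy as np

def encode_gamma(linear, gamma=2.2):
    """Linear light (0..1) -> nonlinear code value, V = L^(1/gamma)."""
    return np.clip(linear, 0.0, 1.0) ** (1.0 / gamma)

def decode_gamma(code, gamma=2.2):
    """Nonlinear code value (0..1) -> linear light, L = V^gamma."""
    return np.clip(code, 0.0, 1.0) ** gamma

# Why nonlinear coding matters in an 8-bit system: mid-gray (18% linear
# reflectance) lands near code 117 of 255 under gamma 2.2, spending scarce
# code values where the eye is most sensitive, instead of down at code 46.
print(round(float(encode_gamma(0.18)) * 255))   # ~117
print(round(0.18 * 255))                        # 46 with naive linear coding
```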
Accurate color rendering across displays is complicated on the World Wide Web by the color-handling properties of individual web browsers. However, the major browsers have all adopted a minimal color palette of 216 RGB triples, called the `browser-safe' colors, suitable for use with web page backgrounds and text, logos, cartoons, and line drawings. For naturalistic or photographic images, however, simple quantization to 216 colors can produce images with altered hues or color banding. We show that dithering with the browser-safe colors is a good strategy for such images, especially at spatial resolutions above 150 dpi. However, even if the RGB image is transmitted and received unaltered, the system gamma will affect appearance. Ambient lighting contributes to the rendered image's appearance, but does not mask the effects of differences in monitor gammas. The need for an image-rendering convention traceable to the CIE is underscored by these results.
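A sketch of the dithering strategy discussed: error diffusion onto the 216 browser-safe colors (6 evenly spaced levels per RGB channel). The Floyd-Steinberg kernel is an assumption for illustration; the paper does not prescribe a particular dither kernel.

```python
import numpy as np

WEB_SAFE_LEVELS = np.array([0, 51, 102, 153, 204, 255])  # 6^3 = 216 colors

def dither_web_safe(img):
    """Floyd-Steinberg error diffusion onto the browser-safe palette.
    Diffusing the quantization error avoids the hue shifts and banding
    that plain nearest-level quantization produces in photographs."""
    out = img.astype(float).copy()
    h, w, _ = out.shape
    for y in range(h):
        for x in range(w):
            old = out[y, x].copy()
            # Nearest web-safe level, independently per channel.
            new = WEB_SAFE_LEVELS[np.argmin(
                np.abs(old[:, None] - WEB_SAFE_LEVELS[None, :]), axis=1)]
            out[y, x] = new
            err = old - new
            if x + 1 < w:
                out[y, x + 1] += err * 7 / 16
            if y + 1 < h:
                if x > 0:
                    out[y + 1, x - 1] += err * 3 / 16
                out[y + 1, x] += err * 5 / 16
                if x + 1 < w:
                    out[y + 1, x + 1] += err * 1 / 16
    return out.astype(np.uint8)

img = np.random.rand(32, 32, 3) * 255
print(len(np.unique(dither_web_safe(img).reshape(-1, 3), axis=0)))  # <= 216
```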
Three color schemes (monochrome, dichrome, and polychrome) based on basic principles of color perception and cognition were optimized and applied to an electronic map in a horizontal-situation display. Principles for color discrimination, symbol coding, and color naming were applied to the superimposed symbols (targets, waypoints, etc.) and to the map symbology (land, water, roads). The color codes were tested in a visual search and detection experiment in a real-time simulation of an air-to-air mission with test pilots as subjects. The simulation task was as close as possible to a real-life situation. The pilots had to track a maneuvering target within specific limits. Reaction times for target detection were recorded. After the simulation, the test pilots gave a subjective estimation of the different color schemes. They also rated them according to situation awareness using a rating technique for cognitive compatibility (CC-SART). All the results, both objective and subjective, show that the chromatic color schemes are advantageous in comparison to the monochrome code. The reaction times were significantly lower for the chromatic color codes. The estimated situation awareness was higher for the chromatic schemes, and the subjects gave higher preferences to the chromatic codes.
Palette synthesis and analysis tools have been built based upon a model of color experience. This model adjusts formal compositional elements such as hue, value, chroma, and their contrasts, as well as size and proportion. Clothing and household product designers were given these tools to give guidance to their selection of seasonal palettes for use in production of the private-label merchandise of a large retail chain. The designers chose base palettes. Accents to these palettes were generated with and without the aid of the color tools. These palettes are compared by using perceptual metrics and interviews. The results are presented.
We analyze a 3D skeletal representation of the user in the spatial and temporal domains as a tool necessary for recognizing the gestures of drawing, picking and grabbing. The mechanisms of visual perception that are called upon in the imaginative process of artistic creation use those same tactile and kinesthetic pathways and structures in the brain which are employed when we manipulate the 3D world. We see, in fact, with our sensual bodies as well as with our eyes. Our interface is built on an analysis of pointing and gesturing and how they relate to the perception of form in space. We report on our progress in implementing a body language user interface for artistic computer interaction, i.e., a human-computer interaction based on an analysis of how an artist uses her body in the act of creation. Using two synchronous TV cameras, we have videotaped an environment into which an artist moves, assumes a canonical (Da Vinci) pose and subsequently makes a series of simple gestures. The video images are processed to generate an animated 3D skeleton that corresponds to the skeleton of the artist. The locus of the path taken by the drawing hand is the source of a trace of particles. Our presentation shows the two simultaneous videos, the associated animated 3D skeleton, that skeleton as an instance of motion capture for a constrained model of a human skeleton, and the trace of the path taken by the drawing hand.
Brunelleschi (1413) was the first to demonstrate that a 3D scene can be represented by a 2D perspective picture in such a way that the retinal images produced by the scene and by the picture are identical (subsequently, Leonardo pointed out that this holds only when the observer's eye is placed at the center of perspectivity used to produce the picture). It follows that, in the absence of depth cues, the percepts are identical as well. A question arises as to the effect on the percept of viewing the picture from a point other than the center of perspectivity. According to Pirenne's (1970) theory, the percept takes into account cues to the orientation and position of the picture relative to the observer in order to compensate for the incorrect viewing point; when these cues are available, the percept is accurate. We will demonstrate a new visual phenomenon, the `cuboid illusion', which contradicts Pirenne's theory: our experimental results show that the percept of a 3D object from its picture depends systematically on the orientation and position of the picture relative to the observer even in the presence of many such cues.
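The geometric fact behind the demonstration is plain central projection: a scene point and its picture-plane image lie on the same ray through the center of perspectivity, so an eye at that center receives the same retinal stimulation from either. A minimal sketch (coordinates illustrative):

```python
import numpy as np

# Central projection of 3D points (z > 0) onto the picture plane z = d.
# The image point d*(x/z, y/z) is collinear with the eye at the origin
# and the scene point (x, y, z), so both subtend the same visual direction.
def project(points, d=1.0):
    P = np.asarray(points, dtype=float)
    return d * P[:, :2] / P[:, 2:3]

scene = np.array([[0.5, 0.2, 4.0], [0.5, 0.2, 8.0]])
print(project(scene))   # equal (x, y) at different depths map to
                        # different picture positions, as expected
```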
The early history of lenses is controversial. The author has sought to address the problem by identifying lens elements (mainly convex/plano) that remain associated with the objects they were intended to be viewed through (i.e., in their original context). These are found in museums, in sculptures, rings, pendants, etc. A number of outstanding examples will be illustrated in the talk; these sophisticated pieces of art are certainly not first constructs. Most are of rock crystal, rose quartz, or glass. Lenses originated among artisans rather than scientists, and skills were clearly often lost and rediscovered. Early lens-like objects have been found broadly in the eastern Mediterranean area and Middle East, in France, in Italy (Rome), and possibly in Peru and Scandinavia. To date, the earliest lenses identified in context are from the IV/V Dynasties of Egypt, dating back about 4500 years (e.g., the superb `Le Scribe Accroupi' and `the Kai' in the Louvre; further fine examples are in the Cairo Museum). Later examples have been found at Knossos (Minoan, Herakleion Museum; ca. 3500 years ago); others had their origin in Greece (examples in the Athens National Archeological Museum and the British Museum, hereafter BM) and in Rome (Metropolitan Museum, NY; BM; Vatican Museums; Bologna Archeological Museum). Also of great interest is the study of possible lens applications. This is a fascinating scientific, artistic, and intellectual project.
The importance of the center of the canvas has long been appreciated in art, as has the way the eyes reveal the personality of the subjects of portraits. Is there a consistent placement of the eyes relative to the canvas frame, based on the horizontal position of the eyes in portraits? Data from portraits spanning the past 2000 years show that one eye is centered with a standard deviation of less than +/- 5%. Classical texts on composition do not seem to mention the idea that the eyes as such should be positioned relative to the frame of the picture; the typical emphasis is on the placement of centers of mass within the frame, or relative to the vanishing point in cases of central perspective. If such a compositional principle is not discussed in art analysis, it seems that its manifestation throughout the centuries and across varieties of artistic styles (including the extreme styles of the 20th century) must be guided by unconscious perceptual processes.
The study of drawings by patients with local lesions of the right or left hemisphere allows us to understand how artistic thinking is supported by brain structures. The role of the right hemisphere is significant at the early stage of the creative process: the right hemisphere is a generator of nonverbal visuo-spatial thinking. It operates with blurred nonverbal images and arranges them in a visual space. With the help of iconic signs, the right hemisphere reflects the world and creates perceptual visual standards that are stored in long-term right-hemisphere memory. The image that appears in this `inner' space must then be transferred into a principally different language, i.e., a left-hemispheric sign language, which operates with discrete units, logical succession, and learned grammar rules. This process can be explained by activation (information) transfer from the right hemisphere to the left. Thus the natural and spontaneous creative process, completed by a conscious effort, can be understood as an activation impulse transferred from the right hemisphere to the left and back.
An often neglected factor in cognition is color, the natural and created hues that are an integral part of the image. This happens because the relevant information is divided between two fundamentally different languages that do not relate well to each other. One belongs to science, which deals in facts and uses numbers to transmit the message; the other concerns intangibles and is used for the arts. One speaks to reality and the other to emotions; both are valid and correctly address the needs of their professions. Not very long ago the digital world was born: the computer came and put it all together in one little box that probably sits on your desk and is accessible to nearly everyone. Now there is a need to communicate, and because the box only understands numbers, artists have to shift and identify colors by a simple numerical code. Conversely, science can learn a lot from the arts, where for centuries painters have been studying the mysteries of creating light on canvas. While color needs a responsive set of eyes, it has always been here. Let us start now by using the same words to identify a particular color, its tint and shade, how artists use it, the essential structure that lets one move from one color to another, and the colorimetric measurements that make exact color matching a practical reality.
This paper describes an improved error diffusion algorithm for digitally halftoning images. In the first variation of the algorithm, the error signal is calculated as the difference between a visually perceived input value and a visually perceived output value; this is accomplished by applying a causal visual blur function to both the input and output images. This approach has the advantage of minimizing the appearance of worm artifacts in the output image while simultaneously eliminating the edge artifacts associated with a previous visual error diffusion algorithm. In the second variation, a local image activity detector is used to adaptively modify the input and output blur filters, allowing the error diffusion parameters to be optimized for different types of image content.
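A minimal sketch of the first variation, assuming Floyd-Steinberg diffusion weights and a simple causal box blur as the visual blur function (the paper's actual filters are not reproduced here):

```python
import numpy as np

FS = [(0, 1, 7/16), (1, -1, 3/16), (1, 0, 5/16), (1, 1, 1/16)]  # Floyd-Steinberg

def causal_blur(img, y, x):
    """Average over the causal 2x2 neighborhood at or before (y, x)."""
    return img[max(0, y - 1):y + 1, max(0, x - 1):x + 1].mean()

def visual_error_diffusion(gray):
    """gray: 2-D float array in [0, 1]; returns a binary halftone."""
    work = gray.astype(float).copy()        # input plus diffused errors
    out = np.zeros_like(work)
    h, w = work.shape
    for y in range(h):
        for x in range(w):
            out[y, x] = 1.0 if work[y, x] >= 0.5 else 0.0
            # error between *perceived* (blurred) input and output,
            # rather than between raw pixel values
            err = causal_blur(work, y, x) - causal_blur(out, y, x)
            for dy, dx, wgt in FS:
                yy, xx = y + dy, x + dx
                if 0 <= yy < h and 0 <= xx < w:
                    work[yy, xx] += err * wgt
    return out
```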
A number of techniques for halftoning gray-scale images have been proposed; unfortunately, a reliable methodology for comparing results has yet to be developed. Our work focuses on the analysis of edge information in halftoned images. In particular, we are interested in the preservation of the original image edges and in the appearance of edge artifacts created by the halftoning process. Our approach is a multiscale analysis based on a wavelet transform whose smoothing function is approximately the derivative of a Gaussian. Image edges are found by identifying extrema of the wavelet transform; edge points at the same scale are connected into contours, and edge contours are linked into a pyramid structure across multiple scales. The evolution of wavelet maxima in this pyramid allows us to classify discontinuities in the image and to measure their significance. We studied the performance of popular halftoning methods on images with various types of edges: multiscale edge structures are identified in the original gray-scale and halftoned images, and the corresponding values of the wavelet transform are compared. Our experiments show that the proposed methodology can be used to measure the fidelity of edge reproduction by the halftoning process, and that various contouring artifacts can be reliably identified.
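In the spirit of this analysis, the sketch below finds wavelet-style modulus maxima as local maxima of derivative-of-Gaussian responses across a handful of scales; the scales and threshold are illustrative, not the paper's settings.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter

def multiscale_edges(img, sigmas=(1, 2, 4, 8), thresh=0.05):
    """Return {sigma: boolean edge map} of gradient-modulus maxima."""
    maps = {}
    for s in sigmas:
        gx = gaussian_filter(img, s, order=(0, 1))   # d/dx of Gaussian
        gy = gaussian_filter(img, s, order=(1, 0))   # d/dy of Gaussian
        mag = np.hypot(gx, gy)
        local_max = mag == maximum_filter(mag, size=3)
        maps[s] = local_max & (mag > thresh * mag.max())
    return maps
```

Comparing the maps (and the underlying modulus values) computed from the original and from the halftone at each scale then gives a per-scale measure of how well edges are reproduced.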
It has been shown that incorporating a model of the human visual system, in the form of filtering halftone errors by the eye's modulation transfer function, improves the quality of generated halftones. However, in addition to the blurring of the eye, the human visual system constructs abstract, multi-scale representations of the images it perceives, in the form of a coarse-to-fine representation in which coarser levels are augmented by increasingly finer levels of detail. Such abstract representations can be quite different from the coarser representations generated by a linear Gaussian convolution filter, which blurs the image while removing small features. In this paper we employ a nonlinear system of partial differential equations to construct a perceptually meaningful scale-space of coarse-to-fine representations of an image, and we require the multi-scale structure of the halftone to approximate that of the original image. This is implemented via a scale-based error metric and optimization by simulated annealing. The improvements are demonstrated on a few images.
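One plausible instance of such a nonlinear PDE scale-space is Perona-Malik anisotropic diffusion, sketched below; the conductance function, step size, and iteration count are assumptions rather than the paper's specific system.

```python
import numpy as np

def perona_malik(img, n_iter=20, kappa=0.1, dt=0.2):
    """Edge-preserving scale-space: smooth within regions, not across edges."""
    u = img.astype(float).copy()
    g = lambda d: np.exp(-(d / kappa) ** 2)   # conductance: small at edges
    for _ in range(n_iter):
        dn = np.roll(u, -1, 0) - u            # finite differences in the
        ds = np.roll(u,  1, 0) - u            # four compass directions
        de = np.roll(u, -1, 1) - u
        dw = np.roll(u,  1, 1) - u
        u += dt * (g(dn) * dn + g(ds) * ds + g(de) * de + g(dw) * dw)
    return u
```

A scale-based error metric can then compare such representations of the halftone and of the original at several diffusion times, with simulated annealing flipping halftone pixels to reduce the total error.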
Perceiving and Processing Natural and Synthetic Images
Effective computer graphics applications should accurately convey 3D shape. Previously, we investigated the contributions of shading and contour, specular highlights, and light-source direction to 3D shape perception. Our experiments use displays of convex solid objects based on the superquadric parameterization, permitting continuous variation in their cross-sectional shapes. Our present work concerns the impact of surface markings. Rotating wireframe or uniformly shaded objects may produce perceptually distorted shapes. We investigate the idea that such distortions interfere with shape judgements, and that surface markings may either enhance perceptual accuracy by encouraging stability or impair it by interfering with global shading patterns. Our displays include rotating objects with no surface markings and objects with latitudinal or longitudinal stripes, each under two different scene illuminations. Observers view pairs of objects: a target shape and a second object whose shape they adjust, using mouse clicks, to match that of the target. Our principal result is that these surface patterns do not enhance performance, even though the chosen stripe intensities minimize interference with global shading and the stripe patterns may actually encode surface curvature. We are now investigating alternatives for applying surface patterns to modelled objects, including hardware-supported texture mapping. Our long-term goal remains the identification of a comprehensive set of conditions for optimizing the shape understanding of graphic objects.
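For reference, the superquadric parameterization used for such stimuli can be sketched as follows; the exponents e1 and e2 control the cross-sectional shapes, while the axis lengths and mesh resolution here are placeholders.

```python
import numpy as np

def fexp(x, e):
    """Signed exponentiation used in the superquadric parameterization."""
    return np.sign(x) * np.abs(x) ** e

def superquadric(e1, e2, a=(1.0, 1.0, 1.0), n=64):
    """Return the (x, y, z) mesh of a superquadric surface."""
    eta = np.linspace(-np.pi / 2, np.pi / 2, n)    # latitude
    omega = np.linspace(-np.pi, np.pi, n)          # longitude
    eta, omega = np.meshgrid(eta, omega)
    x = a[0] * fexp(np.cos(eta), e1) * fexp(np.cos(omega), e2)
    y = a[1] * fexp(np.cos(eta), e1) * fexp(np.sin(omega), e2)
    z = a[2] * fexp(np.sin(eta), e1)
    return x, y, z   # e1 = e2 = 1 gives an ellipsoid; smaller exponents
                     # give squarer cross sections
```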
Large regions of many images are filled with visual texture for which the viewer is not concerned with exact pixel values. In image coding, it is advantageous to describe such regions in terms of their boundaries and textural properties, since a textural description can be much more compact than a precise description of pixel values. For a coding system to work, an automated method for generating compact texture descriptions is necessary, and the synthesized textures must appear satisfactory to the human viewer. We have adapted the Heeger and Bergen texture-synthesis algorithm to the coding problem. The algorithm decomposes an image into subbands with a steerable pyramid and characterizes the texture in terms of the subband histograms and the pixel histogram. Since the subband histograms all have a similar form, we can describe each one with a low-order parametric model, so the resulting textural descriptor is quite compact. We show examples with both still images and video sequences.
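A hedged sketch of the descriptor idea: since the subband histograms share a common shape, each can be summarized by the two parameters of a generalized Gaussian, one plausible low-order parametric model (the paper's exact model is not reproduced here).

```python
import numpy as np
from scipy.stats import gennorm

def describe_subband(coeffs):
    """Summarize a subband by generalized-Gaussian shape and scale."""
    beta, _, scale = gennorm.fit(np.ravel(coeffs), floc=0.0)
    return beta, scale   # two numbers instead of a full histogram

# A texture descriptor is then the concatenation of these pairs over
# all pyramid subbands, plus a coarse histogram of the pixels themselves.
```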
We examined visual search for color within the distributions of colors that characterize natural images, using a foraging task designed to mimic the problem of finding a fruit among foliage. Color distributions were taken from spectroradiometric measurements of outdoor scenes and used to define the colors of a dense background of ellipses, and search times were measured for locating test colors presented as a superposed circular target. Reaction times varied from high values for target colors within the distribution (where they are limited by serial search based on form) to asymptotically low values for colors far removed from the distribution (where targets pop out); the variation in reaction time follows the distribution of background contrasts but is substantially broader. In further experiments we assessed the color organization underlying visual search and how search is influenced by contrast adaptation to the colors of the background. Asymmetries between blue-yellow and red-green backgrounds suggest that search times do not depend on the separable L-M and S-(L+M) dimensions of early postreceptoral color vision. Prior adaptation to the background colors facilitates search relative to adaptation to a uniform field, while adaptation to an inappropriate background impedes it. Contrast adaptation may therefore enhance the salience of novel stimuli by partially discounting the ambient background.
Our electrophysiological measurements of image distortion are based on a model of object detection in noise that quantitatively describes frequency-contrast characteristics in the presence of noise and consists of several stages: primary filtering, signal matched filtering, comparison with a threshold, decision, and internal and memory noise. The experimental tools include hardware and software for stimulation and data processing, based on an interconnection between two computers via an RS-232C interface. The stimulation hardware comprises a CCD camera, a framegrabber, a high-resolution monitor, and a device for photometric control and gamma correction; the software includes programs for image processing and test-image generation. The hardware for evoked-potential processing includes multiple scalp electrodes, a 16-channel amplifier, and a 12-bit A/D converter. Visual evoked potentials to calibrated natural test images, or to Gabor gratings from 0.45 to 14.4 cycles/deg, with or without superimposed noise, were studied, and the amplitudes of the (N1-P1) and (N2-P2) components were measured. The noise variance modeled the amount of image distortion. Evoked potentials were recorded from different cortical areas of normal subjects. Different responses to the spatial frequency of the test images were obtained in the occipital and parietal scalp areas, and changes in the form of the responses were found when white noise was superimposed on the test images. The early components (N1-P1) of the evoked potentials do not depend on the spatial frequency of the noisy test images, but the late components (P2-N2) do.
In a task such as face recognition, much of the important information may be contained in the high-order relationships among the image pixels. A number of face recognition algorithms employ principal component analysis (PCA), which is based on the second-order statistics of the image set and does not address high-order statistical dependencies such as the relationships among three or more pixels. Independent component analysis (ICA) is a generalization of PCA which separates the high-order moments of the input in addition to the second-order moments. ICA was performed on a set of face images by an unsupervised learning algorithm derived from the principle of optimal information transfer through sigmoidal neurons. The algorithm maximizes the mutual information between the input and the output, which produces statistically independent outputs under certain conditions. ICA was performed on the face images under two different architectures. The first architecture provided a statistically independent basis set for the face images that can be viewed as a set of independent facial features. The second architecture provided a factorial code, in which the probability of any combination of features can be obtained from the product of their individual probabilities. Both ICA representations were superior to representations based on principal component analysis for recognizing faces across sessions and changes in expression.
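A rough sketch of the first architecture, with scikit-learn's FastICA standing in for the infomax learning rule used in the paper (the component count and preprocessing are illustrative):

```python
import numpy as np
from sklearn.decomposition import FastICA

def ica_face_basis(faces, n_components=50):
    """faces: (n_images, n_pixels) array of vectorized face images.

    Returns per-image coefficients and basis images whose rows act as
    statistically independent "facial features"."""
    ica = FastICA(n_components=n_components, whiten="unit-variance",
                  random_state=0)
    coeffs = ica.fit_transform(faces)      # (n_images, n_components)
    basis = ica.mixing_.T                  # (n_components, n_pixels)
    return coeffs, basis
```

Recognition then compares the coefficient vectors of probe and gallery images, e.g. by nearest neighbor under cosine distance.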
The retrieval of images based on their visual similarity to an example image is an important and fascinating area of research. Here, we discuss various ways in which visual appearance may be characterized for determining both global and local similarity in images. One popular method involves computing global measures such as moment invariants to characterize global similarity. Although this means the image can be characterized by a few numbers, such techniques often perform poorly: they require that the object be a binary shape without holes, which is often impractical, and moment invariants are sensitive to noise. Visual appearance is better represented using local features computed at multiple scales, such as the outputs of images filtered with Gaussian derivatives, differential invariants, or geometric quantities like curvature and phase. Two images may be said to be similar if they have similar distributions of such features; global similarity may therefore be deduced by comparing histograms of these features, which can be done rapidly. Histograms cannot be used to compute local similarity. Instead, the constraint that the spatial relationship between the features in the query be similar to the spatial relationship between the features of its matching counterparts in the database provides a means of computing local similarity. The methods presented here do not require prior segmentation of the database; in the case of the local representation, objects can be embedded in arbitrary backgrounds, and both methods handle a range of size variations and viewpoint variations of up to 20 or 25 degrees.
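A minimal sketch of the global-similarity route: histograms of multiscale Gaussian-derivative responses compared by histogram intersection (the scales, bin counts, and value range are assumptions).

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def feature_histogram(img, sigmas=(1, 2, 4), bins=32):
    """img: 2-D array in [0, 1]; histogram of derivative responses."""
    feats = []
    for s in sigmas:
        feats.append(gaussian_filter(img, s, order=(0, 1)).ravel())  # Lx
        feats.append(gaussian_filter(img, s, order=(1, 0)).ravel())  # Ly
    hist, _ = np.histogram(np.concatenate(feats), bins=bins, range=(-1, 1))
    return hist / hist.sum()

def global_similarity(h1, h2):
    """Histogram intersection: 1.0 for identical distributions."""
    return np.minimum(h1, h2).sum()
```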
The measurement of perceptual similarity between textures is a difficult problem in applications such as image classification and image retrieval in large databases. Among the various texture analysis methods and models developed over the years, those based on a multi-scale, multi-orientation paradigm seem to give the most reliable results with respect to human visual judgement. This paper describes new texture features extracted from an overcomplete wavelet transform, the `steerable pyramid', which models human early vision. The textured image is decomposed into a 3-level pyramid using a 4-orientation band filter set, and the texture features are computed from the distributions associated with each filter as follows: we construct the cumulative distribution function (cdf) of graylevels for each of the 12 band-pass images and fit it with a Bezier curve in order to characterize the texture. The clusters of Bezier control points from the 12 cdfs allow us to discriminate the textures. We apply these new texture features to searching an image database for the textures most `similar' to a selected one.
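The cdf-plus-Bezier step can be sketched as follows: build the graylevel cdf of a subband and fit a cubic Bezier with the endpoints pinned, keeping the two free control values as the feature (the cubic order and endpoint handling are assumptions).

```python
import numpy as np

def subband_cdf(coeffs, bins=64):
    hist, _ = np.histogram(np.ravel(coeffs), bins=bins)
    cdf = np.cumsum(hist).astype(float)
    return cdf / cdf[-1]

def bezier_fit(cdf):
    """Least-squares cubic Bezier fit; returns the two free control values."""
    t = np.linspace(0.0, 1.0, len(cdf))
    b0, b1 = (1 - t) ** 3, 3 * (1 - t) ** 2 * t
    b2, b3 = 3 * (1 - t) * t ** 2, t ** 3
    y = cdf - b0 * cdf[0] - b3 * cdf[-1]        # pin endpoints P0, P3
    A = np.stack([b1, b2], axis=1)
    (p1, p2), *_ = np.linalg.lstsq(A, y, rcond=None)
    return p1, p2

# 12 subbands x 2 control values -> a 24-number texture signature
```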
The advent of large image databases (>10,000 images) has created a need for tools that can search and organize images automatically by their content. This paper presents a method for designing a hierarchical browsing environment which we call a similarity pyramid. The similarity pyramid groups similar images together while allowing users to view the database at varying levels of resolution. We show that the similarity pyramid is best constructed using agglomerative (bottom-up) clustering methods, and we present a fast sparse clustering method which dramatically reduces both memory and computation relative to conventional methods. We then present an objective measure of pyramid organization called dispersion, and we use it to show that our fast sparse clustering method produces better similarity pyramids than top-down approaches.
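A toy version of the bottom-up construction, using SciPy's standard agglomerative clustering in place of the paper's fast sparse method (the feature vectors and level sizes are placeholders):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, cut_tree

def similarity_pyramid(features, levels=(4, 16, 64)):
    """features: (n_images, d) array; returns {level: cluster labels}.

    Cutting the merge tree at several sizes yields the coarse-to-fine
    groupings a user browses."""
    Z = linkage(features, method="average")   # bottom-up merge tree
    return {k: cut_tree(Z, n_clusters=k).ravel() for k in levels}
```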
In this paper, we study how human observers judge image similarity. To do so, we conducted two psychophysical scaling experiments and compared the results with two algorithmic image similarity metrics. For these experiments, we selected a set of 97 digitized photographic images representing a range of semantic categories, viewing distances, and colors. We then used the two perceptual and the two algorithmic methods to measure the similarity of each image to every other image in the data set, producing four similarity matrices. These matrices were analyzed using multidimensional scaling techniques to gain insight into the dimensions human observers use when judging image similarity, and into how these dimensions differ from the results of the algorithmic methods. This paper also describes and validates a new technique for collecting similarity judgments which provides meaningful results with a factor of four fewer judgments than the paired-comparisons method.
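The multidimensional scaling step can be illustrated with scikit-learn; converting similarities to dissimilarities by subtraction from the maximum is an assumption about the scale of the judgments.

```python
import numpy as np
from sklearn.manifold import MDS

def embed(similarity, n_dims=2):
    """similarity: symmetric (n, n) matrix; returns (n, n_dims) coordinates."""
    dissim = similarity.max() - similarity
    np.fill_diagonal(dissim, 0.0)              # an image is 0 from itself
    mds = MDS(n_components=n_dims, dissimilarity="precomputed",
              random_state=0)
    return mds.fit_transform(dissim)
```

Inspecting how the images arrange themselves along the recovered axes is what suggests the perceptual dimensions underlying the judgments.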
We describe psychophysical experiments conducted to study PicHunter, a content-based image retrieval (CBIR) system. Experiment 1 studies the importance of using (1) semantic information, (2) memory of earlier input, and (3) relative, rather than absolute, judgements of image similarity. The target-testing paradigm is used, in which a user must search for an image identical to a target. We find that the best performance comes from a version of PicHunter that uses only semantic cues, with memory and relative similarity judgements; second best is the use of both pictorial and semantic cues, with memory and relative similarity judgements. Most reports of CBIR systems provide only qualitative measures of performance based on how similar retrieved images are to a target. Experiment 2 puts PicHunter into this context with a more rigorous test: we first establish a baseline for our database by measuring the time required to find an image that is similar to a target when the images are presented in random order. Although PicHunter's performance is measurably better than this, the test is weak because even random presentation of images yields reasonably short search times. This casts doubt on the strength of results given in other reports where no baseline is established.
We present a generalization of the Radon transform that fits many tasks in image processing and is useful in modeling the human visual system. In analogy with wavelets, we propose a transform that is localized at points and translated over the image plane, and we refer to it as a parallel sensor transform. We examine the relationship between this transform and wavelet transforms developed to describe visual-system processes. The transform captures the image data in the sense that it is injective. From this starting point, we present a continuous analog of the four-stage edge-detection breakdown of Bezdek et al., and arrive at a framework for casting many kinds of image processing algorithms in a biologically plausible manner: as feed-forward, receptive-field-based algorithms using known operations. We show how this leads to an optimization scheme for Radon-transform-based algorithms, and we show the results of applying this theory to biologically plausible algorithms for motion and color processing.
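For orientation, the classical transform being generalized computes line integrals of the image over a family of angles; a minimal example with scikit-image (the test image is illustrative):

```python
import numpy as np
from skimage.transform import radon

img = np.zeros((64, 64))
img[24:40, 24:40] = 1.0                      # simple square test image
sinogram = radon(img, theta=np.arange(0.0, 180.0, 2.0))
print(sinogram.shape)                        # (projection bins, angles)
# The parallel sensor transform described above would replace these
# global line integrals with point-centered, receptive-field-weighted ones.
```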
Current digital image and video storage, transmission, and display technologies use uniformly sampled images. The human retina, on the other hand, has a nonuniform sampling density that decreases dramatically as the solid angle from the visual fixation axis increases; there is therefore a sampling mismatch between uniformly sampled digital images and the retina. This paper introduces Retinally Reconstructed Images (RRIs), a novel representation of digital images that enables a resolution match with the human retina. To create an RRI, the size of the input image, the viewing distance, and the fixation point must be known. In the RRI coding phase, we compute `Retinal Codes', which consist of the retinal sampling locations onto which the input image projects, together with the retinal outputs at those locations. In the decoding phase, we use the backprojection of the Retinal Codes onto the input image grid as B-spline control coefficients to construct a 3D B-spline surface with nonuniform resolution properties; an RRI is then created by mapping the B-spline surface onto a uniform grid, using triangulation. Transmitting or storing the Retinal Codes instead of the full-resolution images enables up to two orders of magnitude of data compression, depending on the resolution of the input image, its size, and the viewing distance. The data-reduction capability of Retinal Codes and RRIs is promising for digital video storage and transmission applications, although the computational burden of the decoding phase can be substantial.
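A toy stand-in for the eccentricity-dependent sampling model (the functional form and constants below are common in the literature but are assumptions, not the paper's fitted values):

```python
import numpy as np

def sampling_density(ecc_deg, d0=1.0, e2=2.3):
    """Relative retinal sample density vs eccentricity in degrees.

    d0 is the foveal density; e2 is the eccentricity at which density
    falls to a quarter of d0 (illustrative value)."""
    return d0 / (1.0 + np.asarray(ecc_deg, dtype=float) / e2) ** 2

print(sampling_density([0, 2.3, 10, 30]))   # rapid fall-off from the fovea
```

Because density falls so steeply, a fixation-aware code needs only a small fraction of the uniform-grid samples, which is the source of the compression figures quoted above.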
The visual scanpath has been an important tool in neuro-ophthalmic and psychological studies, where it serves to assess aspects of visual perception, such as the perception of color or black-and-white images, color blindness, and related pathologies. The technique has also reached a broad field of applications such as marketing: the scanpath over a specific picture reveals the observer's interest in color, shapes, letter size, and so on. Even when the picture appears among a group of images, the tool has proven helpful in gauging people's interest in a specific advertisement.
Electronic Imaging Based on Retinal Processing and Eye Movements
We introduce an image coding algorithm in which perceptual pruning is used to select the most perceptually relevant image components. The algorithm uses a new maximum-likelihood image indistinguishability criterion derived from a `cortical snapshot', a model of the response of striate cortical simple cells to the image at a given point of fixation; the criterion must be satisfied at all points of fixation. We demonstrate that this method selects image components better than matching pursuit, and we show image coding results of high subjective quality at low encoding rates.
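A toy cortical snapshot can be sketched as the responses of a small Gabor filter bank centered on the fixation point; all filter parameters below are illustrative stand-ins for a simple-cell model, not the paper's.

```python
import numpy as np
from scipy.signal import fftconvolve

def gabor(sigma, theta, freq, size=33):
    """Even-symmetric Gabor filter (illustrative simple-cell model)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    env = np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2))
    return env * np.cos(2 * np.pi * freq * xr)

def cortical_snapshot(img, fy, fx, patch=65):
    """Filter-bank responses on a patch centered at fixation (fy, fx).

    Assumes the fixation point lies far enough from the image border."""
    h = patch // 2
    p = img[fy - h:fy + h + 1, fx - h:fx + h + 1]
    thetas = np.linspace(0.0, np.pi, 4, endpoint=False)
    return np.array([fftconvolve(p, gabor(4.0, t, 0.1), mode="same")
                     for t in thetas])
```

An indistinguishability test in this spirit would compare such snapshots of the original and the pruned reconstruction at every candidate fixation point.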