Abstract
Purpose
The paper provides a theory base for deriving connotative descriptors for photographs from existing denotative descriptors, and then demonstrates a model for enhancing browsing within image collections by providing a tool for carving up the searching space.
Design/methodology/approach
The paper conceptually explores the nature of iconic messages contained in an image by adopting semiotics as a theoretical tool. A problem of image retrieval is identified as loss of connotative messages during the image representation process. The paper proposes an image‐retrieval model utilizing an association thesaurus that facilitates the assignment of connotative index terms by making use of the denotative index terms of an image. A series of experiments is performed to evaluate the effectiveness of the proposed model.
Findings
Experimental results demonstrate that the association thesaurus improves image‐retrieval effectiveness by increasing the recall of connotatively related image documents as well as the recall of browsing sets.
Practical implications
Applying connotative index terms to an image would be time consuming. Deriving connotative terms from denotative terms and then using them to enrich the browsing environment suggests a method of increasing retrieval effectiveness while reducing the resources required for representation.
Originality/value
Since images are often used to illustrate concepts that are not immediately evident from just the objects in front of the lens, connotative descriptions are particularly valuable. Since human perception of images is, in a sense, hard wired into our brains, browsing is a frequent and reasonable search method in image collections. Using connotative descriptors to point the way to clusters of images with a higher probability of relevance changes the locus of control over representation, establishes an environment for dynamic representation, and gives credibility to browsing as a significant search method.
Citation
Yoon, J. and O'Connor, B. (2010), "Engineering an image‐browsing environment: re‐purposing existing denotative descriptors", Journal of Documentation, Vol. 66 No. 5, pp. 750-774. https://doi.org/10.1108/00220411011066826
Publisher
Emerald Group Publishing Limited
Copyright © 2010, Emerald Group Publishing Limited
Introduction
Browsing is a significant method for seeking documents (O'Connor, 1993). This is especially the case when there is some form of discrepancy or indeterminacy between the system of representation and the individual seeker's representation of his or her need. Discrepancies and indeterminacies are especially likely to be problematic when words or low‐level image primitives are the primary representation mode for a collection of images. Words are not native elements of photographs and low‐level primitives, while native to photographs and often significant, may not rise to an appropriate conceptual level to provide a useful tool for individual users to re‐configure the search space.
We present here one means of aiding the browsing process by making novel use of existing verbal representations to cluster images. This approach engineers a means of reducing the search space by making use of pre‐existing representations. Our work sits at the intersection of image retrieval and semiotics. Since denotation and connotation are inextricably bound with the viewer/reader/interpreter and because there is no mediating tool for photographs of the sort that a dictionary provides for word documents, we are seeking ways to make use of multiple reactions to an image, not merely the single reaction of a single‐sanctioned cataloger. We are particularly concerned with those engaging in browsing, whether as a means to continue a search blocked by some failure of the representation system or as an initial means of engaging a collection of images. Browsing might be said to be a set of activities with individual focus combined with openness to serendipitous input – a purposefully sought serendipity. We seek to construct tools rather than rules.
Photographs
Photographs present a different form of representation from words. On the whole, they are made with the direct participation of some object or objects and are, thus, very specific. In this sense, they are exquisitely empirical. That is, they present to us the surface appearance of an object or set of objects at a particular moment in time. Instead of the representation “dog”, we are presented with a two‐dimensional projection of a particular terrier on the grass in the late afternoon; instead of “runner”, we are presented with a curly haired and bearded runner in blue shorts with his left foot crossing the finish line of the Boston Marathon in April of 1982; instead of “assassination” we are presented with the 343 frames (each a still photograph) of the Zapruder film showing John Kennedy's limousine in Dallas.
Since photographs tend to be much more specific than words and since words are not native elements of photographs, there is at least some difficulty in achieving verbal representations of photographs that will be useful for several sorts of searches by several sorts of users. We posit that a photograph may be sought because it presents a noun – that is, an object of the sort we wish to present was in front of the lens; it may be sought because it presents an instance of an event we wish to present; or it may be sought because it contains an instance of an image primitive we wish to present – a particular color or texture or composition characteristic.
Representation of photographs as nouns is not especially difficult, though it is not so trivial as it might seem. Greisdorf and O'Connor (2002a) note that even among a fairly homogeneous group, different people representing the same photographs chose different subsets of the image data, used different terms for the same object, and chose to represent at different levels of generality. A small thought exercise here may help to illustrate the different requirements and challenges in representing photographs for use. Suppose we have a collection of some thousands of photographs with no verbal tags as representation, only small versions (thumbnails) of each of the pictures. Suppose we present retrieval tasks to each of three groups of participants. We ask the first group to find seven photographs to illustrate a story on dogs in America. We ask the second group to find seven photographs to illustrate a story on the American West. We ask the third group to find seven photographs to illustrate a story on love in twenty‐first century America. We might hypothesize that the first group would finish most quickly and that there would be little variance in the seven photographs of dogs. This would be an example of a noun search. We might expect that the second group would take a bit longer and that there would be greater variance in the seven‐image sets, because there is no single noun photograph of the American West. Cacti, pickup trucks, rodeo riders with iPods, housing developments on ranches, four‐wheelers instead of horses, and numerous other nouns might suit the needs of illustrating a story on the American West. Then, it would not be hard to imagine that the third group would require more time to sort through images and that there would be even more variance in the picture groupings.
What now if an advertising designer wants a “bright red” campaign? What if an interior designer wishes to “tie together” several rooms subtly? What if adjectival concepts are the requirement of the photographs – delicate, robust, soft? In these instances, we might be looking for image primitives such as color, texture, edges, and lighting.
In some searches, it might be possible to make a list of nouns that might fulfill the requirements; however, there might be other nouns of which the searcher does not think. If it were possible to link other nouns that had satisfied searchers looking for photographs to illustrate the same abstract or adjectival concepts, we might give the browser a better chance of finding a satisfactory image or set of images.
Browsing
Browsing is not a matter of dumb luck, and it is not simply the result of the invisible guiding hand of the librarian. Browsing is a set of activities used to locate and sample documents when other systems are inadequate to the search task. These activities represent a shift of the locus of representation from an external agency to an individual's internal agency. Of course, this introduces a substantial issue: browsing "success" is specific to a particular person and a particular set of circumstances. For at least some of the activities we label as "browsing", there is no simple way of predicting what success would require. This means we do not have access to an easy metric of success.
Inherent visual features of an image – low‐level image primitives – are often not sufficient to represent the semantics embodied within an image message (Hare et al., 2007). In particular, connotative meanings are more difficult to represent with descriptive techniques based on image primitives (Eakins and Graham, 1999; Rui et al., 1999; Jörgensen, 2003). Even in cases in which image primitives are easily clustered to represent particular objects, it is not necessarily the case that the message will have meaning for any particular viewer. It is also obvious that several keywords assigned by an indexer may not represent an image adequately. Figure 1 shows that the large number of messages contained within an image, especially connotative messages, can result in discrepancies between indexers and users.
If the very specificity of images and their fundamental representational difference from words mean that browsing may at times be a useful search mechanism; and if we assume that browsing is a set of activities that makes use of multiple modalities for insertion into a collection and evaluation of discovered documents; then we can propose that a clustering mechanism derived from existing representations might reduce the search time and search space and increase the likelihood of satisfactory results. We must be clear here that we are not proposing that images are some sort of free‐floating triggers and we agree with one anonymous reviewer that: “images are created by humans operating within the logonomic parameters of their socio‐cultural moment.” It is precisely because of this that we are pursuing this line of research. A single cataloger sits within a set of logonomic parameters. Training, system design, and thoughtfulness of the cataloger and users of the catalog may not be sufficient to bridge differences in logonomic parameters of users; especially in the case where the objects being described – photographs – are not constructed of words.
If we look to a social networking image environment such as Flickr.com, we see a phenomenon that lies between a formal bibliographic apparatus (or even a quasi‐formal folksonomy) and utter chaos. That is, there are millions of images, some with multiple descriptors, some with no descriptors at all (except for the Exchangeable image file (Exif) data automatically applied to many images made with recent digital cameras). There are no particular rules for how one should make up descriptors, especially since there are no rules governing the use of the environment. Some may simply use it as a handy album from which to select images for various projects or to show occasional friends and relatives some interesting images; others may hope to attract lots of attention to their images and so apply numerous descriptors and add their images to various "groups" – which act as a form of classification scheme or, at least, a set of local and specific folksonomies. Millions of images on Flickr.com are tagged with "me." For each photographer who uses this tag, it is especially useful, even though for anybody else searching the entire Flickr domain, the millions of "me's" will be useless because they will not be of that "me."
However, there are certain trends. If one searches the entire Flickr corpus for "dog," one finds a bit over four million hits. Skimming through the first few hundred, one finds family pets, fashion photos with dogs, hunting dogs, lyrical images, hotdogs (the food), guard dogs, sled dogs, close‐ups of parts of dogs, and groups of dogs. If one searches Flickr for images tagged "love," about one million hits come up. Here, the variety is greater than in the "dog" search. Many of the images are of the photographer's beloved, hoped‐for beloved, and wishful‐thinking beloved. Others are a heart drawn in the sand, clasped hands, a champagne bottle, a peace symbol, wedding rings, a dog, a cruise ship, flowers, a baby, and even a shopping cart in a restroom. Someone searching for images of "love" might not have thought to look for images of a baby or a dog; now they might see these as useful.
We set about taking an existing set of images and descriptors to see if we could, indeed, use the ordinarily denotative descriptors of a standard cultural heritage collection of images and derive some measure of connotation from them. We must note that we are not saying there is any direct mapping of denotative descriptors to immediately and generally useful connotative descriptors. Instead, we are seeking to give browsers another navigation method that may lead to more successful searching with the expenditure of fewer resources. Browsing is still a matter of individual engagement with and evaluation of documents. It thrives on such individual engagement and evaluation. However, in a large collection, it may be possible and quite useful to make some rough predictions about areas that are more or less likely to yield useful results.
We are aware of what may seem to be a problem here. As one anonymous reviewer commented:
The danger with the author's approach is that semantic integrity will be forsaken in the interests of accommodating a wide variety of highly subjective terms which a set of viewers has seen fit to allocate [sic] to the depicted salient object or scene.
Our response to this comment is: "Exactly!" Browsing is a set of activities which act as end runs around the limitations of standard bibliographic procedures. In many browsing situations, random insertion into a collection might prove to be the most useful means of entry. However, many people do not have unlimited time for random insertion after random insertion. Also, it may be the case that someone else has blazed a useful path. So, by doing a small amount of clustering, we might be able to offer at least some of the benefits of browsing with a little more organization. People who purposely put themselves into browsing situations are generally aware that there is a tradeoff – a higher trash‐to‐gems ratio, but the possibility of finding the gem where little or no probability existed within the standard retrieval system. That is, by stepping outside the standard bibliographic apparatus, we increase the search space to include areas of the collection beyond those that have proved unfruitful (either in a given search or over a set of searches). While this may provide more potential, it may prove to be too large a search space; so, various heuristics for making the search smaller may be of value.
As shown in Figure 1, and as discussed by other researchers (Besser, 1990; Fidel, 1997; Greisdorf and O'Connor, 2002a, b; Jörgensen, 1998; Layne, 1994), connotative meaning[1] is an important but problematic attribute during the image representation process. Our study starts from the premise that connotative messages are important for improving user satisfaction of image‐retrieval results. Then, by applying semiotic notions of denotation and connotation in an instrumental manner, we explore the characteristics of messages within an image and the structures of interrelations among other types of messages. Based on the understanding of the nature of connotative messages within an image, we propose and evaluate a model that helps users find what they want, even when unfavorable information loss makes it difficult to obtain satisfactory search results.
We must clarify two points. We are using the notions of denotation and connotation in an instrumental fashion. These are more complex and more subtle than we have allowed here; we have basically stripped away much of the subtlety to see if our model works at a general level. Also, an anonymous reviewer commented that our use of the terms "cataloger" and "real people" in Figure 1 showed a "pretty grim distinction". On the face of it, we would agree; however, in the original design and publication of the research, the distinction was made as a playful statement of different needs and tools – the catalogers smiled over it. Construction of descriptors in the formal bibliographic apparatus tends to focus on the nominal, and the descriptors are often in some order outside the norms of the local natural language.
Features of image retrieval
By understanding the unique features of images, it may be possible to identify which aspects should be primarily considered for designing an enhanced image‐browsing system. In this section, the features of image retrieval are reviewed in three aspects: indexing and query, image‐retrieval systems, and retrieval effectiveness.
Representations of image documents and user needs
Essential and unique features of image documents have been presented through research on image attributes. Panofsky (1962) suggested three levels of meanings of images: pre‐iconography, iconography, and iconology. Layne (1994) and Markey (1988) analyzed image attributes based on Panofsky's attribute levels. In addition, Hidderley et al. (1995), Krause (1988), and Jaimes and Chang (2000) examined the attributes of images. Although various categorizations have been made, it is common for researchers to distinguish between objective attributes (factual) and abstract attributes (symbolic, emotional, and subjective) in an image.
As research on image documents has focused on attribute analysis, most user studies of image retrieval concern the analysis of query types. One of the most widely cited query classifications was provided by Eakins and Graham (1999). They divided queries into three levels: Level 1 primitive features, Level 2 logical features, and Level 3 abstract attributes. Enser and McGregor (1992), Armitage and Enser (1997), Hastings (1995), Chen (2000), Choi and Rasmussen (2003), and Markkula and Sormunen (1998) also analyzed user queries and showed that objective attributes are used more often than abstract attributes in queries. However, other studies (O'Connor et al., 1999; Greisdorf and O'Connor, 2002a), which examined users' verbal reactions to an image or their sorting patterns, experimentally demonstrated the importance of affective and emotional attributes in image perception. What, then, caused this difference? According to a series of studies performed by Jörgensen (1995, 1998), when participants are given a template including both objective and abstract attributes, they tend to choose abstract terms to describe an image more frequently than without the template. This result probably implies that users do not use abstract attributes in query formulation because they are not familiar with representing that kind of attribute.
Here, it is useful to turn to the binary document model suggested by Shannon and Weaver and mirrored in semiotics. Shannon and Weaver (1949) asserted a strict distinction between the message and the meaning. The message is the set of organized primitives – the physical stimuli, in whatever medium; the meaning is the result of an individual's engagement with the message – the individual interpretation. In this binary model, by "interpretation" we mean "function" or "suitability for a particular purpose." That is, for some purposes, color or general composition rather than the object photographed might be the "meaning."
Connotative descriptors for image‐retrieval systems
Although there has been research demonstrating the importance of connotative messages during the image‐search process, it has been recognized that connotative messages, depending as they do on an individual indexer's subjectivity, are "out of scope" for text‐based indexing. Some commercial image‐retrieval systems attempt to describe emotional and affective attributes of an image, but they also show inconsistency and subjectivity issues (Jörgensen, 2003). In addition to the text‐based approach, there are a few content‐based systems which facilitate connotative access by involving emotional attributes. Kansei research is one of those few efforts. The Kansei‐based approach tries to index users' impressions and subjective feelings evoked while viewing images (Black et al., 2004). Kansei research has focused on finding a connection between low‐level features (i.e. color, shape, texture, and so on) and subjective impressions (Kato and Kurita, 1990; Tanaka et al., 1997; Bianchi‐Berthouze et al., 1999; Bianchi‐Berthouze, 2001). However, as Kansei research has progressed, some researchers addressed the insufficiency of approaches dependent on low‐level features alone and proposed combining low‐level features with denotative/objective features for extracting emotional/abstract features (Kuroda and Hagiwara, 2002). In addition to Kansei research, Colombo et al. (1999) also proposed a model which derives emotional features by mapping objective and low‐level features onto four representative emotional concepts. Recently, some researchers (Hare et al., 2006; Enser et al., 2007) introduced an idea demonstrating the possibility of ontological support for bridging the semantic gap between denotations and connotations of an image. Ontology‐based representation of images suffers from the very shortcoming browsers are often attempting to escape – a priori tagging. We might even suggest that browsers are attempting to escape established ontological constructs. We might say that they are seeking an ontological commons where their needs and documents relevant to their needs both reside (Anderson, 2006).
Although these previous efforts to improve image retrieval through connotative attributes reported positive results, they appear to be partial solutions constrained by their experimental environments, such as the domain of the collection, a limited number of connotations, and so on. To propose a more generalized solution for utilizing connotative messages during the image‐search process, an understanding of the fundamental nature of connotative messages within an image should provide guidelines for designing and implementing image‐retrieval systems. This study adopted semiotic theory as a basis for understanding the connotative messages of an image.
Image‐retrieval system effectiveness
Indexing is good if and only if it results in good retrieval. However, it is difficult to measure the performance of image retrieval objectively, because, given the various subjective meanings of an image, the relevant set of images can differ from user to user. In spite of these difficulties, the most widely used measurements for estimating image‐retrieval system effectiveness are precision and recall. However, since humans can browse more images with less effort in a shorter time compared to text documents, some researchers insist that in the case of image retrieval an emphasis on recall is reasonable. Layne (1994) explains that since recall is more important than precision, concentrating on browsing with basic index terms would be better than making efforts to assign detailed index terms. Cox et al. (1996) stressed the importance of recall by asserting that the critical problem of image retrieval is non‐retrieved relevant images rather than retrieved non‐relevant images.
Furthermore, Cox et al. (2000) confirmed the effectiveness of displaying diverse images by comparing two display modes, the "most‐probable display update scheme" and the "most‐informative display update scheme." The most‐probable display scheme, adopted by most if not all image‐retrieval systems, displays the best possible images after each query and round of relevance feedback, whereas the most‐informative display scheme exhibits images so as to gain the most information from users, by displaying a large, varied set of images at the beginning of the search stage. They demonstrated that by deriving as much information from users as possible during the searching process, it is ultimately possible to finish the search iteration more quickly. Zhou and Huang (2003) suggested that these two schemes can be mixed in one screen if the balance between the two can be determined optimally.
Fidel (1997) introduced the concept of the recall of browsing sets; that is, indexing should be done to create browsing sets rather than to differentiate relevant images from non‐relevant images. She explained that for a user looking for a picture of Paris for a travel brochure, a large diversity of pictures of Paris would be more informative than 50 images of a bird's‐eye view of Paris or of the Eiffel Tower. Since even with the same query the relevant images can be entirely different, recall of browsing sets could be a more effective measurement than general recall. As this line of studies shows, the researchers' common view is that in an image‐retrieval system recall is more important than precision, and effective browsing is more critical than retrieving only a small number of relevant images.
Formal problem model
The success of a search depends on the relationship between the original document and a user's need. However, search results are determined by the matching of index terms (document representations) and queries (representations of user needs), and during both representation processes, some information is left behind or left out.
Information loss that occurs during document representation can be explained from a semiotic point of view. According to semiotics, connotative messages are derived from denotative messages within socio‐cultural contexts (Barthes, 1964/1968, 1964/1977). Therefore, a producer of a sign who wants to convey connotative messages through denotative messages generates the sign within his/her historical, social, and cultural context and may assume that viewers will read the sign through the same socio‐cultural code. However, since not everyone shares the same code, a sign may have several alternative meanings, and not every viewer will be the intended viewer. In addition, meanings not intended by the producer can also be communicative and meaningful to viewers situated in different socio‐cultural contexts.
In the image‐retrieval process, users access the sign through the image document representation generated by an indexer who also has a specific socio‐cultural background. Since the indexer is located between a sign and a user, the degree of overlap between the user's code and the indexer's code influences search results. In other words, connotative index terms given by indexers may not match receivers' interpretations, and as a result, receivers may have a low probability of successful retrieval. In addition, it is impossible to expect an indexer to produce all possible connotative index terms, because an indexer can produce index terms based only on his/her own socio‐cultural background or other socio‐cultural settings that have been studied. This tendency has supported the position of some researchers who argued against assigning connotative index terms because of their subjectivity and inconsistency (Hourihane, 1989; Markey, 1984). In contrast to connotative messages, the seeming objectiveness of denotative messages makes it easier to assign denotative index terms, and, hence, the information loss in denotative indexing might be regarded as relatively moderate.
Information loss from the user's perspective is related to the difficulty of representing connotative needs as part of the retrieval process. Users can have connotative needs whether or not they appropriately represent those needs as system queries. In an experimental study concerning the representation of connotative needs, Yoon (2006) asked participants to find an image expressing peaceful resolution of international conflict. By analyzing 33 unique search terms provided by 26 participants, she demonstrated that in some cases users expressed their connotative needs using words for connotative concepts, such as "peace," "cooperation," "harmony," "calm," "sad," etc. In other cases, users employed denotative query terms which can return images that may include relevant connotative messages, such as "dove," "pigeon," "olive branch," "globe," "child," "people," "tear," "destructed building," etc. In either case, whether adopting connotative or denotative queries, there are possibilities which prevent users from obtaining satisfactory search results. In the first case, users employ a variety of connotative query terms. However, as discussed, an indexer, occupying a specific code, may not generate those diverse connotative index terms. As a result, the query term may not match an index term, or there could be a very low probability of matching. In the second case, most images would be returned based on the matching between index terms and denotative queries. However, a user's real need is related to a connotative message rather than simply a denotative message, and the actual connotative need can be satisfied by other denotative messages which are different from the query. For instance, a user who types "dove" as a search term for finding images expressing "peaceful resolution" may also be interested in images of a smiling child, images of multi‐cultural people, and even images of war or destroyed buildings, which advocate an anti‐war message. From the perspective of user needs, search results which include only "doves" are only part of his/her potentially useful results. Often, users have denotative needs; however, even in this case, connotative messages may have an effect on selecting the most satisfactory image among the multiple images which include the specific denotative messages.
Based on this conceptual analysis of information loss, two assumptions are used in this study to make a complex searching situation simpler. First, it is assumed that image index terms are mostly denotative messages, not connotative. Second, if index terms do not represent connotative messages, a connotative query will yield no results; so, at this time, only the denotative query is considered for the proposed image retrieval. In this context, this study focuses on the following question: how can denotative messages be used to access connotative messages during the image‐retrieval process?
Synthesis for resolution
Consider that a denotative query is given. Since a denotative query may be generated from various connotative needs depending on users' socio‐cultural backgrounds, the relationship between the query and the needs can be shown as in Figure 2: |D| is the relevant set for a given denotative query, and |C1|, |C2|, |C3| and |C4| are potential connotative needs which are related to that denotative query. The (a) images match both the denotative query and a connotative need; the (b) images do not match the denotative query, but they have a possibility of satisfying user connotative needs.
Figure 3 shows an example of Figure 2. The images with "olive branch" are the relevant set for the denotative query "olive branch," and the other images without "olive branch" are connotatively related images. Here is a conceptual scenario exemplifying how this model can be applied to the image‐search process. If a user submits the query "olive branch," the image‐retrieval system displays images of olive branches (|D|) categorized by connotative meanings: scene of an old town (a1), Catholic Christian symbol (a2), and symbol of peace (a3). If the user is specifically interested in the group of images symbolizing peace (a3), he/she can expand the group to include images symbolizing peace without an olive branch (b3). How, then, can this conceptual searching scenario be implemented?
We propose a model for improving the recall of browsing sets by displaying not only sets of denotatively related images but also sets of connotatively related images. As shown in Figures 2 and 3, when the images that match a denotative query are returned, the search results would be grouped by connotative messages which can be derived from denotative messages ((a)s); then, based on user selection, the search results would be expanded to the images which do not match the denotative query but do relate connotatively ((b)s). This study refers to these two functions as grouping and expanding functions, respectively, and proposes that these functions can be implemented by developing an association thesaurus. The association thesaurus is different from a traditional thesaurus which narrows or broadens the search results through vocabulary control, such as narrower, broader and related terms. The purpose of this association thesaurus is to identify, at an utterly instrumental level for now, associations between connotations and denotations in accordance with Barthes' assertion that connotation can be inferred from denotative messages. A detailed association thesaurus construction procedure will be discussed in the following section.
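To make the grouping and expanding functions concrete, here is a minimal conceptual sketch in Python of how they could operate over a collection whose records already carry Associated connotations. The record structure and field names are our own illustration, not part of the proposed system.

```python
def group_results(hits):
    """Grouping function: cluster a denotative query's hits by their
    derived connotations (the (a) groups in Figure 2)."""
    groups = {}
    for image in hits:
        for connotation in image["connotations"]:
            groups.setdefault(connotation, []).append(image)
    return groups

def expand_group(collection, connotation, query_term):
    """Expanding function: add images that carry the chosen connotation
    but not the queried denotation (the (b) groups in Figure 2)."""
    return [image for image in collection
            if connotation in image["connotations"]
            and query_term not in image["denotations"]]

collection = [
    {"id": 1, "denotations": {"olive branch"}, "connotations": {"peace"}},
    {"id": 2, "denotations": {"dove"}, "connotations": {"peace"}},
]
hits = [img for img in collection if "olive branch" in img["denotations"]]
peace_group = group_results(hits)["peace"]                                 # (a3)
expanded = peace_group + expand_group(collection, "peace", "olive branch")  # adds (b3)
```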
Research design
We must be very explicit here in order to avoid seeming to assume that connotative indexing terms can be derived from denotative terms simply on the basis of lexical co‐occurrence. As an anonymous reviewer of an earlier draft correctly noted: “This would be a simplistic assumption even with textual information objects; with visually encoded information objects the assumption is naïve, a convenience which does not reflect reality.”
Our assumption is that there is a possibility that simply using lexical co‐occurrence might provide one useful tool for carving down the search space. We agree that lexical co‐occurrence is not in and of itself the strongest way to derive connotative terms from denotative terms. We strongly agree with the implication that images are not easily represented with words and that there is no simple algorithmic relationship between images and words. We were attempting to determine if using verbal representations already in place or already generated under other circumstances could possibly be useful to image searchers. As social networking photography sites proliferate, we may see stronger possibilities for lexical co‐occurrence. As with any browsing activity, there is a certain element of random insertion and an understanding that the “trash to gems” ratio will likely be high, but if we can reduce the size of the trash, the trash to gems ratio might be more favorable.
In the previous section, the preliminary model utilizing an association thesaurus was established based on the semiotics analysis that a connotation of an image can be inferred from denotative messages of the image. We examined whether this model can improve effectiveness of image retrieval and browsing by examining the following hypotheses:
H1. A connotation derived from denotation will retrieve additional relevant images and increase relative recall of image documents.
H2. A connotation derived from denotation will retrieve additional relevant browsing sets and increase relative recall of browsing sets.
Data source
The "Artefacts Canada: Humanities" (http://daryl.chin.gc.ca/Artefacts/e_MasterLayout.cgi) and the "Art & Architecture Thesaurus" (AAT, www.getty.edu/research/conducting_research/vocabularies/aat/) were selected as our data sources. Artefacts Canada, provided by the Canadian Heritage Information Network (CHIN), is composed of approximately 2.5 million records gathered from contributing museums and galleries (from now on, CHIN refers to "Artefacts Canada: Humanities"). Its main subject areas are archaeology, decorative arts, fine arts, ethnology, and history. CHIN has integrated the AAT into its search interface, so that users can search or browse the AAT terms and the system presents results based on matching between selected AAT terms and the bibliographic information of records.
The AAT, maintained by the Getty Vocabulary Program, covers art, architecture, and material culture. It contains approximately 125,000 terms in a facet structure. Of its seven facets, we considered the "associated concepts" facet as connotation, because it includes terms for abstract concepts, theories, ideologies, and so on. For denotations, the "agents", "activities", and "objects" facets were selected, because they contain terms for roles, occupations, and persons; for actions or tasks; and for entities, respectively. On the other hand, terms for qualities or design elements (physical attributes facet), terms for produced substances (materials facet), and terms for artistic classification (styles and periods facet) were regarded as neither connotations nor denotations.
For this study, 5,199 records were collected from the entire Artefacts Canada collection, according to two criteria:
- 1.
Record types should be images or photographs.
- 2.
Since this study examines the relation between connotation and denotation, records should include terms from the associated concepts facet as well as agents, activities, or objects facets in subject‐related fields.
The dataset was processed in two steps. First, French terms were translated into English; since entries in the CHIN database are contributed by various Canadian organizations, some records are written in French. The Free Text Translator (www.freetranslation.com) was used for the translation. Second, Porter's (1980) suffix‐stripping procedure was applied to the dataset for stemming. Porter's stemming procedure is widely used in information‐retrieval applications to convert term variations into their common linguistic roots; the Natural Language Toolkit (http://nltk.sourceforge.net/r) for Python was used for this step. The processed 5,199 records were divided into a training set and a testing set at a ratio of 3:1: the training set is composed of 3,900 records and the testing set of 1,299 records.
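As an illustration of the second step, the following is a minimal sketch of the stemming pass using NLTK's PorterStemmer; the record structure and field names are hypothetical, and translation is assumed to have already happened.

```python
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

def stem_terms(terms):
    """Reduce each (possibly multi-word) subject term to its Porter stems."""
    return [" ".join(stemmer.stem(word) for word in term.lower().split())
            for term in terms]

record = {"denotations": ["Automobiles", "Museums"],
          "connotations": ["Antiquity"]}
stemmed = {field: stem_terms(terms) for field, terms in record.items()}
print(stemmed)  # e.g. 'Automobiles' -> 'automobil', 'Museums' -> 'museum'
```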
Association thesaurus construction
To identify the relation between a denotation and a connotation, this study developed an association thesaurus based on the term association principle. Term association, which mainly depends on the degree of term co‐occurrence in a text or text collection, has been used in the text‐retrieval area as an automatic term classification technique. Term association assumes that:
[…] if two or more terms co‐occur in many documents of a given collection, […] these terms are related in some sense and hence can be included in common term classes (Salton and McGill, 1983, p. 228).
Since this assumption has been verified in prior experimental research, it is plausible to apply it to this study: "if a denotation and a connotation co‐occur in many images, these two terms are related in some sense, so that they can be included in a common class." The degree of term association between denotation and connotation is computed using five normalized term association measures – Cosine, Dice, Jaccard, Pearson, and Yule's Y – which are widely used in the information‐retrieval field (Chung and Lee, 2001; Kim and Choi, 1999; Rasmussen, 1992). It is not the purpose of this study to compare the effectiveness of these five association measures; however, since there is little consensus as to which measures are most generally applicable (Chung and Lee, 2001), five measures are used and compared to decide which would be appropriate for this specific dataset.
From the training set, 148 connotations and 446 denotations are extracted in accordance with AAT fields, and the degree of association between those connotations and denotations were measured with five association measures. Among 66,008 connotative‐denotative term pairs generated from 148 connotations and 446 denotations, 3,323 term pairs (5.034 percent) showed positive term associations. When comparing the five measures, the Cosine and Pearson measures and the Dice and Jaccard measures demonstrated high‐agreement ratios, respectively, so three different thesauri were developed using three measures Cosine, Dice, and Yule's Y. Each thesaurus contained approximately 660 (1 percent) of the top‐ranked connotative‐denotative term pairs with their association measures.
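For readers who want the computational detail, the following sketch shows the five measures computed from a 2×2 contingency table of document frequencies. The formulas are the standard normalized forms of these measures; the counts in the usage line are illustrative, not taken from the training set.

```python
from math import sqrt

def association_measures(n_xy, n_x, n_y, n_docs):
    """Compute the five measures for a denotation x and connotation y from
    document frequencies: n_xy records contain both terms, n_x contain x,
    n_y contain y, out of n_docs records in the training set."""
    a = n_xy                # records with both terms
    b = n_x - n_xy          # records with x only
    c = n_y - n_xy          # records with y only
    d = n_docs - a - b - c  # records with neither term
    return {
        "cosine":  a / sqrt(n_x * n_y),
        "dice":    2 * a / (n_x + n_y),
        "jaccard": a / (n_x + n_y - a),
        "pearson": (a * d - b * c)
                   / sqrt(n_x * n_y * (n_docs - n_x) * (n_docs - n_y)),
        "yules_y": (sqrt(a * d) - sqrt(b * c)) / (sqrt(a * d) + sqrt(b * c)),
    }

# Illustrative counts only: a denotation-connotation pair would be kept as a
# thesaurus entry if its measure ranked in the top 1 percent of positive pairs.
print(association_measures(n_xy=12, n_x=40, n_y=22, n_docs=3900))
```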
Procedure of indexing the testing set with association thesauri
The testing set was then indexed with the association thesauri constructed from the training set. In other words, by referring to the connotation‐denotation term pairs in the three association thesauri, images in the testing set were assigned "Associated connotations." In order to assign Associated connotations to an image in the testing set, the record must contain denotative terms included in an association thesaurus. However, it was found that of the 1,299 records in the testing set, only 680 records (52.35 percent) included at least one of the 446 denotations extracted from the training set. This implies that only half of the records in the testing set have denotations used in constructing the association thesauri. This low ratio might be attributed to the limited size of the training dataset; a more comprehensive term list covering various terms is suggested for future research. In this study, the 680 records were used for the effectiveness evaluation.
The indexing algorithm was designed to assign every possible Associated connotation with its weight (association value); a minimal sketch of the lookup appears after the worked example below. The algorithm for indexing the testing set with the association thesauri was adapted from the research of Plaunt and Norgard (1998). First of all, the 680 records from the testing set were examined to identify denotations included in the association thesaurus entries. When a denotation was found in the association thesaurus, the record was indexed with the Associated connotations and association values. For example, a record in the testing set has two denotations, "Automobile" and "Museum," which are entries in the association thesaurus. According to the association thesaurus developed with the Cosine measure, those denotations have the following associations with connotations: "Automobile – Antique – 0.404," "Museum – Antique – 0.229," and "Museum – Golden Age – 0.204." Based on this association information, the record was indexed with two ranked connotations:
- 1.
Antique (0.633 – this value is obtained by combining 0.404 and 0.229).
- 2.
Golden Age (0.204).

The thesauri using the Dice and Yule's Y measures were also applied to the dataset in the same way.
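Here is a minimal sketch of this indexing lookup, reproducing the worked example above; the thesaurus dict and function name are our own illustration.

```python
from collections import defaultdict

# Thesaurus entries mirror the worked example (Cosine measure):
# (denotation, connotation) -> association value.
thesaurus = {
    ("automobile", "antique"): 0.404,
    ("museum", "antique"): 0.229,
    ("museum", "golden age"): 0.204,
}

def assign_connotations(denotations):
    """Sum association values per connotation over a record's denotations
    and return the Associated connotations ranked by combined weight."""
    weights = defaultdict(float)
    for (denotation, connotation), value in thesaurus.items():
        if denotation in denotations:
            weights[connotation] += value
    return sorted(((c, round(w, 3)) for c, w in weights.items()),
                  key=lambda pair: pair[1], reverse=True)

print(assign_connotations({"automobile", "museum"}))
# [('antique', 0.633), ('golden age', 0.204)]
```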
Measurement: relative recall
Relative recall is a useful measurement for comparing different retrieval approaches (Tague‐Sutcliffe, 1992). In this study, relative recall for an association thesaurus is computed to investigate to what extent the association thesauri can enhance the accessibility of connotative messages even when the collection is not indexed with connotations. That is, the relative recall for an association thesaurus is compared with the relative recall for AAT connotations, which were originally assigned to the CHIN collection (from now on, Associated connotations refers to connotations assigned through an association thesaurus and AAT connotations to connotations originally assigned to the CHIN collection).
Relative recall measures the ratio between the number of relevant retrieved documents for a treatment and the total number of unique relevant documents retrieved for all treatments (Greenberg, 2001). In this study, the relevant documents are determined based on two treatments: AAT connotations and Associated connotations. In the case of the AAT connotations, all of them are regarded as relevant connotations for the assigned image documents, because each AAT connotation is given by museums or related organizations which contribute to the CHIN database. However, in the case of the Associated connotations, the relevance of connotations to the assigned image documents is assessed by two judges who have some expert knowledge in image representation. The judges are asked to evaluate the relevance of Associated connotations by reviewing the image itself and its bibliographic record. Relevance evaluations involve a three‐tier scale of “Relevant,” “Partially relevant” and “Not‐relevant.” “Partially relevant” is regarded as “Relevant” during data analysis similar to prior documented research (Greenberg, 2001; Saracevic and Kantor, 1988). This study considers the possibility that even judges can interpret connotative messages of an image differently based on their own coding systems. Since a connotative meaning can be relevant for one viewer but not for the other depending on a specific socio‐cultural context, the Associated connotations, which are judged as “relevant” or “partially relevant” by one judge, but “not‐relevant” by another judge, are regarded as “relevant” during data analysis.
The relative recall is calculated in two aspects: relative recall of retrieved image documents and relative recall of browsing sets, which are designed to support H1 and H2, respectively.

First, relative recall of retrieved image documents is computed by examining whether an association thesaurus can retrieve more image documents that are connotatively relevant. Relative recall of retrieved image documents relates to the expanding function; that is, high relative recall of retrieved image documents will increase the number of images in the (b) sets of Figure 2. In other words, once a user selects a group of connotatively related images (one of the (a) sets), the expanding function determines the number of images that are connotatively related but carry different denotations. This evaluation is performed by comparing the relative recall for Associated connotations with the relative recall for AAT connotations against evaluative connotative terms. Here is an example that considers "anger" as an evaluative connotative term. With the AAT connotations, two image documents, I1 and I2, are found; with the Associated connotations, four image documents, I2, I3, I4, and I5, are found. I1 and I2 are relevant image documents, because "anger" was assigned to those documents by indexers. The relevance of I3, I4, and I5 is judged by two judges, who assess I3 and I4 as relevant and I5 as not‐relevant. The total number of unique relevant documents is therefore four: I1, I2, I3, and I4. The relative recall for the association thesaurus is 0.75 (three‐fourths) and the relative recall for AAT connotations is 0.5 (two‐fourths). That is, the relative recall for the association thesaurus is calculated as follows:

$$\text{Relative recall of image documents} = \frac{\text{no. of relevant image documents retrieved via Associated connotations}}{\text{no. of unique relevant image documents retrieved via all treatments}} = \frac{3}{4} = 0.75 \quad \text{(Equation 1)}$$

Second, relative recall of browsing sets is measured. In this study, browsing sets are defined as connotatively related groups of image documents retrieved against a denotative query. This evaluation explores to what extent the association thesaurus can offer connotatively related groups of images; in other words, it relates to the grouping function. The browsing sets retrieved through an association thesaurus are compared with the browsing sets obtained through AAT connotations. For example, suppose a denotative query "mountain" returns three image documents (corresponding to |D| in Figure 2). If two AAT connotations, chaos and paradise, are found in these three images, the number of browsing sets expanded through AAT connotations is two (|C1| and |C2| in Figure 2); users may want to expand their search to either chaos or paradise. On the other hand, if four Associated connotations, inspiration, paradise, romanticism, and totemism, are found using an association thesaurus, those four connotations define the browsing sets (|C1|, |C2|, |C3| and |C4| in Figure 2). Then, in order to judge the relevance of the browsing sets, the judgment results for the three images against the four Associated connotations are examined. Associated connotations judged as "not‐relevant" in all three images are regarded as a not‐relevant connotation group, and Associated connotations judged as "relevant" or "partially relevant" in at least one image are regarded as a relevant connotation group. In this example, among the four Associated connotations, totemism is judged as not‐relevant in all three images.

Therefore, the number of relevant browsing sets is three, and the total number of unique relevant browsing sets is four: chaos, paradise, inspiration, and romanticism. The relative recall for the association thesaurus is 0.75 (three‐fourths) and the relative recall for AAT connotations is 0.5 (two‐fourths). The relative recall of browsing sets for the association thesaurus is calculated as follows:

$$\text{Relative recall of browsing sets} = \frac{\text{no. of relevant browsing sets retrieved via Associated connotations}}{\text{no. of unique relevant browsing sets retrieved via all treatments}} = \frac{3}{4} = 0.75 \quad \text{(Equation 2)}$$
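Both computations reduce to the same ratio. The following sketch re-runs the two worked examples above; the set contents come from the text, and the helper function is our own.

```python
def relative_recall(retrieved, unique_relevant):
    """Relevant items retrieved by one treatment, over the unique relevant
    items retrieved across all treatments."""
    return len(retrieved & unique_relevant) / len(unique_relevant)

# Equation 1 example: image documents for the evaluative term "anger".
aat_images = {"I1", "I2"}
assoc_images = {"I2", "I3", "I4", "I5"}
relevant_images = {"I1", "I2", "I3", "I4"}             # I5 judged not-relevant
print(relative_recall(assoc_images, relevant_images))  # 0.75
print(relative_recall(aat_images, relevant_images))    # 0.5

# Equation 2 example: browsing sets for the query "mountain".
aat_sets = {"chaos", "paradise"}
assoc_sets = {"inspiration", "paradise", "romanticism", "totemism"}
relevant_sets = {"chaos", "paradise", "inspiration", "romanticism"}
print(relative_recall(assoc_sets, relevant_sets))      # 0.75
print(relative_recall(aat_sets, relevant_sets))        # 0.5
```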
Data analysis
Before performing the experiment to test the two hypotheses, the degree of mapping between AAT connotations and Associated connotations was examined. The first experiment identified the best‐performing association measures for the construction of an association thesaurus as well as the degree of correspondence of Associated connotations to AAT connotations. The second experiment then investigated whether an association thesaurus could have a positive impact on image retrieval and browsing by measuring the relative recall of image documents and browsing sets.
Mapping results between AAT connotations and Associated connotations
The degree of mapping between Associated connotations and AAT connotations was examined. In this evaluation, AAT connotations were used as the standard of relevant connotative terms. These results will therefore be somewhat conservative, because although AAT connotations are assigned by professional human indexers, they should not be regarded as a comprehensive source of relevant index terms. In fact, there might exist Associated connotations which do not match AAT connotations but which some users would consider relevant connotations. However, in spite of this conservativeness, comparing index terms generated by a new approach with index terms generated by human indexers is a method that has been widely used in the evaluation of new information‐retrieval approaches. In a sense, this study considers that the degree of mapping between AAT connotations and Associated connotations can indicate the effectiveness of the semiotics‐based approach. By measuring the degree of mapping between the two types of connotations, the best‐performing association measures, which produce high recall and precision, are also identified.
Table I shows the overall description of AAT connotations and Associated connotations assigned to the 680 records through the association thesauri. The first column indicates the source of connotations. The second column, "No. of indexed records," shows the number of records containing connotative terms indexed through that source, and the third column, "No. of connotations in the records," gives the total number of AAT connotations or Associated connotations assigned to the records. "Avg. no. of connotations per record" presents the average number of connotations per record, and "No. of unique connotations" indicates the total number of unique connotations appearing in the 680 records. While the average number of AAT connotations per record is 1.34, the average number of Associated connotations per record ranges from 3.25 to 5.75 depending on which association thesaurus is used.
Table II demonstrates the evaluation results of three thesauri. The first column contains the association measure used in constructing the association thesauri. The second column indicates the maximum number of Associated connotations per record, i.e. the depth of indexing. The depth of indexing was designed to reflect the ranks of association weights. The third and fourth columns show the total number of assigned Associated connotations and the average number of Associated connotations per record, respectively. The table presents the number of Associated connotations matching AAT connotations followed by recall and precision scores.
These results indicate that the Cosine, Dice, and Yule's Y measures have similar patterns, but on the whole, the Cosine measure shows the best results of the three. The number of matches (the fifth column) shows that as the depth of indexing increases from one to five, the number of matched connotations increases proportionally, but beyond that point it increases slowly. As the depth of indexing increases, recall increases and precision decreases, as expected (refer to Figure 4). However, the recall and precision scores do not seem satisfactory. One possible reason for the relatively low recall and precision scores might be the constraints of this study – only a small number of denotations and connotations were used in constructing the thesauri and indexing the dataset. Another issue affecting the low recall and precision might be the conservative nature of the evaluation: although the association thesauri assign relevant connotations, most of those Associated connotations could be estimated as non‐relevant simply because they do not match AAT connotations. The relevance judgment results on Associated connotations are reported in the next section. In spite of these limitations, the results show that by applying association thesauri to a collection having only denotative index terms, it is possible to assign 27 percent of all relevant connotations (of 909 AAT connotations, 249 terms could be assigned through the Cosine association thesaurus). Considering the low term agreement across indexers – for instance, Markey (1984) demonstrated 13 percent overlap among three image indexers' index terms – the 27 percent overlap between AAT connotations and Associated connotations implies great potential.
Relative recall of image documents and browsing sets
This section reports the results of the second experiment, designed to support the two hypotheses. Whereas the first experiment defined the relevant set as AAT connotations, the second experiment expanded the relevant sets by reflecting relevance judgments on Associated connotations. Then, by measuring relative recall, the second experiment focuses on the extent to which the Associated connotations can retrieve additional image documents as well as browsing sets. Based on the previous analysis, the association thesaurus developed with the Cosine measure was used in measuring relative recalls. The testing set for the second experiment was limited to records containing the images, because the judges were asked to review both the image itself and its bibliographic description. (N.B.: Only a small portion of the CHIN collection provides both the image and a bibliographic description; most records contain only bibliographic descriptions.) As a result, 58 records were selected from the 680 records for the second experiment. The results of the two tests supporting H1 and H2 are discussed next.
First, the retrieval of additional connotatively relevant images by applying an association thesaurus was tested. This evaluation was performed by comparing relative recalls and precisions: the relative recall and precision of image documents indexed with the association thesaurus were compared with those of the AAT connotations. As shown in Table III, the number of AAT connotations assigned to the 58 records was 67, and these 67 connotations were all regarded as relevant. The total number of Associated connotations assigned to the 58 records was 346, and of these, 278 were judged as relevant; that is, approximately 80 percent of Associated connotations were judged as relevant. The number of unique relevant connotations, from AAT connotations or Associated connotations, was 337.
Table IV shows the t‐test results for relative recalls and precisions of retrieved image documents. It shows a significant difference between the relative recall for AAT connotations (0.24) and the relative recall for the association thesaurus (0.79) at the 0.05 level (paired samples t‐test, p‐value=0.000). It was also found that the association thesaurus can cause a decline in precision, though without statistical significance (paired samples t‐test, p‐value=0.38). In spite of this decline in precision, the result supports H1, which concentrates on recall: a connotation derived from denotation retrieves additional relevant images and increases relative recall of image documents. From this result, it can be concluded that the association thesaurus can provide a larger number of relevant connotations than the connotations assigned by human indexers.
Next, the relative recall and precision of browsing sets were computed. This study defined browsing sets as connotative groups retrieved against a denotative query. The browsing sets retrieved and expanded through the association thesaurus were compared with the browsing sets expanded through AAT connotations. The 57 denotative queries extracted from the 58 records were used for measuring the relative recall of browsing sets. Table V presents the number of browsing sets expanded through AAT connotations and through the association thesaurus. The average number of relevant browsing sets obtained through the association thesaurus was 2.26. This implies that even when an image collection is indexed only with denotative terms, it is possible to display approximately two browsing sets (connotative groups) by utilizing the association thesaurus. By comparison, AAT connotations generated, on average, 1.51 browsing sets.
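To make the mechanism concrete, the sketch below shows one way a denotative query could be expanded into browsing sets through an association thesaurus; the thesaurus entries, image index, and all terms are hypothetical and stand in for structures built from the collection.

```python
from collections import defaultdict

# Hypothetical association thesaurus: denotation -> ranked connotations.
thesaurus = {
    "lighthouse": ["solitude", "safety", "nostalgia"],
    "storm": ["danger", "power"],
}

# Hypothetical index: image id -> connotative terms assigned to it.
image_index = {
    "img01": {"solitude", "danger"},
    "img02": {"safety"},
    "img03": {"solitude", "nostalgia"},
}

def browsing_sets(denotative_query: str, k: int = 3) -> dict:
    """Group images into browsing sets, one per connotation that the
    thesaurus associates with the denotative query (top k terms)."""
    groups = defaultdict(list)
    for connotation in thesaurus.get(denotative_query, [])[:k]:
        for image_id, terms in image_index.items():
            if connotation in terms:
                groups[connotation].append(image_id)
    return dict(groups)

print(browsing_sets("lighthouse"))
# {'solitude': ['img01', 'img03'], 'safety': ['img02'],
#  'nostalgia': ['img03']}
```

Each returned group corresponds to one browsing set: a cluster of images the searcher can scan as a unit, rather than a single ranked list.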
Table VI presents the t-test results for relative recall and precision of browsing sets. It shows a significant difference between the relative recall of browsing sets expanded from AAT connotations (0.41) and those expanded from the association thesaurus (0.62) at the 0.05 level (paired samples t-test, p-value=0.000). However, the association thesaurus also caused a statistically significant decline in precision (paired samples t-test, p-value=0.000). In spite of this decline, the result supports H2: connotations derived from denotations retrieve additional relevant browsing sets and increase the relative recall of browsing sets. The results of the experiments demonstrate that an association thesaurus developed from associations between connotation and denotation can facilitate assigning connotative terms to image documents indexed with only denotative terms. It was especially encouraging that 80 percent of the connotative index terms assigned through the association thesaurus were judged relevant. In addition, the association thesaurus improves the recall of retrieved image documents, as well as of browsing sets, compared with the connotations given by human indexers. From this analysis, it can be concluded that the association thesaurus can improve the availability of connotative messages.
Discussion and implications
We might say that connotation is closely linked to the meaning and function of photographs. Denotation speaks to what was in front of the lens, whereas connotation speaks to what one can do with the message. Connotations are likely to differ for nearly every viewer, so expecting one person to generate connotative descriptors useful for all users is not reasonable. Our research agenda is to seek ways to make available the connotative descriptors of more than one person and, to some degree, to predict that the connotative descriptors generated by some subset will be useful to some particular user. This study followed the lead of prior research which identified the loss of connotative messages during the image representation process as one of the main problems of image retrieval. The analysis of the two judges' relevance assessments revealed that an image carries several connotative messages beyond the connotative index terms assigned by indexers (AAT connotations). The challenge for an image-retrieval system designer, therefore, is to build a system that supports the availability of the connotative messages of an image in order to enhance user satisfaction. Our results suggest that it is possible to assign connotative index terms, even when image collections are indexed with denotative terms, by applying an association thesaurus. Since the main purpose of this study was to examine whether connotations derived from denotations can improve retrieval effectiveness, we focused on investigating this fundamental principle rather than on developing a full image-retrieval system embedding current image-retrieval technologies.
Since we now have evidence that instrumental notions of denotative and connotative descriptors can be used to engineer a component of an image-browsing environment, we expect to explore the potential of semiotic notions in greater depth and with greater attention to Barthes' almost playful considerations. In 1859, in his article on stereo photography, Oliver Wendell Holmes wrote about a possible future of photographs:
Form is henceforth divorced from matter. Give us a few negatives of a thing worth seeing, taken from different points of view, and that is all we want of it. Pull it down or burn it up, if you please. There is only one Coliseum or Pantheon; but how many millions of potential negatives have they shed,–representatives of billions of pictures,–since they were erected! Matter in large masses must always be fixed and dear; form is cheap and transportable. Every conceivable object of Nature and Art will soon scale off its surface for us. Men will hunt all curious, beautiful, grand objects, as they hunt the cattle in South America, for their skins and leave the carcasses as of little worth. The consequence of this will soon be such an enormous collection of forms that they will have to be classified and arranged in vast libraries, as books are now.
In order to make fuller use of all those photographs, we may need to set aside some nineteenth-century assumptions about document seeking and about the relationship of words and photographs. One possible approach is to examine the power and delights of browsing and to devise other mechanisms that utilize connotative messages to improve image retrieval. As one anonymous reviewer suggests, using the language of semiotics: “it might be possible to further analyze the tags used as, for example, indexical tags, iconic tags, symbolic tags.” We might then go on to consider the very semiosis of browsing, the notion of making signs for the unknown (describing questions), and deeper consideration of the relationship between nominal descriptions and adjectival needs.
Notes
In general, denotation is regarded as the definitional, literal, obvious or commonsense meaning of a sign, whereas connotation is a meaning more heavily influenced by socio-cultural context (Chandler, 1999). Barthes (1964/1968, 1964/1977) explained that the two concepts are distinguished by the reliance of connotation on denotation. In other words, he demonstrated conceptually that the connotative messages of an image are derived from its denotative messages within a socio-cultural context. Although there are other approaches to the relationship between connotation and denotation, this paper uses the terms “denotative message” and “connotative message” in accordance with Barthes' approach.
Corresponding author
JungWon Yoon can be contacted at: [email protected]
References
Anderson, R.L. (2006), “Functional ontology construction: a pragmatic approach to addressing problems concerning the individual and the informing environment”, doctoral dissertation, University of North Texas, Denton, TX.
Armitage, L.H. and Enser, P.G.B. (1997), “Analysis of user need in image archives”, Journal of Information Science, Vol. 23 No. 4, pp. 287‐99.
Barthes, R. (1968), Elements of Semiology, Hill and Wang, New York, NY (translated by A. Lavers and C. Smith).
Barthes, R. (1977), “Rhetoric of the image”, in Image‐Music‐Text, Fontana, London, pp. 32‐51 (edited and translated by S. Heath).
Besser, H. (1990), “Visual access to visual images: the UC Berkeley image database project”, Library Trends, Vol. 38 No. 4, pp. 787‐98.
Bianchi‐Berthouze, N. (2001), “Kansei‐mining: identifying visual impressions as patterns in images”, Proceedings of the Joint 9th IFSA World Congress and 20th NAFIPS International Conference, Vancouver, Canada, Vol. 4, pp. 2183‐8.
Bianchi‐Berthouze, N., Berthouze, L. and Kato, T. (1999), “A visual interactive environment for image retrieval by subjective parameters”, Proceedings of IEEE Third Workshop on Multimedia Signal Processing, Copenhagen, pp. 559‐64.
Black, J.A., Kahol, K., Tripathi, P., Kuchi, P. and Panchanathan, S. (2004), “Indexing natural images for retrieval based on kansei factors”, Human Vision and Electronic Imaging IX: Proceedings of the SPIE, San Jose, CA, Vol. 5292, pp. 353‐75.
Chandler, D. (1999), “Semiotics for beginners”, available at: www.aber.ac.uk/media/Documents/S4B/semiotic.html (accessed October 7, 2002).
Chen, H. (2000), “An analysis of image queries in the field of art history”, Journal of the American Society for Information Science and Technology, Vol. 52 No. 3, pp. 260‐73.
Choi, Y. and Rasmussen, E. (2003), “Searching for images: the analysis of users' queries for image retrieval in American history”, Journal of the American Society for Information Science and Technology, Vol. 54 No. 6, pp. 497‐510.
Chung, Y.M. and Lee, J.Y. (2001), “A corpus‐based approach to comparative evaluation of statistical term association measures”, Journal of the American Society for Information Science and Technology, Vol. 52 No. 4, pp. 283‐96.
Colombo, C., Del Bimbo, A. and Pala, P. (1999), “Semantics in visual information retrieval”, IEEE Multimedia, Vol. 6 No. 3, pp. 38‐53.
Cox, I.J., Miller, M.L., Omohundro, S.M. and Yianilos, P.N. (1996), “PicHunter: Bayesian relevance feedback for image retrieval”, Proceedings of the 13th International Conference on Pattern Recognition, Vienna, pp. 361‐9.
Cox, I.J., Miller, M.L., Minka, T.P., Papathomas, T.V. and Yianilos, P.N. (2000), “The Bayesian image retrieval system, PicHunter: theory, implementation, and psychophysical experiments”, IEEE Transactions on Image Processing, Vol. 9 No. 1, pp. 20‐7.
Eakins, J. and Graham, M. (1999), Content‐Based Image Retrieval: A Report to the JISC Technology Applications Programme, Institute for Image Data Research, University of Northumbria, Newcastle, available at: www.unn.ac.uk/iidr/research/cbir/report.html (accessed April 22, 2001).
Enser, P.G.B. and McGregor, C.G. (1992), “Analysis of visual information retrieval queries”, British Library Research and Development Report, No. 6104, London.
Enser, P.G.B., Sandom, C.J., Hare, J.S. and Lewis, P.H. (2007), “Facing the reality of semantic image retrieval”, Journal of Documentation, Vol. 63 No. 4, pp. 465‐81.
Fidel, R. (1997), “The image retrieval task: implications for the design and evaluation of image databases”, New Review of Hypermedia and Multimedia, Vol. 3, pp. 181‐99.
Greenberg, J. (2001), “Automatic query expansion via lexical‐semantic relationships”, Journal of the American Society for Information Science and Technology, Vol. 52 No. 5, pp. 402‐15.
Greisdorf, H. and O'Connor, B. (2002a), “Modelling what users see when they look at images: a cognitive viewpoint”, Journal of Documentation, Vol. 58 No. 1, pp. 6‐29.
Greisdorf, H. and O'Connor, B. (2002b), “What do users see? Exploring the cognitive nature of functional image retrieval”, Proceedings of the 65th Annual Meeting of the American Society for Information Science, Medford, NJ, Vol. 39, pp. 383‐90.
Hare, J.S., Lewis, P.H., Enser, P.G.B. and Sandom, C.J. (2006), “Mind the gap: another look at the problem of the semantic gap in image retrieval”, Proceedings of Multimedia Content Analysis, Management and Retrieval, SPIE, San Jose, CA, Vol. 6073, pp. 607309‐1‐607309‐12.
Hare, J.S., Lewis, P.H., Enser, P.G.B. and Sandom, C.J. (2007), “Semantic facets: an in‐depth analysis of a semantic image retrieval system”, Proceedings of the 6th ACM International Conference on Image and Video Retrieval, New York, NY, pp. 250‐7.
Hastings, S.K. (1995), “Query categories in a study of intellectual access to digitized art images”, Proceedings of the 58th Annual Meeting of the American Society for Information Science, Chicago, IL, Vol. 32, pp. 3‐8.
Hidderley, R., Brown, P., Menzies, M., Rankine, D., Rollason, S. and Wilding, M. (1995), “Capturing iconology: a study in retrieval modeling and image indexing”, Proceedings of the 2nd ELVIRA Conference, London, pp. 79‐91.
Hourihane, C. (1989), “A selective survey of systems of subject classification”, in Vaughan, W., Hamber, A. and Miles, J. (Eds), Computers and the History of Art, Mansell Publishing Limited, London, pp. 117‐29.
Jaimes, A. and Chang, S.F. (2000), “A conceptual framework for indexing visual information at multiple levels”, IS&T/SPIE Internet Imaging, 3964, available at: www.ctr.columbia.edu/~ajaimes/Pubs/Spie00_internet.pdf (accessed December 20, 2002).
Jörgensen, C. (1995), “Classifying images: criteria for grouping as revealed in a sorting task”, Proceedings of the 6th ASIS SIG/CR Classification Research Workshop, Chicago, IL, Vol. 32, pp. 65‐78.
Jörgensen, C. (1998), “Attributes of images in describing tasks”, Information Processing & Management, Vol. 34 Nos 2/3, pp. 161‐74.
Jörgensen, C. (2003), Image Retrieval: Theory and Research, Scarecrow Press, Lanham, MD.
Kato, T. and Kurita, T. (1990), “Visual interaction with electronic art gallery”, Database and Expert Systems Applications: Proceedings of the International Conference, Wien, pp. 234‐40.
Kim, M. and Choi, K. (1999), “A comparison of collocation‐based similarity measures in query expansion”, Information Processing & Management, Vol. 35, pp. 19‐30.
Krause, M.G. (1988), “Intellectual problems of indexing picture collections”, Audiovisual Librarian, Vol. 14 No. 2, pp. 73‐81.
Kuroda, K. and Hagiwara, M. (2002), “An image retrieval system by impression words and specific object names – IRIS”, Neurocomputing, Vol. 43, pp. 259‐76.
Layne, S.S. (1994), “Some issues in the indexing of images”, Journal of the American Society for Information Science, Vol. 45 No. 8, pp. 583‐8.
Markey, K. (1984), “Interindexer consistency test: a literature review and report of a test of consistency in indexing visual materials”, Library & Information Science Research, Vol. 6 No. 2, pp. 155‐77.
Markey, K. (1988), “Access to iconographical research collections”, Library Trends, Vol. 37 No. 2, pp. 154‐74.
Markkula, M. and Sormunen, E. (1998), “Searching for photos: journalistic practices in pictorial IR”, in Eakins, J.P., Harper, D.J. and Jose, J.M. (Eds), The Challenge of Image Retrieval, available at: www.ewic.org.uk/ewic/workshop/view.cfm/CIR‐98 (accessed October 10, 2002).
O'Connor, B. (1993), “Browsing: a framework for seeking functional information”, Science Communication, Vol. 15 No. 2, pp. 211‐32.
O'Connor, B. and Wyatt, R. (2004), Photo Provocations: Thinking in, with, and about Photographs, The Scarecrow Press, Lanham, MD.
O'Connor, B., O'Connor, M. and Abbas, J. (1999), “User reactions as access mechanism: an exploration based on captions for images”, Journal of the American Society for Information Science, Vol. 50 No. 8, pp. 681‐97.
Panofsky, E. (1962), Studies in Iconology: Humanistic Themes in the Art of the Renaissance, Harper & Row, New York, NY.
Plaunt, C. and Norgard, B.A. (1998), “An association‐based method for automatic indexing with a controlled vocabulary”, Journal of the American Society for Information Science, Vol. 49 No. 10, pp. 888‐902.
Porter, M.F. (1980), “An algorithm for suffix stripping”, Program, Vol. 14 No. 3, pp. 130‐7.
Rasmussen, E. (1992), “Clustering algorithms”, in Frakes, W.B. and Baeza‐Yates, R. (Eds), Information Retrieval: Data Structures and Algorithms, Prentice‐Hall, Englewood Cliffs, NJ.
Rui, Y., Huang, T.S. and Chang, S.‐F. (1999), “Image retrieval: current techniques, promising directions, and open issues”, Journal of Visual Communication and Image Representation, Vol. 10 No. 1, pp. 39‐62.
Salton, G. and McGill, M.J. (1983), Introduction to Modern Information Retrieval, McGraw‐Hill, New York, NY.
Saracevic, T. and Kantor, P. (1988), “A study of information seeking and retrieving: II. Users, questions, and effectiveness”, Journal of the American Society for Information Science, Vol. 39 No. 3, pp. 177‐96.
Shannon, C.E. and Weaver, W. (1949), The Mathematical Theory of Communication, University of Illinois Press, Urbana, IL.
Tague‐Sutcliffe, J. (1992), “The pragmatics of information retrieval experimentation, revisited”, Information Processing & Management, Vol. 28 No. 4, pp. 467‐90.
Tanaka, S., Inoue, M., Ishiwaka, M. and Inoue, S. (1997), “A method for extracting and analyzing kansei factors from pictures”, IEEE Workshop on Multimedia Signal Processing, New York, NY, pp. 251‐6.
Yoon, J. (2006), “An exploration of needs for connotative messages during image search process”, Proceedings of the 69th Annual Meeting of the American Society for Information Science and Technology, Austin, TX, Vol. 43.
Zhou, X. and Huang, T. (2003), “Relevance feedback for image retrieval: a comprehensive review”, Multimedia Systems, Vol. 8 No. 6, pp. 536‐44.
Further Reading
Cooper, W. (1980), Course Lecture Notes, School of Library & Information Studies, University of California, Berkeley, CA.
Holmes, O.W. (1859), “The stereoscope and the stereograph”, The Atlantic Monthly, Vol. 3, pp. 738‐48, June.