Estimating the Semantic Density of Visual Media
Proceedings of the 32nd ACM International Conference on Multimedia, 2024
Image descriptions provide valuable information for a myriad of visual media management tasks, ranging from image classification to image search. The value of such curated collections comes from their diverse content and their extensive accompanying annotations. Such annotations are typically supplied by communities, where users (often volunteers) curate labels and/or descriptions of images. Supporting these users in increasing overall description completeness is therefore of utmost importance.
In this paper, we introduce the notion of visual semantic density, which we define as the amount of information necessary to describe an image so comprehensively that its content can be accurately inferred from the description alone. Combined with existing annotations, this measure can be used to estimate annotation completeness, helping to identify collection content with missing annotations.
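As a rough illustration of how such a completeness estimate might look, a minimal Python sketch follows; the names (Image, information_content, completeness) and the distinct-content-word proxy are our own assumptions, not the paper's actual formulation.

    # Hypothetical sketch: relate the information already captured by an
    # image's annotations to its estimated semantic density.
    from dataclasses import dataclass

    @dataclass
    class Image:
        path: str
        annotations: list[str]  # existing labels/descriptions

    def information_content(texts: list[str]) -> float:
        # Crude proxy: number of distinct content words across annotations.
        words = {w.lower() for t in texts for w in t.split() if len(w) > 3}
        return float(len(words))

    def completeness(image: Image, estimated_density: float) -> float:
        # Ratio of annotated information to the information the image is
        # estimated to require, clamped to [0, 1].
        if estimated_density <= 0:
            return 1.0
        return min(1.0, information_content(image.annotations) / estimated_density)

Under this sketch, images with the lowest completeness scores would be the most promising candidates for additional annotation.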
We conduct user experiments to understand how humans perceive visual semantic density across different image collections and to identify suitable proxy measures for it. We find that extensive image captions can serve as a proxy for calculating an image's semantic density. Furthermore, we implement a visual semantic density estimator capable of approximating the human perception of the measure. We evaluate its performance on several image datasets and conclude that it is feasible to sort images automatically by their visual semantic density, allowing annotation tasks to be scheduled efficiently. Consequently, we believe that visual semantic density estimation can serve as a completeness measure that gives feedback to annotating users in diverse visual content ecosystems, such as Wikimedia Commons.
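To illustrate the scheduling idea, the following minimal sketch ranks images by descending estimated density; estimate_density is a hypothetical stand-in for the paper's learned estimator, not its actual interface.

    # Hypothetical sketch: queue the most information-dense images first so
    # annotators work where the most description is still missing.
    from typing import Callable, Iterable, List

    def schedule_annotation_tasks(
        image_paths: Iterable[str],
        estimate_density: Callable[[str], float],
        top_k: int = 100,
    ) -> List[str]:
        # Return the top_k image paths, ordered by descending estimated density.
        ranked = sorted(image_paths, key=estimate_density, reverse=True)
        return ranked[:top_k]

    # Example with a trivial stand-in estimator (a real system would use a
    # trained model; here density is faked from the file-name length).
    if __name__ == "__main__":
        paths = ["crowd_scene.jpg", "sky.jpg", "market.jpg"]
        print(schedule_annotation_tasks(paths, lambda p: len(p), top_k=2))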
