A Framework For Content Based Semantic Information Extraction From Multimedia Contents

The doctoral candidate                The supervisor                The supervisor

Revision: 80M
DRAFT 1.7

The following web-page address contains up-to-date information about this dissertation and related topics:
http://www.vicomtech.org/

Text printed in Donostia-San Sebastián
First edition, September 2013
Zuretzat aita. (For you, dad.)
Abstract

One of the main characteristics of the new digital era is the media big bang, where images (still images or moving pictures) are one of the main types of data. Moreover, this is an increasing trend, mainly pushed by the ease of capture offered by new mobile devices that include one or more cameras.

From a professional perspective, most content-related sectors face two main problems in operating efficient content management systems: a) the need for new technologies to store, process and retrieve huge and continuously growing datasets, and b) the lack of effective methods for the automatic analysis and characterization of unannotated media.

More specifically, the audiovisual and broadcasting sector, which is experiencing a radical transformation towards a fully Internet-convergent ecosystem, requires content-based search and retrieval systems to browse huge distributed datasets and include content from different and heterogeneous sources.

On the other hand, earth observation technologies are improving the quantity and quality of the sensors installed in new satellites. This implies a much higher input data flow that must be stored and processed.

In general terms, the aforementioned sectors and many other media-related activities are good examples of the Big Data phenomenon, where one of the main problems is the semantic gap: the inability to transform the mathematical descriptors obtained by image processing algorithms into concepts that humans can naturally understand.

This dissertation presents an overview of applied research activity along different R&D projects related to computer vision and multimedia content management. One of the main outcomes of this
Resumen
Laburpena
Acknowledgements
This is not the story of a self-made man. All the achievements presented in this work have a long chain behind them, a chain composed of people who have supported my entire professional career, something that cannot be separated from personal experiences. At this point, it is worth acknowledging all these people.

In this sense, both my supervisors, Basilio Sierra and Julián Flórez, have been an essential part of this work, with an unconditional commitment and highly valuable scientific guidance. Without a doubt, Julián, it could be said that I took the first steps of my professional path with you. The trust and support I have felt from you since the beginning have not diminished over a whole decade, and that is saying something. What I have learned from you during all this time has been one of the main foundations of my daily work, and it is reflected in this work as well. On the other hand, when it came to choosing the right professor at university, I was incredibly lucky to meet Basi. From the beginning he has known how to adapt to the interruptions and lack of continuity caused by my professional situation. I would say he has provided superb advice and scientific supervision, and in such a pleasant and enthusiastic way that he has turned what could have been a hard and heavy process into work done with pleasure.

Within Vicomtech, the environment in which most of my professional activity has taken place and in which this thesis is framed, I have had countless supporters. I will surely leave someone unmentioned (my apologies from here), but I still want to cite a few, such as Jorge Posada, Deputy Director, who has supported me at all times with encouragement and practical advice, which is very welcome when one focuses too much on one's own problem. Amalia and David, companions in hardship, showed me that it is indeed possible to combine a doctoral thesis with professional activity
Contents

List of Figures

List of Tables

I Work Description

1 Introduction
1.1.1 Vicomtech-IK4
1.2.1 Begira
1.2.1.1 Summary
1.2.1.2 Conclusions
1.2.2 Skeye
1.2.2.1 Summary
1.2.2.2 Conclusions
1.2.3 SiRA
1.2.3.1 Summary
1.2.3.2 Conclusions
1.2.4 SIAM
1.2.4.1 Summary
1.2.4.2 Conclusions
1.2.5 Cantata
1.2.5.1 Summary
1.2.5.2 Conclusions
1.2.6 RUSHES
1.2.6.1 Summary
1.2.6.2 Conclusions
1.2.7 Grafema
1.2.7.1 Summary
1.2.7.2 Conclusions
1.2.8 IQCBM
1.2.8.1 Summary
1.2.8.2 Conclusions

Meteorology

4.1 Introduction
4.2 General description of the DITEC method
4.2.1 Sensor modeling
4.2.2 Data transformation
4.2.2.1 Functionals
4.2.2.2 Geometrical constraints
4.2.2.3 Quantization effects
4.2.3.1 Statistical descriptors
4.2.3.2 Cauchy Distribution
4.2.4 Classification
4.2.4.1
4.2.4.2
4.3 Experimental results
4.3.1 Case study 1: Corel 1000 dataset
4.3.2 Case study 2: Geoeye satellite imagery
4.4 Computational complexity
4.4.1 Computational complexity of the trace transform
4.4.2 Computational complexity of attribute selection and classification
4.4.3 Scalability
4.5 Conclusion of the presented method
4.6 Modified DITEC as local descriptor
4.7 Implementation of DITEC as local descriptor
4.7.1 Trace Transformation
4.7.2 Feature Extraction
4.7.3 DITEC parameters
4.7.3.1
4.7.3.2
4.7.4.1 Geometric Transformations
4.7.4.2 Photometric Transformations

7 Publications
7.1 Weather analysis system based on sky images taken from the earth
7.3 Acc. Obj. Tracking and 3D Visualization for Sports Events TV Broadcast
7.4 DITEC: Experimental analysis of an image characterization method based on the trace transform
7.5 Image Analysis platform for data management in the meteorological domain
7.6 Architecture for semi-automatic multimedia analysis by hypothesis reinforcement
7.7 Trace transform based method for color image domain identification
7.8 On the Image Content of the ESA EUSC JRC Workshop on Image Information Mining
7.9 Author's other publications

8 Selected Patents
8.1 Method for detecting the point of impact of a ball in sports events
8.2 Author's Other Related Patents

Bibliography
List of Figures

1.1 Begira scene definition.
4.5 Trace Transform and subsequent Discrete Cosine Transform of Lenna. (Y channel of YCbCr color space)
4.6 Conceptual scheme: DCT matrix transformation into (·, k) pair vector.
4.7 Statistical properties of all Kurtosis measurements made on the distributions obtained by processing the Corel 1000 dataset
4.8 Examples of probability density distributions and histograms obtained from the samples
4.9 Samples of the Corel 1000 dataset. The dataset includes 256x384 or 384x256 images.
4.10 Distance among classes in the Corel 1000 dataset according to misclassified instances.
4.11 Distance among the most inter-related classes in the Corel 1000 dataset according to misclassified instances.
4.12 Corel 1000 picture corresponding to class Architecture and classified as Mountain
4.13 Corel 1000 precision results with different feature extraction algorithms.
4.14 Samples of the satellite footage dataset. 256x256 px patches at different scales.
4.15 Distance among classes in the Geoeye dataset according to misclassified instances.
4.16 Time performance behavior.
4.17 System workflow for DITEC as local feature
4.18 Matching accuracy depending on the number of angular samples
4.19 Matching accuracy depending on the number of radial samples
4.20 Matching accuracy depending on the simultaneous increase of angular and radial sampling
4.21 Computation time depending on the simultaneous increase of angular and radial sampling
4.22 In-plane Rotation Transformation matching results.
4.23 Scale Transformation matching results.
4.24 Projective Transformation matching results.
4.25 Exposure change photometric Transformation matching results.
4.26 Trace transform row and column analysis
A.1 DITEC development platform
A.2 Circular patch image
A.3 Result of (·, ·) space exploration with Bresenham
A.4 First half of the source image is sampled (blue regions) while areas around vertical and horizontal axes are not considered.
A.5 Second half of the source image is sampled (red and green). These …
List of Tables

4.1 List of Trace Transform functionals proposed in [KP01]
4.2 Quantization effects of the trace transform
4.3 Corel 1000 dataset confusion matrix.
4.4 Geoeye dataset confusion matrix.
Part I
Work Description
CHAPTER 1

Introduction
Artificial Intelligence (AI) is probably one of the most exciting knowledge fields, where even the definition of the term becomes controversial due to the manifold understandings of intelligence, which remains a hard epistemological problem. Learning, reasoning, understanding, abstract thought, planning, problem solving and other related topics are all different aspects that imply intelligence.

The emergence of programmable digital computers in the late 1940s offered a revolutionary way to experimentally explore new methods for formal reasoning and logic. However, the initial great expectations of AI did not come into reality, and the prediction made by Herbert A. Simon¹, "machines will be capable, within twenty years, of doing any work a man can do", still remains a science fiction topic.
The fashions of AI over the years have moved from automated theorem proving to expert systems, which were later substituted by behaviour-based robotics and now seem to find the solution in learning from big data [Lev13]. All these trends have not been able to meet the expectations that the founders of AI put on the field [Wan11]. Patrick Winston (director of the MIT Artificial Intelligence Laboratory from 1972 to 1997) cited the problem of mechanistic balkanization, with research focusing on ever-narrower specialties such as neural networks or genetic algorithms: "When you dedicate your conferences to mechanisms, there's a tendency to not work on fundamental problems, but rather [just] those problems that the mechanisms can deal with" [Cas11].

¹ Herbert Alexander Simon (June 15, 1916 – February 9, 2001), recipient of the ACM Turing Award (1975) for "basic contributions to artificial intelligence, the psychology of human cognition, and list processing", and considered one of the founders of AI.
However, there has been great scientific and technological progress in many AI-related domains (formal logic, reasoning, statistics and data mining, genetic programming, knowledge representation, etc.) that, without satisfying the foundations proposed by Winston or Chomsky, has enabled the creation of technological solutions for different application fields such as natural language processing, computer vision, drug design, medical diagnosis, genetics, finance & economy, user recommendation systems and many others.
1.1.1 Vicomtech-IK4

Vicomtech-IK4¹, as an applied research centre, is focused on all aspects related to multimedia and visual communication technologies along the entire content production pipeline, from generation, through processing and transmission, until

¹ http://www.vicomtech.org
² http://www.ehu.es
1.2.1 Begira
Title: Diseño y Desarrollo de un Sistema de Seguimiento Preciso de Objetos en Transmisiones Deportivas (Design and development of a high accuracy object tracking system for sports broadcasting).
Project typology: Industrial project partially supported by the Gaitek programme.
Company name: G93.
1.2.1.1 Summary
Augmented reality projects require a deep knowledge of the scene that has to be
extracted/updated in real time. In order to ensure the accuracy and real-time
performance of the system, the knowledge must be explicitly defined.
The goal of the Begira project was to develop a single-camera system to track the ball trajectory and position the bouncing point for Basque Pelota live TV transmissions. The main constraints of the system were:
Single camera.
Broadcasting camera (720p@50).
Tracking, positioning and virtual reconstruction under 20 seconds.
Single standard computer for processing purposes.
From an Artificial Intelligence perspective, we can consider it as a system
where the knowledge domain is reduced to a single scene (the Basque Pelota
court) and thus can be explicitly defined. The main elements that define this
domain are:
3D environment: A court composed of three plane surfaces (front wall, side wall and ground).
The relative position of the camera with respect to the court is obtained during a calibration process by placing a checkerboard on the ground.
Once the camera is calibrated, its position is fixed during the entire match.
Dynamic objects: There are only two types of dynamic objects in the scene:
Players: There can be two or four players. Their size is much bigger than the ball, and most of the time their lowest part is touching the ground.
Ball: It is white, round and much smaller than the players. Sudden trajectory changes are due to a hit by a player or a bounce. The ball is so rigid that the bounce can be considered elastic.
According to the domain defined by the aforementioned concepts, a homography matrix H is calculated to obtain the camera's extrinsic parameters. Then the ball is initially detected and the tracking system follows its trajectory. Abrupt trajectory changes define the limit between the instants before and after the bounce. Once the two parametric curves are estimated, their crossing point is calculated on the image. This two-dimensional position (in pixels) is then converted to 3D space using the inverse of the homography matrix (H⁻¹). To resolve the uncertainty of the 3D position obtained from the 2D projection, the condition Z = 0 is established for the bouncing point. More details of the project can be found in Section 7.3.

Figure 1.1: Scene definition: ball trajectory samples used to estimate the parametric curves, and the calculation of the bouncing point on the ground once the center position of the ball is obtained (crossing point of the two curves).
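The back-projection step described above can be sketched as follows. This is a minimal illustration rather than the project's actual code, and the function name is hypothetical: given the calibration homography H mapping ground-plane coordinates to image pixels, the crossing point of the two fitted curves (in pixels) is mapped back through H⁻¹, and the condition Z = 0 pins the result to the court floor.

```python
import numpy as np

def bounce_point_on_ground(H, crossing_px):
    """Map the curves' crossing point (pixels) back to the ground plane.

    H is the 3x3 homography from ground-plane (X, Y) coordinates to image
    pixels, obtained during checkerboard calibration. Because the bounce
    happens on the ground, Z = 0 and the inverse homography alone recovers
    the 3D position.
    """
    p = np.array([crossing_px[0], crossing_px[1], 1.0])
    q = np.linalg.inv(H) @ p          # back-project through H^-1
    X, Y = q[0] / q[2], q[1] / q[2]   # dehomogenize
    return (X, Y, 0.0)                # Z = 0 on the court floor

# With the identity homography, pixels coincide with ground coordinates.
print(bounce_point_on_ground(np.eye(3), (3.0, 2.0)))  # -> (3.0, 2.0, 0.0)
```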
1.2.1.2 Conclusions
The Begira project is a good example of expert systems applied to image processing and computer vision. The technical goals were successfully achieved, and the results of the project were exploited by the Basque public broadcaster ETB and the TV content producer G93. However, the knowledge acquired by the system was so hardcoded that it is very difficult to extend it or to integrate it into other, more general solutions. The good performance and accuracy results rely on its reduced domain definition and rigid nature.
1.2.2 Skeye
Title: Sistema de análisis meteorológico basado en imágenes del cielo tomadas desde tierra (Meteorological analysis system based on sky images taken from the earth).
Project typology: Industrial project supported by the Gaitek programme.
1.2.2.2 Conclusions
Similarly to Begira, in this case the feature extraction process provided all the information needed for a further class assignment by applying specific thresholds. However, the integration of the developed system into more domains or scenes would be a difficult task, since all the development and the selected features totally depend on the domain definition and scene conditions. More information about this work can be found in Section 7.1.
1.2.3 SiRA
Title: Diseño y Desarrollo de un Sistema de Reconocimiento de Marcas Comerciales en Emisiones Televisivas (Design and development of a system for commercial brand recognition in TV broadcasts).
Project typology: Industrial project supported by the Gaitek programme.
Company name: Vilau.
Period: 2007-2008.
1.2.3.1 Summary
This project is another example of a system based on a reduced semantic domain, but in this case the approach was more general and some higher abstraction-level elements were introduced. The goal of SiRA was to detect logos in TV content in order to automate advertisement monitoring tasks. This project was also supported by the Basque Government, and its industrial application was envisioned by Vilau, a media communication company.
In this case, the constraints in terms of real-time behavior and equipment were lighter than in Begira. However, the domain was broader: any type of logo embedded in any type of content, taken from different perspectives.
The approach followed in this case was to first detect a logo candidate, assuming that a logo would typically be surrounded by a regular shape (square, circle, triangle, etc.) and composed of very few colors. Once the logo was detected, different feature extraction algorithms could be applied in order to compare the results with the features corresponding to the target logo dataset. Depending on the extracted features, different distance metrics were applied.
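The "very few colors" cue for logo candidates can be illustrated with a short sketch. The names and thresholds are hypothetical, and the project's actual detector also exploited the surrounding regular shape, which is omitted here:

```python
import numpy as np

def is_logo_candidate(patch, max_colors=8, quant=32):
    """Flag a region as a potential logo using the 'very few colors' cue.

    Colors are quantized to a coarse grid so that compression noise does
    not inflate the count; a region dominated by a handful of flat colors
    is kept as a candidate for the later feature-matching stage.
    """
    q = (patch // quant).reshape(-1, patch.shape[-1])
    distinct = len({tuple(px) for px in q})
    return distinct <= max_colors

# A flat two-color patch qualifies; a patch full of varied values does not.
flat = np.zeros((16, 16, 3), dtype=np.uint8)
flat[:8] = (255, 0, 0)
noisy = (np.arange(16 * 16 * 3) % 251).reshape(16, 16, 3).astype(np.uint8)
print(is_logo_candidate(flat), is_logo_candidate(noisy))  # True False
```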
1.2.3.2 Conclusions
The results of SiRA can be integrated as a new feature in other content analysis systems. In this case, SiRA would provide information about potential logos existing in a specific video or still image. Moreover, even if the process itself is carried out using low-level operators, the result of SiRA can be considered a set of high-level features with valuable semantic content since, in general terms, the presence of a logo means that there is a product or an advertisement related to it.
1.2.4 SIAM
Title: Diseño y Desarrollo de un Sistema de Análisis Multimedia de Contenido Audiovisual en Plataformas Web Colaborativas (Design and development of a system for multimedia analysis of audiovisual content in collaborative web platforms).
1.2.4.1 Summary
The first ideas of this work related to the semantic analysis of multimedia content were developed in SIAM. The goal of this project was to create content analysis tools to improve the exploitation of large amounts of user-generated content. The context of the project was www.tu.tv, a YouTube-like video sharing platform owned by Hispavista. According to this approach, semantic labels can be obtained from unstructured user comments. Then, by finding similar contents, new untagged content can be assigned a previous label.

As the content analyzed in SIAM was any kind of video, the semantic domain was too broad and complex to be defined, and one of the main problems was the definition of a semantic unit in a video. The assumption of a video as a semantic unit is too inconsistent in many cases, as the elements in it can change along time. Therefore, each video was decomposed into shots, and each shot was analyzed and labeled. Finally, the entire video would be labeled as the composition of its shot labels.
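The shot-based labeling scheme can be sketched as follows. This is an illustrative stand-in, assuming a simple histogram-difference boundary detector and majority voting over shot labels; the names and thresholds are invented here, not SIAM's actual implementation.

```python
import numpy as np
from collections import Counter

def shot_boundaries(frames, threshold=0.5):
    """Split a video into shots by thresholding the gray-histogram distance
    between consecutive frames (a simple stand-in for a shot boundary
    detector). Returns the indices where new shots start."""
    cuts = [0]
    prev = None
    for i, frame in enumerate(frames):
        hist, _ = np.histogram(frame, bins=16, range=(0, 256))
        hist = hist / hist.sum()
        if prev is not None and 0.5 * np.abs(hist - prev).sum() > threshold:
            cuts.append(i)
        prev = hist
    return cuts

def video_label(shot_labels):
    """Label the whole video as the composition (here: majority) of its
    shot labels."""
    return Counter(shot_labels).most_common(1)[0][0]

dark = np.zeros((8, 8))
bright = np.full((8, 8), 255.0)
print(shot_boundaries([dark, dark, bright, bright]))  # [0, 2]
print(video_label(["beach", "beach", "mountain"]))    # beach
```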
1.2.4.2 Conclusions
The main outcomes of SIAM were the shot-based content analysis model and a shot boundary detector that has later been used for semantic analysis purposes. Moreover, the potential of user-generated metadata was addressed in this project. We identified the potential of this amount of unstructured data, which could be complementary to the perfectly organized but expensive-to-populate professional taxonomies.
1.2.5 Cantata
Title: Content Aware Networked systems Towards Advanced and Tailored
Assistance
Project typology: ITEA
Period: 2007-2009
Consortium: Bosch Security Systems, Philips Electronics Netherlands, Philips Medical Systems, Philips Consumer Electronics, TU/e, TU Delft, Multitel, ACIC, Barco, Traficon, VTT, Solid, Hantro, Capacity Networks, I&IMS, Telefonica, Vicomtech-IK4, University Pompeu Fabra, CRP Henri Tudor, Codasystem, Kingston University, University of York, INRIA.
1.2.5.1 Summary
The goal of Cantata was to create a distributed service for content analysis. The application field included medical imaging, entertainment and security. Our activity was focused on the entertainment sector, where the content analysis modules were connected to user profiles in order to create content recommendation systems.

In this case, the logo detection system was used to provide content information to the main content analysis and recommendation system.
1.2.5.2 Conclusions
The recommendation system was intended to combine user activity information, content metadata and low-level feature based information. However, the broad domain definition required an unaffordable amount of low-level descriptors, and even the combination of all these descriptors would be a very complex issue. Due to this complexity, most recommendation systems rely basically on metadata.
1.2.6 RUSHES
Title: Retrieval of mUltimedia Semantic units for enHanced rEuSability.
Project typology: FP6-2005-IST-6.
Period: 2007-2009
Consortium: Heinrich-Hertz-Institut (DE), University of Surrey (UK), Athens Technology Centre (GR), Vicomtech (ES), Queen Mary University of London (UK), Telefonica I+D (ES), FAST Search & Transfer (NO), University of Brescia (IT), ETB (ES).
1.2.6.1 Summary
The overall aim of RUSHES was to design, implement, validate and trial a system for both the delivery of and access to raw media material (rushes), and the reuse of that
1.2.6.2 Conclusions
The RUSHES consortium tried to address the semantic gap by creating a powerful architecture composed of low-level operators. The workflow designed in RUSHES (Figure 1.3) was able to combine multiple low-level features and multiple types of sources (video, audio, text). Moreover, the shot was considered the semantic unit of a video. Due to the fact that different shot boundary operators provide different shots, extra complexity was added to the metadata model, where each feature could define its own temporal boundaries.

All the low-level operators were applied to every content item in the database. This fact introduced a strong limitation on the scalability of the domain. In order to identify new concepts, more low-level operators might be needed, and as the feature-space dimensionality increased, the system became both computationally too demanding and unaffordable for the data mining and ontology management processes. We presented a potential solution to this problem in [OMK+09]: splitting the domain into sub-domains that only apply those low-level feature extraction operators suggested by the domain definition (ontology). However, this requires prior knowledge of the content, which should itself be obtained by applying low-level operators. This chicken-and-egg problem will be one of the key topics of this research work.
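The sub-domain idea of [OMK+09] amounts to a lookup from domain definition to a reduced operator set, instead of running every operator on every asset. The sketch below is purely illustrative; the domain names and operator names are invented for the example.

```python
# Hypothetical mapping from sub-domain (ontology node) to the low-level
# operators worth running on its content, instead of applying all
# operators to every asset in the database.
OPERATORS_BY_DOMAIN = {
    "sports": ["ball_tracker", "court_line_detector", "shot_boundary"],
    "news": ["face_detector", "logo_detector", "shot_boundary"],
    "remote_sensing": ["texture_descriptor", "codeword_extractor"],
}

def operators_for(domain, fallback=("shot_boundary",)):
    """Return the reduced operator set for a known sub-domain. Unknown
    content falls back to a cheap generic operator, reflecting the
    chicken-and-egg problem: some analysis is needed before the domain
    itself can be identified."""
    return OPERATORS_BY_DOMAIN.get(domain, list(fallback))

print(operators_for("remote_sensing"))  # ['texture_descriptor', 'codeword_extractor']
print(operators_for("unknown"))         # ['shot_boundary']
```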
1.2.7 Grafema
Title: Grafema: Multimodal content search platform
Project typology: Basic research project.
Period: 2012.
1.2.7.1 Summary
The goal of the Grafema project was to create a base platform to store, annotate and retrieve multimedia content of diverse nature. Rather than focusing on algorithms to obtain content descriptors or methods for automatic content annotation, Grafema focused on architectural aspects and the design of a generic solution to deal with different types of content. In this sense, an asset could be text, image, audio, video, 3D, or even a combination of these elementary units. According to this generic description of a digital asset, similarity metrics must also adapt to each case or combination. As can be observed in Figure 1.4, assets containing the label tiger can be considered similar if they include this information in the metadata or if this label is found in any of the elementary units that compose the content.
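The matching rule just described (a label counts if it appears in the metadata or in any elementary unit) can be sketched as follows. The asset structure shown is an assumption made for illustration, not Grafema's actual data model.

```python
def asset_matches(asset, label):
    """An asset matches a label if the label appears in its metadata or in
    the annotations of any of its elementary units (text, image, audio,
    video, 3D, ...). Asset structure assumed here:
    {'metadata': [...], 'units': [{'type': ..., 'labels': [...]}]}."""
    if label in asset.get("metadata", []):
        return True
    return any(label in unit.get("labels", []) for unit in asset.get("units", []))

video = {"metadata": ["wildlife"],
         "units": [{"type": "image", "labels": ["tiger", "grass"]},
                   {"type": "audio", "labels": ["roar"]}]}
print(asset_matches(video, "tiger"), asset_matches(video, "car"))  # True False
```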
The workflow designed for Grafema (Figure 1.5) is based on low-level operators that are processed independently. The information obtained from these
1.2.7.2 Conclusions
The results of Grafema have shown the big potential of iterative processes for multimedia search. Even if the tests were carried out with datasets limited in size and domain complexity, the results show that text-based search can be dramatically improved when datasets include high volumes of multimedia content.

Regarding the state of the art, the annotation and individual metrics, as well as the unsuitability of most common database solutions for multimedia data, are still the main drawbacks that limit the potential of these kinds of systems.
1.2.8 IQCBM
Title: Image Query by Compression Based Methods
Project typology: Industrial project.
Period: 2011.
Consortium: DLR (German Aerospace Center).
1.2.8.1 Summary
The goal of this project was to create low-level operators and define distance metrics for satellite imagery to be applied during the ingestion process of an adaptable/extensible experimentation framework.

The domain of Remote Sensing is not as broad as those related to the audiovisual sector, but it is still too big and complex to be explicitly defined. Moreover, new definitions and relationships could be dynamically introduced.
(Figure: experimentation framework diagram. Preprocessing of the input image (e.g. 64x64 patches, RGB-to-HSV conversion, TIFF LZW, pluggable user preprocessors) feeds the analysis, indexing, query/ranking and comparison frameworks; descriptors include FCD, PRDC, nanocodebooks, random codebook analysis, JPEG/MPEG DCT, JPEG 2000 wavelets and MPEG-7 features; distance and performance measures are user-pluggable; MonetDB provides storage and execution through a Django adapter, producing ranked documents.)
In order to address the lack of prior knowledge, global features were considered more adequate than local ones. The first algorithm implemented in this project was based on the codewords provided by a Lempel-Ziv compressor, as suggested by Watanabe et al. [WSS02]. The L0 distance (Equation 1.1) was used as a metric for the codewords related to each element (in this case, an element is represented by each of the patches obtained after a tiling process applied to the multi-resolution satellite imagery).
d_L0 = Σ_{i=1}^{n} |x_i − y_i|^0,  where 0^0 = 0    (1.1)
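Equation 1.1 simply counts the positions where the two codeword vectors differ: |x_i − y_i|^0 is 1 for any non-zero difference, and the convention 0^0 = 0 makes matching entries contribute nothing. A minimal sketch, assuming equal-length vectors (the project's codeword vectors were in fact variable-length):

```python
def l0_distance(x, y):
    """L0 'distance' of Equation 1.1: the number of positions where the
    two codeword vectors differ, with the convention 0**0 = 0 so that
    matching entries contribute nothing. Assumes len(x) == len(y)."""
    return sum(0 if xi == yi else 1 for xi, yi in zip(x, y))

print(l0_distance([3, 0, 5, 1], [3, 2, 5, 0]))  # 2
```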
1.2.8.2 Conclusions
The developed system (Figure 1.8) was tested against the Corel 1000 dataset [Cor] and a subset of the Geoeye imagery [Glo], and obtained good accuracy results. The length of the feature vector for each item was variable and was an attribute by itself, as it provides a measure of the complexity of the image. However, in terms of scalability, the average length (several thousands of codewords) obtained by this algorithm might become a limitation.
As a result of this project, a deep study of the current trends of the community
was carried out [QGO13]. Moreover, a new global feature extraction algorithm
was developed based on the ideas of Kadyrov et al. [KP98, KP01, KP06].
Figure 1.9: Relationship between R&D projects and scientific activity in multimedia
content analysis.
CHAPTER 2

— Confucius
(Figure: workflow of a data retrieval service. A storage line performs document normalization, index creation and document storage; a search line takes user queries, applies RF processing and query normalization rules, and performs comparison/matching and ranking to return relevant documents; a browsing line completes the workflow.)
relationships among them, defining a domain in this way. One common use of ontologies is to establish shared vocabularies and taxonomies between scientists or professionals. However, from a cognitive system perspective, the most powerful characteristic of ontologies is the capability of inference, which creates new rules that were not explicitly defined. The main drawback of ontologies comes from the fact that broad, complex domains, such as those related to common vision understanding, cannot be specifically defined, mainly because of the size, complexity and fuzziness of these kinds of domains.
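The inference capability mentioned above can be illustrated with a toy example: from explicit is-a assertions, a transitive closure yields subsumptions that were never stated directly. This is a deliberately simplified sketch, not how production ontology reasoners are implemented.

```python
def infer_is_a(facts):
    """Compute the transitive closure of explicit is-a assertions,
    yielding subsumptions that were never stated directly -- the kind of
    new rule the text attributes to ontology inference."""
    closure = set(facts)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closure):
            for (c, d) in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))   # a is-a b and b is-a d => a is-a d
                    changed = True
    return closure

facts = {("tiger", "feline"), ("feline", "mammal"), ("mammal", "animal")}
inferred = infer_is_a(facts) - facts
print(sorted(inferred))
# [('feline', 'animal'), ('tiger', 'animal'), ('tiger', 'mammal')]
```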
Content Based Image Retrieval (CBIR) systems can be considered one of the branches of cognitive vision, since they require the four functionalities considered the pillars of a cognitive vision system: detection, localization, recognition and understanding [Ver06]. Marcos et al. propose a reference model that addresses the use of ontologies for multimedia retrieval purposes [MIOF11]. This work presents a reference model (Figure 2.1) based on a semantic middleware. The main goal of this approach is to create a layer to deal with semantic functionalities (e.g. knowledge extraction, semantic query expansion, etc.).

Marcos proposes in his PhD work [Mar11] the use of the semantic middleware
Figure 2.3: DIKW Pyramid (Data, Information, Knowledge, Wisdom).
Domain Identification
As stated in the previous section, domain identification is one of the key issues of cognitive vision, as it allows the use of contextual information. Current best performing systems are mainly those where the size and the complexity of the domain are relatively low. Deng et al. [DBLFF10] performed a study of the effects of dealing with more than 10,000 categories. The results show that:
- Computational issues become crucial in algorithm design.
- Conventional wisdom from a couple of hundred image categories on the relative performance of different classifiers does not necessarily hold when the number of categories increases.
- There is a surprisingly strong relationship between the structure of WordNet and the difficulty of visual categorization.
- Classification can be improved by exploiting the semantic hierarchy.
The process carried out by Deng et al. is based on state of the art descriptors
such as GIST[OT01] and SIFT[Low99]. The classification process uses Support
Vector Machines and the dataset includes more than 9 million assets.
Popular AI development results such as Deep Blue against Kasparov [Dee], commonly considered a great step in AI because machines were able to beat human minds, are clear cases where the domain and the rules that define it are rather simple, while the combinatorial space derived from it becomes huge. For those cases,
[Figure 3.1: DeepQA architecture: question analysis and query decomposition; primary search over answer sources; candidate answer generation; hypothesis generation and soft filtering; supporting evidence retrieval from evidence sources and deep evidence scoring; hypothesis and evidence scoring; synthesis; and final merging and ranking against trained models, producing an answer and a confidence value.]
brute force algorithms can defeat human experience and heuristic capabilities. In the case of Deep Blue, its domain dependence was so high that even some hardware components were specifically designed for chess playing purposes.
A step forward was taken by Watson [Wat] in 2011, which won the Jeopardy! prize against former winners. In this case, Watson was able to process natural language by identifying keywords and accessing 200 million pages of structured and unstructured content. As stated by the IBM DeepQA Research Team (the developers of Watson): "This is no easy task for a computer, given the need to perform over an enormously broad domain, with consistently high precision and amazingly accurate confidence estimations in the correctness of its answers." However, even if the constraints to perform this task are much harder than for chess playing, apart from the natural language processing module, the task of playing Jeopardy! can be considered as an advanced text search engine that does not require prior contextual knowledge, as can be observed in its architectural design (Figure 3.1).
The current state of the art is full of AI approaches that face the same limitation observed in these two examples. They obtain very good performance in a specific narrow domain but fail when the problem scales up or when the same system is applied to a different problem. Current multimedia information retrieval systems are exactly in this situation: contents belonging to specific contexts can be successfully managed, but with strong limitations in flexibility and scalability.
3.1.1 Broadcasting
The broadcasting sector has experienced a deep transformation with the introduction of digital technologies. All internal workflows have been affected by the fact of representing content digitally. Regarding Multimedia Asset Management (MAM) systems, before content became digital all assets were centralized and managed by documentalists/librarians, professionals who, following a rigid taxonomy, were responsible for annotating, storing and retrieving the content. The workflow was therefore organized so that documentalists offered the content management service to editors. Since the digitalization of the ingest and delivery processes, editors can directly and concurrently access the content they are looking for. This offers great advantages in terms of efficiency, allowing non-linear editing and minimizing access times. However, this new work style introduces many more inconsistencies in the metadata, since contents are concurrently annotated by users that do not strictly follow a given taxonomy. In order to create direct search and retrieval services, content annotations must be richer and better, since editors do not have the knowledge of documentalists to browse among millions of assets. To obtain this improved metadata, manual annotation is too expensive in most cases, and automatic annotation systems are not able to characterize high abstraction level categories, especially due to the size and complexity of the broadcasting context.
From a technical point of view, there are many industrial solutions and standards for metadata (SMEF, BMF, Dublin Core, TV Anytime, MPEG-7, SMPTE
Descriptive Metadata, PB Core, MXF-DMS1, XMP etc.) that offer good retrieval
characteristics. However, all these technologies and specifications rely on a previously annotated dataset that in most practical cases cannot be populated at an
affordable cost.
The data volume for the EOC DIMS Archive in Oberpfaffenhofen is projected to about 2
petabytes in 2013 (Christoph Reck, DLR-DFD, presentation during ESA EOLib User Requirements
workshop, ESRIN November 17, 2011)
Figure 3.2: Idealized query process decomposition into processing modules and basic operations, based on an adaptation of Smeulders et al. [SWS+00].
http://landsat.gsfc.nasa.gov/
http://modis.gsfc.nasa.gov/
A special particularity of the EO domain is the diversity of the types of data provided by the instruments installed in a satellite, where most of them are affected by noise and distortions produced by the distance, the atmosphere, etc. Envisat (Environmental Satellite), launched in 2002 and operated by ESA (European Space Agency), includes the following instruments1 (Figure 3.3):
ASAR: Advanced Synthetic Aperture Radar, operating at C-band; ASAR ensures continuity with the image mode (SAR) and the wave mode of the ERS-1/2 AMI.
MERIS: a programmable, medium-spectral resolution, imaging spectrometer operating in the solar reflective spectral range. Fifteen spectral bands can be selected by ground command, each of which has a programmable width and a programmable location in the 390 nm to 1040 nm spectral range.
AATSR: Advanced Along Track Scanning Radiometer; continuity of the ATSR-1 and ATSR-2 data sets of precise sea surface temperature (SST) levels of accuracy (0.3 K or better).
RA-2: the Radar Altimeter 2 is an instrument for determining the two-way delay of the radar echo from the Earth's surface to a very high precision: less than a nanosecond. It also measures the power and the shape of the reflected radar pulses.
MWR: a microwave radiometer for the measurement of the integrated atmospheric water vapour column and cloud liquid water content, as correction terms for the radar altimeter signal. In addition, MWR measurement data are useful for the determination of surface emissivity and soil moisture over land, for surface energy budget investigations to support atmospheric studies, and for ice characterization.
GOMOS: measures atmospheric constituents by spectral analysis of the spectral bands between 250 nm to 675 nm, 756 nm to 773 nm, and 926 nm to 952 nm. Additionally, two photometers operate in two spectral channels, between 470 nm to 520 nm and 650 nm to 700 nm, respectively.
1
https://earth.esa.int/web/guest/missions/esa-operational-eo-missions/
envisat
[Figure 3.3: Envisat payload: SCIAMACHY, MWR, GOMOS, DORIS, RA-2 antenna, Ka-band and X-band antennas, LRR, ASAR antenna, the service module and the solar array (not shown).]
[Figure 3.4: Architecture to integrate multimedia information into a meteorological information management system: an input data adaptation layer for radar, sensors, cameras, instrumentation and weather stations; an adaptation layer for other systems and protocols; a knowledge management platform (data mining, ontologies, physical modelling); orchestration and harmonization of services and resources; centralized data management (backup, delivery, etc.); analysis modules; and a presentation layer.]
3.1.2.1 Meteorology
Weather analysis combines satellite information with terrestrial instruments typically located in weather stations. Besides classical instruments such as thermometers, hygrometers, anemometers, barometers, rain gauges, ceilometers, etc., stations also include devices that provide more complex information (Doppler radars, wind profilers). Video cameras are also being used to obtain extra information. An extensive analysis of image data management in the meteorological domain is detailed in Section 7.5. Since the meteorological domain is narrow enough to be explicitly defined, the image analysis process can be automatically performed. Figure 3.4 shows an architecture to integrate multimedia information into a meteorological information management system. The results of a project for cloudiness estimation are presented in Section 7.1.
approach does not take into account the frequency relationships among the different coefficients and increases the feature extraction complexity as it requires
the covariance matrix information of all previous samples. Moreover, the feature
relevance of each individual DCT coefficient is too low and also sensitive to noise
and variations.
Li et al. [LZC09] have proposed a generalization of the Radon transform and
trace transform by introducing prior knowledge of specific identification or fingerprinting tasks and extending the geometric sets from straight lines to arbitrary
choices. This approach provides a complete set of resources for non-rigid object identification and has been successfully tested for pedestrian recognition,
segmentation and video retrieval. However, the broad set of configuration parameters and pre-processing tasks are not suitable for domain identification purposes
where the lack of a priori knowledge is one of the main issues.
[Figure: DITEC processing chain: sensor modeling (load image from the image DB {D}, rgb2hsv and rgb2YCbCr conversions, statistical descriptors (μ, σ)); pre-processing with its parameters, producing {I}; data transformation by the trace transform (n_θ, n_ρ, n_t, Ξ(L)), producing {T}; object extraction (2D DCT and (μ, kurtosis) extraction), producing {E}; and class assignment (attribute selection, training and supervised classification), producing {C}.]
Object extraction: reduces the feature space dimensionality while preserving essential information, in order to allow a good performance in the subsequent classification process. The last n values from the obtained data pair vector can be disregarded for the empirical reason that, given the low-pass filtering of most natural images, the DCT concentrates the highest values in the lowest coefficients [BYR10].
Class assignment: vectors obtained in the previous step are processed to improve the performance of classifiers in the defined feature space. All the obtained vectors are statistically analyzed to select their most representative attributes. Then the supervised classification process is carried out to obtain an estimate Ĉ of the unknown global image semantic concept C.
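The class assignment step can be illustrated with a small stand-in. The thesis itself uses Bayesian networks over the selected attributes; the sketch below substitutes a Gaussian naive Bayes (a simpler member of the same family) trained on made-up (μ, kurtosis)-style feature vectors, so every name and number here is hypothetical:

```python
import numpy as np

# Illustrative stand-in for the supervised classification step.
class GaussianNB:
    def fit(self, X, y):
        self.classes = np.unique(y)
        self.mu = np.array([X[y == c].mean(axis=0) for c in self.classes])
        self.var = np.array([X[y == c].var(axis=0) + 1e-9 for c in self.classes])
        self.prior = np.array([np.mean(y == c) for c in self.classes])
        return self

    def predict(self, X):
        # log p(C|E) is proportional to log p(C) + sum of per-feature Gaussian log-likelihoods
        ll = -0.5 * (((X[:, None, :] - self.mu) ** 2) / self.var
                     + np.log(2 * np.pi * self.var)).sum(axis=2)
        return self.classes[np.argmax(ll + np.log(self.prior), axis=1)]

# Toy (mean, kurtosis)-like feature vectors for two hypothetical classes:
X = np.array([[0.1, 30.0], [0.2, 35.0], [5.0, 300.0], [5.2, 310.0]])
y = np.array([0, 0, 1, 1])
model = GaussianNB().fit(X, y)
print(model.predict(np.array([[0.15, 32.0], [5.1, 305.0]])))  # -> [0 1]
```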
i < j < k < l < m
n_classes ≤ i ≤ N,   j ≤ N,   k ≤ N
n_(orig. images) ≤ l ≤ N,   n_(orig. images) ≤ m ≤ N   (4.2)
The term p(E_j | T_k) = p(T_k | E_j) p(E_j) / p(T_k) shows that this model layer is linked to the information representativeness of the extracted features. p(T_k | I_l) implies the trace transform. It is a deterministic process with a slight denoising effect. The quality of the data D_m and of the pre-processed image I_l will be fundamental for an effective feature extraction process. In fact, the joint inference/estimation process depends on the trace transform, which can be regarded as a data re-ordering, compression and feature-space optimization process.
T(ρ, θ) = Ξ( I_l(L_(ρ,θ)(t)) ),  with the line L_(ρ,θ) parameterized with respect to the image center (0,0)   (4.3)
The trace transform (originally proposed by Fedotov et al.1 [FK95]) consists of applying a functional along a straight line (L in Figure 4.2). This line is moved tangentially to a circle of radius ρ, covering the set of all tangential lines defined by θ. The Radon transform has been used to characterize images [PG92] in well defined domains [LLL10], in image fingerprinting [SHKY04] and as a primitive feature for general image description. The trace transform extends the Radon transform by enabling the definition of the functional and thus enhancing the control over the feature space. These features can be set up to show scale, rotation/affine transformation invariance or high discriminative power for specific content domains.
The outcome T of the trace transform of a 2D image is another 2D signal composed of a set of sinusoidal shapes that vary in amplitude, phase, frequency, intensity and thickness. These sinusoidal signals encode the pre-processed image
1
As a solution of a pattern recognition problem, for the identification of different types of blood cells, such as erythrocytes. They proposed to convert the image space S into a parameter space by intersecting several lines l_0 with S, represented in polar coordinates.
4.2.2.1 Functionals
A functional of a function ξ(t) evaluated along the line L will have different properties depending on the features of ξ(t) (e.g.: invariance to rotation, translation and scaling [FKT09]). Kadyrov et al. [KP06] propose several functionals with different invariance or sensitiveness properties. These invariant functionals have been used in expert systems for traffic sign recognition [TBFO05], face authentication [SPKK03, SDH10] or fingerprinting [KP01] purposes. Clearly, the definition and combination of different trace and circus functionals results in different properties of the final descriptor.
Name   Functional
IF1    ∫ ξ(t) dt
IF2    ( ∫ |ξ(t)|^q dt )^r
IF3    ∫ |ξ′(t)| dt
IF4    ∫ ( t − ∫ t ξ(t) dt / IF1 )^2 ξ(t) dt
IF5    (IF4 / IF1)^(1/2)
IF6    max(ξ(t))
IF7    IF6 − min(ξ(t))
IF8
IF9
IF10
IF11
Δθ = 2π / n_θ   (4.4)
Δρ = min(X, Y) / n_ρ   (4.5)

with X and Y denoting the horizontal and vertical resolutions of the image I_l. Low (n_θ, n_ρ, n_t) values will have a non-linear downsampling effect on the original image, where the sampling step along the line is defined as:

Δt = |L| / n_t   (4.6)

The set of points used to evaluate each functional is described (assuming (0,0) as the center of the image) by:

y = ρ / sin(θ) − x / tan(θ)   (4.7)

θ ∈ [0, 2π)   (4.8)

ρ ∈ [−r, r],  r = min(X/2, Y/2)   (4.9)

x = ρ cos(θ),  y = ρ sin(θ)   (4.10)

x ∈ [−X/2, X/2] for θ ∈ [π/4, 3π/4] ∪ [5π/4, 7π/4],
y ∈ [−Y/2, Y/2] for θ ∈ [−π/4, π/4] ∪ [3π/4, 5π/4]   (4.11)
Equation (4.7) shows a symmetrical result, since the same lines are obtained for θ ∈ [0, π] and θ ∈ [π, 2π]. However, this is only true for functionals that do not consider the position along the line (like the Radon transform). Depending on the selected functional and on the desired properties of the trace transform (e.g.: rotational invariance), the ranges of θ and ρ can be modified to θ ∈ [0, π] or ρ ∈ [0, r].
Figure 4.3: Trace transform contribution mask at very high resolution parameters (image resolution: 100x100 px; n_θ = 1000, n_ρ = 1000, n_t = 5000).
(a) Original, (b) (64,64,15), (c) (64,64,45), (d) (64,64,185), (e) (5,300,45), (f) (5,300,151), (g) (300,5,45), (h) (300,5,151)
Figure 4.4: Pixel relevance in the trace transform scanning process with different parameters (n_θ, n_ρ, n_t). Original image resolution = 384x256.
Ideally, the trace transform should satisfy the following constraints (considering M as the matrix that contains the number of repetitions of each pixel during the trace transform):

Coverage: all pixels of the image (including those located at the corners of the image) have to be included in at least one functional: min(M) > 0.

Homogeneity: all pixels are used the same number of times: Var(M) = 0.

High pixel repetition degree: each pixel has to be included in as many traces as possible (high values of mean(M)).
Table 4.2 shows some example values for coverage, homogeneity and repetition degree at different (n_θ, n_ρ, n_t) resolutions. Note that the best ratios are obtained for lower variations in θ, as the angle is the main factor to increase the
n_θ    n_ρ    n_t(L)   % pixels used   Mean     Var
64     64     15       16.60           0.63     15.71
64     64     45       44.30           1.88     32.72
64     64     85       67.53           3.54     53.61
64     64     185      93.40           7.71     52.51
5      300    45       28.62           0.69     10.28
5      300    151      69.84           2.30     31.80
300    5      45       40.59           0.68     0.20
300    5      151      88.43           2.30     0.42
300    5      218      97.34           3.33     0.40
300    5      251      99.18           3.83     0.30
384    256    15       83.76           15.00    1.2·10^6
100    100    85       85.55           8.65     872.47
100    100    185      98.72           18.82    708.64
100    100    218      99.55           22.18    511.61
100    100    2,185    100.00          222.27   3.6·10^6
42     75     12,000   99.77           384.52   38.6·10^6
variance. The pixel repetition degree is also strongly conditioned by the angular resolution. This fact makes n_θ the main factor to balance the homogeneity and repetition degree (e.g.: low repetition degrees show weaker rotational invariance). Once n_θ is set, n_ρ can be adjusted to ensure the optimal coverage. n_t has an almost asymptotic behavior once the other two parameters are set, and can be optimized ensuring a minimum pixelwise sampling. However, the different sampling techniques (e.g.: fixed sampling step or the Bresenham algorithm [Bre65]) can also introduce some distortions produced by the different number of samples for each (θ, ρ) combination. Figure 4.4 shows some cases applied to a real image and the convex contribution intensity mask effect for different values of n_t.
X_{k1,k2} = α_{k1} α_{k2} Σ_{n1=0}^{N1−1} Σ_{n2=0}^{N2−1} x_{n1,n2} cos[ π k1 (2n1 + 1) / (2N1) ] cos[ π k2 (2n2 + 1) / (2N2) ]   (4.12)

α_{ki} = √(1/N_i)  if k_i = 0;   α_{ki} = √(2/N_i)  if k_i ≠ 0   (4.13)
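A direct (slow) implementation of (4.12)-(4.13) makes the normalization easy to check: with the α factors of (4.13) the transform is orthonormal, so the signal energy is preserved. A minimal sketch:

```python
import numpy as np

def dct2(x):
    """Orthonormal 2D DCT-II, written to mirror (4.12)-(4.13) term by term."""
    N1, N2 = x.shape
    n1 = np.arange(N1)[:, None]
    n2 = np.arange(N2)[None, :]
    X = np.empty((N1, N2))
    for k1 in range(N1):
        a1 = np.sqrt((1.0 if k1 == 0 else 2.0) / N1)
        for k2 in range(N2):
            a2 = np.sqrt((1.0 if k2 == 0 else 2.0) / N2)
            basis = (np.cos(np.pi * k1 * (2 * n1 + 1) / (2 * N1)) *
                     np.cos(np.pi * k2 * (2 * n2 + 1) / (2 * N2)))
            X[k1, k2] = a1 * a2 * (x * basis).sum()
    return X

x = np.random.default_rng(0).standard_normal((8, 6))
X = dct2(x)
# Orthonormality implies energy preservation (Parseval):
print(np.allclose((X ** 2).sum(), (x ** 2).sum()))  # -> True
```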
Figure 4.5: Trace Transform and subsequent Discrete Cosine Transform of Lenna. (Y
channel of YCbCr color space)
Figure 4.5 shows the process of trace transform evaluation and its 2D DCT
where the intensity is quantized into 6 different levels. The functional used is the
one enumerated by Kadyrov et al. [KP01] as invariant functional IF2 (4.14).
T_IF2 = ∫ |ξ(t)| dt   (4.14)

This functional has invariance properties for independent variable and function scaling (4.15):

Ξ(ξ(a·x)) = γ(a) Ξ(ξ(x)),  a > 0
Ξ(c·ξ(x)) = δ(c) Ξ(ξ(x)),  c > 0   (4.15)

where:
γ(a) = a^(−1) and δ(c) = c   (4.16)
Figure 4.6: Conceptual scheme: transformation of the m×n DCT coefficient matrix (a_11 ... a_mn) into a vector of (μ, k) pairs, one pair per group of coefficients of approximately similar frequency.
To study these statistical properties, over 50,000 sample vectors have been analyzed using the 1,000 sample images of the Corel 1000 dataset (described in Section 4.3.1). The analysis of the obtained histograms shows strongly leptokurtic distributions for all samples. Equation (4.17) defines the kurtosis of a distribution, which is represented by (4.18) for a discrete set of elements. A distribution is considered leptokurtic when k > 3. For all analyzed distributions, the minimum kurtosis value has been greater than 30. More detailed statistical properties are shown in Figure 4.7.
k = E[(x − μ)^4] / σ^4   (4.17)

k = [ (1/n) Σ_{i=1}^{n} (x_i − x̄)^4 ] / [ (1/n) Σ_{i=1}^{n} (x_i − x̄)^2 ]^2   (4.18)
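The sample estimator of (4.18) is a few lines of code; a Gaussian sample gives a value close to 3, while heavy-tailed (leptokurtic) data gives much larger values:

```python
import numpy as np

def kurtosis(x):
    """Sample kurtosis as in (4.18): fourth central moment over the squared
    second central moment (plain, not excess, kurtosis)."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return (d ** 4).mean() / (d ** 2).mean() ** 2

rng = np.random.default_rng(1)
print(kurtosis(rng.standard_normal(100000)))  # close to 3 for a Gaussian
print(kurtosis(rng.standard_cauchy(1000)))    # far above 3: leptokurtic
```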
Figure 4.7: Statistical properties (minimum, maximum, mean and standard deviation) of all kurtosis measurements made on the distributions obtained by processing the Corel 1000 dataset.
Assuming the leptokurtic nature of the obtained distributions, the list of values can be represented by the mean value and the kurtosis of each vector. The pair of descriptors (μ, k) of the first element (corresponding to the DC value of the DCT) is substituted by the mean and variance of the original image in HSV space, considering that the mean and kurtosis values encode the information of coefficients corresponding to approximately similar frequencies. The obtained dimensionality of the transformed (μ, k) pairs is given by (4.19).
nDims = √(n_θ^2 + n_ρ^2) · n_c · n_f   (4.19)

where n_c is the number of channels of the original image and n_f the number of features extracted from each vector (2 in the case of using [μ, k]). Thus, the dimensionality reduction is given by (4.20).

r_f = n_θ n_ρ / ( √(n_θ^2 + n_ρ^2) · n_f )   (4.20)

which, for n_θ = n_ρ = n and n_f = 2, becomes:

r_f = n^2 / (√2 · n · n_f) = n / (2√2)   (4.21)
f(x; x_0, γ) = 1 / { πγ [ 1 + ((x − x_0)/γ)^2 ] }   (4.22)

where x_0 is known as the location parameter and is equal to the median, and γ represents the scale parameter. Moreover, γ is equal to half of the interquartile range.
Experimental results have demonstrated that the median is not a representative value of the distribution for short vectors. Therefore, the Hodges-Lehmann estimator [HJL63] (Equation 4.23) has been introduced instead of the median value.

hl(X) = median( (x_i + x_j)/2 ),  1 ≤ i < j ≤ n   (4.23)
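A minimal sketch of (4.23) using the pairwise Walsh averages (O(n^2) memory, which is fine for the short vectors discussed here):

```python
import numpy as np

def hodges_lehmann(x):
    """Median of the pairwise averages (x_i + x_j)/2 with 1 <= i < j <= n,
    as in (4.23)."""
    x = np.asarray(x, dtype=float)
    i, j = np.triu_indices(len(x), k=1)
    return float(np.median((x[i] + x[j]) / 2.0))

# Robust against a single outlier in a short vector:
print(hodges_lehmann([1.0, 2.0, 3.0, 4.0, 100.0]))  # -> 3.25
```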
In general, experiments show similar results for [mean value, kurtosis] and [hl(X), iqr/2] value pairs, while in some cases the assumption of the Cauchy distribution
4.2.4 Classification
After the feature extraction process explained in the previous section, a set of
descriptors E is obtained. The dimensionality of E can be reduced by attribute
selection strategies in order to improve the efficiency of subsequent classification
steps.
(FSS) [LM98] approach. FSS can be reformulated as follows: given a set of candidate features, select the best subset for a classification problem. In our case, the best subset will be the one with the best predictive accuracy. Most supervised learning algorithms perform rather poorly when faced with many irrelevant or redundant (depending on the specific characteristics of the classifier) features. Thus, the FSS method provides additional mechanisms to reduce the number of features so as to improve the performance of the supervised classification algorithm.
There are two main approaches to tackle the Feature Subset Selection (FSS)
problem from the machine learning point of view, namely wrapper and filter
methods [ILRE00].
Wrapper approaches [BLIS04] try to identify the subset of variables that, given a classification paradigm and a dataset, provide the best classification function. The process consists of searching an optimal feature sub-space based on a performance measure (typically the accuracy, though other measures can be used). Each subset is evaluated by testing the performance of the chosen paradigm in
Figure 4.9: Samples of the Corel 1000 dataset: (a) Africans, (b) Beach, (c) Architecture, (d) Buses, (e) Dinosaurs, (f) Elephants, (g) Flowers, (h) Horses, (i) Mountains, (j) Food. The dataset includes 256x384 or 384x256 images.
Class          Correct   Precision   Recall   F-Measure
Africans       75        0.750       0.75     0.750
Beach          79        0.752       0.79     0.771
Architecture   78        0.772       0.78     0.776
Buses          81        0.900       0.81     0.853
Dinosaurs      100       0.980       1.00     0.990
Elephants      83        0.806       0.83     0.818
Flowers        95        0.941       0.95     0.945
Horses         97        0.942       0.97     0.956
Mountains      78        0.813       0.78     0.796
Food           82        0.828       0.82     0.824
Average                  0.848       0.848    0.848
Figure 4.10: Distance among classes in the Corel 1000 dataset according to misclassified instances.
Figure 4.11: Distance among most inter-related classes in the Corel 1000 dataset
according to misclassified instances.
Gaussian Naïve Bayesian Network [BKB10]), DITEC shows the best performance for most categories (Figure 4.13) and the highest mean precision value. Other performance parameters (such as recall or F-Measure) have not been compared, since they are not reported in the papers describing the rest of the methods.
Figure 4.12: Corel 1000 picture corresponding to class Architecture and classified as Mountain.
Figure 4.13: Corel 1000 precision results with different feature extraction algorithms. WHMSGM: Mean-Shift and Gaussian Mixtures based on Weighted Color Histograms, FVR: Reduced Feature Vector with Relevance Feedback, Gaussian NBN: SIFT-based Gaussian Naïve Bayesian Network.
During the data mining process, Bayesian networks provide the best performance, reaching an accuracy of 94.51% in a 10-fold cross-validation test. The final dimensionality of the feature space has been reduced to 61 attributes. Table 4.4 shows the confusion matrix of the classification results. Applying the Force Atlas 2 method to the Geoeye classification errors, we obtain the distribution shown in Figure 4.15. It can be observed that Risalpur and Rome are the categories with the highest mutual similarity (2 cities). The Davis-
[Figure: Samples of the Geoeye dataset classes: (a) Athens, (b) Davis, (c) Manama, (d) Midway, (e) Nyragongo, (f) Risalpur, (g) Rome.]
Table 4.4: Geoeye dataset confusion matrix. Ground truth represented in rows,
predicted labels in columns. Labels correspond to the assignment in Figure 4.9.
F-Measure = 2 · (precision · recall) / (precision + recall)
Class           Correct   Precision   Recall   F-Measure
(a) Athens      74        0.961       0.961    0.961
(b) Davis       183       0.943       1.000    0.971
(c) Manama      193       0.970       0.995    0.982
(d) Midway      62        0.954       1.000    0.976
(e) Nyragongo   77        0.939       0.906    0.922
(f) Risalpur    177       0.898       0.912    0.905
(g) Rome        182       0.897       0.938    0.917
Average                   0.946       0.945    0.945
Monthan aircraft boneyard has shown a remarkable similarity with Risalpur due
to the fact that wide areas of bare soil are a common element in both Risalpur
Figure 4.15: Distance among classes in the Geoeye dataset according to misclassified
instances.
and Davis.
The Midway atoll is the most distinguishable category of the Geoeye dataset.
It contains special color, texture and shapes that make it singular within the
Figure 4.16: Time performance behavior depending on applied sampling parameters: (a) varying n_t(L); (b) varying n_θ = n_ρ. Experiments have been carried out using a computer with an Intel Core i7-740QM processor at 1.73 GHz and 8 GB RAM.
4.4.3 Scalability
Each of the datasets used during the validation process contains around 1,000 items. The scalability of the presented framework to larger datasets can be achieved by parallelizing the most critical part of the process, the trace transform. In fact, the calculation of each functional can be independently executed, as demonstrated by Meena et al. [MPL11], whose FPGA implementation of the trace transform operator obtained a throughput of 2725 images per second.
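The parallelization argument can be sketched in a few lines: each column of T depends on a single angle, so the angle range can be split across workers. The sketch below uses a thread pool and an IF2-style functional; sampling details (integer rounding, clipping at the borders, functional choice) are illustrative assumptions, not the thesis implementation:

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def trace_column(img, theta, n_rho, n_t):
    """One trace-transform column: sum of absolute values (IF2-style
    functional) evaluated on every line with orientation theta."""
    Y, X = img.shape
    r = min(X, Y) / 2.0
    t = np.linspace(-r, r, n_t)
    col = np.empty(n_rho)
    for k, rho in enumerate(np.linspace(-r, r, n_rho)):
        x = np.clip((rho * np.cos(theta) - t * np.sin(theta) + X / 2).astype(int), 0, X - 1)
        y = np.clip((rho * np.sin(theta) + t * np.cos(theta) + Y / 2).astype(int), 0, Y - 1)
        col[k] = np.abs(img[y, x]).sum()
    return col

img = np.random.default_rng(2).random((64, 64))
thetas = np.linspace(0, 2 * np.pi, 32, endpoint=False)
with ThreadPoolExecutor(max_workers=4) as pool:
    cols = list(pool.map(lambda th: trace_column(img, th, 32, 64), thetas))
T = np.stack(cols, axis=1)  # rows: rho samples, columns: angles
print(T.shape)  # -> (32, 32)
```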
[Figure: Local DITEC descriptor pipeline, applied in parallel to the reference and input images: load image, rgb2gray, interest point extraction, normalization of a circular patch, trace transform (n_θ, n_ρ, n_t, Ξ(L)), 1-D DFT along the rows, module of the first n coefficients, and finally metrics computation and pairing of the descriptors.]
approach similar to SIFT [Low99] for scale normalization. After scale normalization we proceed to a histogram equalization in order to normalize the dynamic range of the patch. This normalization improves the performance of the descriptor against light intensity photometric transformations. It is worth mentioning that many functionals, such as the integral function used in the Radon transform, are not invariant to exposure or light intensity photometric transformations. Once the patch is normalized, the trace transform is applied within a circular area contained within the obtained rectangular patch. This process improves the rotational invariance, since no new elements are introduced or lost in the area when the image is rotated. The use of a circular patch is also applied in other similar approaches like ORB [RRKB11], where a mask is used for central moments
C(x, y)_(ρ,θ):  x^2 + y^2 = R^2,  y = tan(θ)·x + ρ/sin(θ)   (4.25)
iqr/2, median and Hodges-Lehmann estimator). Dimensionality reduction is not a critical factor in this case, as the number of elements in the resulting trace transform matrix is much lower.
F(n) = Σ_{k=0}^{N−1} T(k) e^(−i 2π k n / N)   (4.26)
As the result of the DFT belongs to the complex space ℂ, the obtained result is split into its phase and magnitude components. The horizontal representation of the phase contains the information related to the orientation of the image, and thus the magnitude is normalized with respect to the rotation. This phase-normalized signal characterization will be the set of coefficients that compose the descriptor.
Finally, the descriptor is constructed by taking the first n magnitude coefficients of each DFT vector of the trace transform's rows. To have more control over the length of the final descriptor and the number of coefficients taken in each row, the trace transform image T can be vertically down-sampled, creating in this way bands instead of single rows. By avoiding the inclusion of the DC value of the DFT (first element), a stronger luminosity invariance is obtained.
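A compact sketch of this construction, under the assumption that a rotation of the patch corresponds to a cyclic shift along the second axis of T (axis conventions and sizes here are illustrative):

```python
import numpy as np

def ditec_descriptor(T, n_coeffs=8):
    """First n magnitude coefficients (DC excluded) of a 1-D DFT along
    each row of the trace matrix T."""
    spec = np.fft.fft(T, axis=1)
    return np.abs(spec)[:, 1:n_coeffs + 1].ravel()

T = np.random.default_rng(3).random((16, 32))
d1 = ditec_descriptor(T)
d2 = ditec_descriptor(np.roll(T, 5, axis=1))  # simulated rotation: cyclic shift
print(np.allclose(d1, d2))  # -> True: the magnitudes ignore the phase shift
```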
Figure 4.18: Matching accuracy (% correct matches) depending on the number of samples (Phi value). Experiments have been performed for 6 different image sub-datasets of 5 to 10 images each, covering different image sizes as well as geometric and photometric transformations, such as image translation, rotation and projection, or image blurring, noise or light exposure changes. n = 16, patch sizes equal to 20, descriptor dimensionality equal to 128 and sampling strategy based on a single rotation approach.
Regarding the radial sampling, experimental results have shown that the accuracy convergence starts at around 15 samples. The similarity of the values for
Figure 4.19: Matching accuracy (% correct matches) depending on the number of samples (Rho value). Experiments have been performed for 6 different image sub-datasets of 5 to 10 images each, covering different image sizes as well as geometric and photometric transformations, such as image translation, rotation and projection, or image blurring, noise or light exposure changes. n = 16, patch sizes equal to 20, descriptor dimensionality equal to 128 and sampling strategy based on a two-rotation approach.
[Figure 4.20: Matching accuracy (% correct matches) when the Phi and Rho sampling values are changed simultaneously, from 10 to 50 samples.]
We also conducted an experiment where n_φ and n_ρ are changed simultaneously and by the same amount. Figure 4.20 shows that convergence is reached
[Figure: Matching accuracy (% correct matches) against rotation angle for SIFT, SURF, DAISY, BRISK, ORB, BRIEF, DITEC and FREAK.]
[Figure 4.23: Matching accuracy (% correct matches) against the scale value of the input image for SIFT, SURF, DAISY, BRISK, ORB, BRIEF, DITEC and FREAK.]
Figure 4.23 shows the results obtained in the evaluation of scale transformation of an input image. In this case, even if DITEC shows a very good performance,
[Figure: Matching accuracy (% correct matches) per test image for SIFT, SURF, DAISY, BRISK, ORB, BRIEF, DITEC and FREAK.]
[Figure: Matching accuracy (% correct matches) for SIFT, SURF, DAISY, BRISK, ORB, BRIEF, DITEC and FREAK.]
consider that an effective characterization of this radial data can provide both a scale estimation method and a descriptor with higher scale invariance. Figure 4.26 shows the relationship of the trace transform's rows and columns with the angular and radial representation in the original image.
[Figure 4.26: The m×n trace transform matrix (t_11 ... t_mn) and the DFT direction used for scale invariance; the matrix rows and columns correspond to the angular and radial representations of the original image.]
Both optimists and pessimists contribute to society. The optimist invents the aeroplane, the pessimist
the parachute.
George Bernard Shaw
CHAPTER 5

Main Contributions
The main contributions of this research work are described in this section. Most of these contributions have also been presented in journals and conferences, as can be seen in Part II of this document. Moreover, some of the technological results derived from the R&D activity have been filed as patents (see Section 8). The single camera ball tracking system has already been accepted, and the local DITEC method is currently under examination.
obtained coefficients. Experimental results have shown very good accuracy and robustness (see Section 4).
CHAPTER 6
Part II
Patents & Publications
CHAPTER 7

Publications
7.1 Weather analysis system based on sky images taken from the earth

Title: Weather Analysis System Based on Sky Images Taken from the Earth
Authors: Mikel Labayen, Naiara Aginako and Igor García
Booktitle: Proceedings of VIE 2008 - The Fifth International Conference on Visual Information Engineering
Conference Location: Xi'an (China)
Year: 2008
DOI: http://dx.doi.org/10.1049/cp:20080299
7.4 DITEC: Experimental analysis of an image characterization method based on the trace transform

Title: DITEC: Experimental analysis of an image characterization method based on the trace transform
Authors: Igor García Olaizola, Iñigo Barandiaran, Basilio Sierra, Manuel Graña
Conference: VISAPP 2013, 9th International Conference on Computer Vision Theory and Applications
Conference Location: Barcelona (Spain)
Year: 2013
URL: http://www.visapp.visigrapp.org/?y=2013
7.7 Trace transform based method for color image domain identification

Title: Trace transform based method for color image domain identification
Authors: Igor García Olaizola, Marco Quartulli, Basilio Sierra, Julián Flórez
Journal: IEEE Transactions on Multimedia
Status: Under review after major revision.
7.8 On the Image Content of the ESA-EUSC-JRC Workshop on Image Information Mining

Title: On the Image Content of the ESA-EUSC-JRC Workshop on Image Information Mining
Authors: Marco Quartulli, Igor García Olaizola, Mikel Zorrilla
Proceedings: Proceedings of ESA-EUSC-JRC 8th Conference on Image Information Mining: Knowledge Discovery from Earth Observation Data
Pages: 70-73
Publisher: JRC, Joint Research Center (European Commission)
Year: 2012
DOI: http://dx.doi.org/10.2788/49465
2.
Title: Visual processing of geographic and environmental information in the Basque Country: two Basque case studies
Authors: Álvaro Segura, Aitor Moreno, Igor García, Naiara Aginako, Mikel Labayen, Jorge Posada, Jose Antonio Aranda, Rubén García de Andoin
3.
standards such as HTML5 and WebGL removing limitations, and transforming the Web into a horizontal application framework to tackle interoperability over the heterogeneous digital home platforms. Developers can apply their knowledge of web-based solutions to design digital home applications, removing learning curve barriers related to platform-specific APIs. However, constraints to render complex 3D environments are still present, especially in home media devices. This paper provides a state-of-the-art survey of current capabilities and limitations of digital home devices and describes a latency-driven system design based on a hybrid remote and local rendering architecture, enhancing the interactive experience of 3D graphics on these thin devices. It supports interactive navigation of sophisticated 3D scenes while providing an interoperable solution that can be deployed over the wide digital home device landscape.
4.
5.
Title: Ontology Based Middleware for Ranking and Retrieving Information on Locations Adapted for People with Special Needs
Authors: Kevin Alonso, Naiara Aginako, Javier Lozano, Igor García Olaizola
Journal: Lecture Notes in Computer Science, Computers Helping
People with Special Needs
Volume: 7382
Pages: 351-354
Publisher: Springer
Year: 2012
DOI: http://dx.doi.org/10.1007/978-3-642-31522-0_53
Abstract: Current leisure and touristic service search tools do not take into account the special needs of the large number of people with functional diversities. However, the combination of different semantic, web and storage technologies makes possible the enhancement of such search tools, allowing more personalized searches. This contributes to the provision of better and more suitable results. In this paper we propose an innovative ontology-driven solution for personalized tourism directed to people with special needs.
6.
Publisher: Springer
Year: 2009
DOI: http://dx.doi.org/10.1109/SMAP.2009.16
Abstract: This article presents the motivation and the implementation of a semantic model developed to support diverse semantic services in a broadcaster's multimedia asset management system. The model is mainly driven by the DMS-1 (Descriptive Metadata Scheme) standard, which is part of the multimedia exchange format standard defined by the broadcast industry; to our knowledge, this is the first implementation of it using the OWL language. This model has been complemented with other models coming from academia in order to cover the diverse nature of the different semantic needs identified in the whole workflow.
7.
9.
potential of interactive Television (iTV) as a multimedia and entertainment platform is enormous. The existing gap between the PC world and iTV concerning graphics capabilities may restrain the development of the iTV platform in favour of the former. Support for 3D graphics applications in iTV would boost this new platform with plenty of possibilities to be exploited.
CHAPTER 8

Selected Patents
8.1 Method for detecting the point of impact of a ball in sports events
Application number: EP20090382086
Publication date: Oct 17, 2012
Filing date: Jun 2, 2009
Priority date: Jun 2, 2009
Also published as: EP2259207A1, EP2259207B8
Inventors: Naiara Aginako Bengoa, Igor García Olaizola, Mikel Labayen Esnaola
Applicant: Vicomtech - Visual Interaction and Communication Technologies Center
Part III
Appendix and Bibliography
APPENDIX A

Considerations on the Implementation Aspects of the Trace Transform
A.1 Development platforms
The first implementation of the trace transform was developed in Octave/Matlab. Both platforms provide the Radon transform as a built-in function, but do not include the generalization to other functionals. The first implementations, based on a double loop and a scanline function, were too slow for real applications. Therefore, the algorithm was transformed into matrix operations that allowed us to remove one of the loops (radial scanning). This approach improved the performance by more than a factor of 10, since all radial samples of a specific angle were calculated by the same matrix operations.

In order to speed up the trace transform operator and also to integrate this function into the local feature descriptor evaluation platform (which is based on C++), a C++ implementation has been developed. The C++/OpenCV [1] version includes different approaches for rotation and sampling that make it much more flexible for different requirements such as speed, quality and control over the distortions produced by each approach.
[1] http://opencv.org/
A.2 Sampling
The sampling process is one of the key aspects of the discrete trace transform operation. The sampling is determined by 3 parameters, the steps along θ, ρ and t (the position along the scanline L), as described in Section 4.2.2.2. Aliasing and distortion effects strongly depend on these parameters.
θ and ρ determine the two main loops of the trace transform calculation. When translated to matrix operations (in Octave/Matlab), the radial loop (ρ) is performed by creating a set of clipping points that define the scanlines to be computed. This implies that all scanlines have to be included in the same matrix and therefore must have the same length (a key aspect for performance). However, this process introduces some distortions, due to the fact that short scanlines include the same number of samples as the longer ones.
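The fixed-length constraint can be sketched as follows (an illustrative Python sketch, not the thesis' Octave/Matlab or C++ code; the function name and the nearest-neighbour lookup are assumptions):

```python
def sample_scanline(img, p0, p1, n_samples):
    """Sample n_samples points along the segment p0-p1 (nearest neighbour).

    Every scanline yields the same number of samples regardless of its
    length, so all scanlines fit in one fixed-width matrix; short
    scanlines are effectively oversampled, which causes the distortion
    discussed above.
    """
    (x0, y0), (x1, y1) = p0, p1
    h, w = len(img), len(img[0])
    out = []
    for i in range(n_samples):
        t = i / (n_samples - 1) if n_samples > 1 else 0.0
        x = round(x0 + t * (x1 - x0))
        y = round(y0 + t * (y1 - y0))
        out.append(img[y][x] if 0 <= x < w and 0 <= y < h else 0.0)
    return out
```

Stacking one such fixed-width row per scanline is what allows a single matrix operation per angle, at the price of oversampling the short scanlines.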
The C++ implementation avoids these limitations, and 3 different strategies are followed to perform the scanline sampling:

1. Fixed step
2. Fixed number of samples, as it is done in the Octave/Matlab implementation
3. Bresenham algorithm [Bre65]
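For reference, the third strategy can be sketched with a minimal line rasterizer following [Bre65] (an illustrative Python sketch; the actual implementation relies on OpenCV):

```python
def bresenham(x0, y0, x1, y1):
    """Enumerate the pixels of the line (x0, y0)-(x1, y1) using only
    integer arithmetic, following [Bre65]."""
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    pixels = []
    while True:
        pixels.append((x0, y0))
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 >= dy:  # step along x
            err += dy
            x0 += sx
        if e2 <= dx:  # step along y
            err += dx
            y0 += sy
    return pixels
```

The number of enumerated pixels is max(|Δx|, |Δy|) + 1, which is what makes this strategy fast, but also what reduces the number of mixed-in neighbor pixels when the line becomes horizontal or vertical.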
The Bresenham implementation of OpenCV performs as the fastest option for scanline sampling. However, it introduces a distortion at around θ = 0, π/2, π and 3π/2. This is produced when the scanline becomes vertical or horizontal, and thus the number of required neighbor pixels is lower. In order to appreciate this effect, we can apply the trace transform with the functional $\int f(t)\,dt$ (with a circular patch) to a homogeneous white image. The result shows a degradation of the signal (produced by the shorter scanlines) that can be considered inherent to the algorithm, but there is also a decrease of the signal in the vertical and horizontal limits.
In order to minimize this effect and still keep the performance benefits of the Bresenham algorithm, we have implemented a variant based on a π/4 image rotation. The goal of this rotation is to move the horizontal and vertical scanlines to a position where the Bresenham algorithm includes neighbor values in the same way as they are included in the rest of the regions.
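The angle remapping behind this variant can be sketched as follows (an illustrative Python sketch; the function name, the margin parameter and its default value are assumptions, not the thesis code):

```python
import math

def scan_source(theta, margin=math.pi / 8):
    """Decide whether the scanline at angle theta is sampled from the
    original image or from a copy pre-rotated by pi/4.

    Returns (use_rotated, effective_angle): when use_rotated is True the
    caller samples the pre-rotated copy at theta - pi/4, which moves
    near-horizontal and near-vertical scanlines away from the distorted
    axis positions.
    """
    # distance of theta from the nearest multiple of pi/2
    d = abs((theta + math.pi / 4) % (math.pi / 2) - math.pi / 4)
    if d < margin:
        return True, theta - math.pi / 4
    return False, theta
```

Scanlines flagged with use_rotated are sampled from the pre-rotated copy at the adjusted angle, so the Bresenham algorithm never runs on exactly horizontal or vertical lines.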
Figure A.4: First half of the source image is sampled (blue regions) while areas
around vertical and horizontal axes are not considered.
Figure A.5: Second half of the source image is sampled (red and green). These regions are rotated to the π/4, 3π/4, 5π/4 and 7π/4 areas in order to be sampled with the Bresenham algorithm.
Figure A.6 shows the result of applying the same functional to the same image but making a single rotation of π/4. As can be observed, the number of areas with angular distortion is double that of the previous approach (one every nπ/4). However, the gradients observed in these distortions are much smoother than the previous ones.
Extending the idea of rotating the image to account for the sampling differences, the image can also be rotated at each angular iteration.

Figure A.6: Result of (ρ, θ) sampling with the Bresenham algorithm and a single image rotation.

Figure A.7: Result of (ρ, θ) pixelwise sampling with image rotation for each angular iteration.
For the full rotation based method, it can be considered that there is no distortion (the observed minor variations are basically due to numeric errors during the rotation).
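The full rotation strategy can be sketched as follows (an illustrative Python sketch with nearest-neighbour rotation; the thesis implementation is C++/OpenCV, and the helper names are assumptions):

```python
import math

def rotate_nn(img, theta):
    """Rotate a square image by theta around its centre, nearest neighbour."""
    n = len(img)
    c = (n - 1) / 2.0
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    out = [[0.0] * n for _ in range(n)]
    for y in range(n):
        for x in range(n):
            # inverse mapping: source coordinates of destination pixel (x, y)
            xs = cos_t * (x - c) - sin_t * (y - c) + c
            ys = sin_t * (x - c) + cos_t * (y - c) + c
            xi, yi = round(xs), round(ys)
            if 0 <= xi < n and 0 <= yi < n:
                out[y][x] = img[yi][xi]
    return out

def trace_rows(img, n_angles, functional=sum):
    """One trace transform column per angle: rotate the whole image and
    apply the functional to every horizontal row (one row per rho)."""
    return [[functional(row) for row in rotate_nn(img, k * math.pi / n_angles)]
            for k in range(n_angles)]
```

Since every angle is sampled along plain horizontal rows of the rotated image, all directions are treated identically, at the cost of one image rotation per angular iteration.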
APPENDIX B

Calculation of the Clipping Points in a Circular Region
The circular region is defined as a circumference contained within the rectangular patch. Therefore, the radius of the circumference is equal to the minimum of the patch semi-axes.

In order to simplify the calculation, we will locate the (0, 0) coordinate at the center of the patch. Once the clipping points are found, a simple translation of the center is enough to obtain the real clipping point positions.
The equation system that has to be solved is composed of the aforementioned circumference and a straight line, defined as the orthogonal to the line that joins the center of the image with the (ρ, θ) position.
\[
C(x,y)_{(\rho,\theta)} :
\begin{cases}
x^2 + y^2 = R^2 \\
y = a x + b
\end{cases}
\tag{B.1}
\]

\[
x^2 + (a x + b)^2 = R^2
\tag{B.2}
\]

\[
(a^2 + 1)x^2 + 2 a b x + (b^2 - R^2) = 0
\tag{B.3}
\]

\[
x = \frac{-a b \pm \sqrt{a^2 b^2 - (a^2 + 1)(b^2 - R^2)}}{a^2 + 1}
\tag{B.4}
\]

We can simplify Equation B.4 to:

\[
x = \frac{-a b \pm \sqrt{R^2 (a^2 + 1) - b^2}}{a^2 + 1}
\tag{B.5}
\]

The line parameters $a$ and $b$ follow from the $(\rho, \theta)$ parametrization of the scanline:

\[
a = \tan\Big(\theta + \frac{\pi}{2}\Big) = -\frac{1}{\tan\theta} = -\frac{\cos\theta}{\sin\theta}
\tag{B.6}
\]

\[
b = \frac{\rho}{\sin\theta}
\tag{B.7}
\]

Substituting $a$ and $b$ into the term under the square root of Equation B.5:

\[
R^2(a^2 + 1) - b^2 = R^2\Big(\frac{1}{\tan^2\theta} + 1\Big) - \frac{\rho^2}{\sin^2\theta}
\tag{B.8}
\]

\[
= \frac{R^2\cos^2\theta + R^2\sin^2\theta - \rho^2}{\sin^2\theta} = \frac{R^2 - \rho^2}{\sin^2\theta}
\tag{B.9}
\]

and, since $-a b = \rho\cos\theta/\sin^2\theta$ and $a^2 + 1 = 1/\sin^2\theta$, the $x$ coordinate of the clipping points becomes:

\[
C(\rho,\theta) = \rho\cos\theta \pm \sin\theta\sqrt{R^2 - \rho^2}
\tag{B.10}
\]
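The closed form of Equation B.10 can be checked numerically (an illustrative Python sketch, not the thesis code): both clipping points must lie on the circle and on the scanline, whose normal form is x cos θ + y sin θ = ρ.

```python
import math

def clipping_points(R, rho, theta):
    """Clipping points of the scanline (rho, theta) with the circle
    x^2 + y^2 = R^2, using Equation B.10 for x and the line equation
    y = a*x + b (Equations B.6 and B.7) for y.

    Assumes |rho| <= R and sin(theta) != 0.
    """
    s = math.sqrt(R * R - rho * rho)
    a = -math.cos(theta) / math.sin(theta)  # Equation B.6
    b = rho / math.sin(theta)               # Equation B.7
    xs = (rho * math.cos(theta) + math.sin(theta) * s,
          rho * math.cos(theta) - math.sin(theta) * s)
    return [(x, a * x + b) for x in xs]
```

Each returned point satisfies both equations of the system B.1, which confirms the simplifications made between Equations B.4 and B.10.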
Bibliography
[AKS11] Sultan Ahmed, Md. Khan, and Md. Shahjahan. A filter based feature selection approach using Lempel-Ziv complexity. In Derong Liu, Huaguang Zhang, Marios Polycarpou, Cesare Alippi, and Haibo He, editors, Advances in Neural Networks - ISNN 2011, volume 6676 of Lecture Notes in Computer Science, pages 260-269. Springer Berlin / Heidelberg, 2011. 10.1007/978-3-642-21090-7_31. 35

[And76] James Richard Anderson. A land use and land cover classification system for use with remote sensor data, volume 964. US Government Printing Office, 1976. 31

[ANR74] N. Ahmed, T. Natarajan, and K. R. Rao. Discrete cosine transform. IEEE Transactions on Computers, (1):90-93, 1974. 47

[Bar13] Iñigo Barandiaran. Contributions to Local Feature Extraction, Description and Matching in 2D Images. PhD thesis, Department of Computer Science and Artificial Intelligence, University of the Basque Country, 2013. 5, 70

[BB08] P. Brasnett and M. Bober. Fast and robust image identification. In Proc. 19th Int. Conf. Pattern Recognition ICPR 2008, pages 1-5, 2008. 41, 47

[BCN+13] I. Barandiaran, C. Cortes, M. Nieto, M. Graña, and O.E. Ruiz. A new evaluation framework and image dataset for key point extraction and feature descriptor matching. In VISAPP 2013 - International Conference on Computer Vision Theory and Applications, pages 252-257. Scitepress, 2013. 70

[BH11] M. A. Bouker and E. Hervet. Retrieval of images using mean-shift and gaussian mixtures based on weighted color histograms. In Proc. Seventh Int Signal-Image Technology and Internet-Based Systems (SITIS) Conf, pages 218-222, 2011. 35, 56

[BKB10] W. Bouachir, M. Kardouchi, and N. Belacel. Fuzzy indexing for bag of features scene categorization. In Proc. 5th Int I/V Communications and Mobile Network (ISVC) Symp, pages 1-4, 2010. 57

[BLIS04] R. Blanco, P. Larrañaga, I. Inza, and B. Sierra. Gene selection for cancer classification using wrapper approaches. International Journal of Pattern Recognition and Artificial Intelligence, 2004. 52

[BO07] M. Bober and R. Oami. Description of MPEG-7 visual core experiments. Technical report, ISO/IEC JTC1/SC29/WG11, 2007. 35, 47

[Bre65] Jack E. Bresenham. Algorithm for computer control of a digital plotter. IBM Systems Journal, 4(1):25-30, 1965. 46, 99

[BTG06] Herbert Bay, Tinne Tuytelaars, and Luc Van Gool. SURF: Speeded up robust features. In ECCV, pages 404-417, 2006. 34

[BYR10] Vladimir Britanak, Patrick C. Yip, and K. R. Rao. Discrete Cosine and Sine Transforms: General Properties, Fast Algorithms and Integer Approximations. Academic Press, 2010. 39, 47

[Cas11] Stephen Cass. Unthinking machines. Technical report, MIT Technology Review, 2011. 4

[CLTW10] Myung Jin Choi, J. J. Lim, A. Torralba, and A. S. Willsky. Exploiting hierarchical context on a large database of object categories. In Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), pages 129-136, 2010. 35

[CMGD10] D. Cerra, A. Mallet, L. Gueguen, and M. Datcu. Algorithmic information theory-based analysis of earth observation images: An assessment. IEEE Geoscience and Remote Sensing Letters, 7(1):8-12, 2010. 35

[Cor] Corel image dataset. http://wang.ist.psu.edu/docs/related.shtml. 17, 54

[DBLFF10] Jia Deng, Alexander C. Berg, Kai Li, and Li Fei-Fei. What does classifying more than 10,000 image categories tell us? In Computer Vision - ECCV 2010. Springer, 2010.

us/en/icons/deepblue/. 27
[DF02] Gerald Dalley and Patrick Flynn. Pair-wise range image registration: A study in outlier classification. Computer Vision and Image Understanding, 87(1-3):104-115, 2002. 53

[DLS11] F. Dornaika, E. Lazkano, and B. Sierra. Improving dynamic facial expression recognition with feature subset selection. Pattern Recognition Letters, 32(5):740-748, 2011. 53

[Fah06] S. A. Fahmy. Investigating trace transform architectures for face authentication. In Proc. Int. Conf. Field Programmable Logic and Applications FPL '06, pages 1-2, 2006. 35, 61

[FBCC+10] David Ferrucci, Eric Brown, Jennifer Chu-Carroll, James Fan, David Gondek, Aditya A. Kalyanpur, Adam Lally, J. William Murdock, Eric Nyberg, John Prager, et al. Building Watson: An overview of the DeepQA project. AI Magazine, 31(3):59-79, 2010. 28

[FK95] N.G. Fedotov and Alexander A. Kadyrov. Image scanning in machine vision leads to new understanding of image. In Digital Image Processing and Computer Graphics: Fifth International Workshop, pages 256-261. International Society for Optics and Photonics, 1995. 41

[FKT09] Rerkchai Fooprateepsiri, Werasak Kurutach, and Sutthipong Tamsumpaolerd. An image identifier based on Hausdorff shape trace transform. In Proceedings of the 16th International Conference on Neural Information Processing: Part I, ICONIP '09, pages 788-797, Berlin, Heidelberg, 2009. Springer-Verlag. 42

[Glo] Digital Globe. Geoeye dataset. http://www.geoeye.com. 17, 57

[Haa11] Peter J. Haas. Sketches get sketchier. Commun. ACM, 54:100, August 2011. 37

[HJL63] Joseph L. Hodges Jr and Erich L. Lehmann. Estimates of location based on rank tests. The Annals of Mathematical Statistics, pages 598-611, 1963. 51

[HSL+06] Jonathon S. Hare, Patrick A. S. Sinclair, Paul H. Lewis, Kirk Martinez, Peter G.B. Enser, and Christine J. Sandom. Bridging the semantic gap in multimedia information retrieval: Top-down and bottom-up approaches. In Paolo Bouquet, Roberto Brunelli, Jean-Pierre Chanod, Claudia Niederée, and Heiko Stoermer, editors, Mastering the Gap: From Information Extraction to Semantic Representation / 3rd European Semantic Web Conference, 2006. Event Dates: 12 June 2006. 25

[ILRE00] I. Inza, P. Larrañaga, R. Etxeberria, and B. Sierra. Feature subset selection by Bayesian networks based optimization. Artificial Intelligence, 123(1-2):157-184, 2000. 52

[JHVB11] Mathieu Jacomy, Sebastien Heymann, Tomaso Venturini, and Mathieu Bastian. ForceAtlas2, a graph layout algorithm for handy network visualization. Draft, Gephi Web Atlas, 2011. 55

[KBBN11] Neeraj Kumar, Alexander Berg, Peter N. Belhumeur, and Shree Nayar. Describable visual attributes for face verification and image search. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 33(10):1962-1977, 2011. 53

[KP98] A. Kadyrov and M. Petrou. The trace transform as a tool to invariant feature construction. In Proc. Fourteenth Int Pattern Recognition Conf, volume 2, pages 1037-1039, 1998. 18, 41

[KP01] A. Kadyrov and M. Petrou. The trace transform and its applications. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(8):811-828, 2001. xxi, 18, 35, 41, 42, 48, 61

[KP06] A. Kadyrov and M. Petrou. Affine parameter estimation from the trace transform. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(10):1631-1645, 2006. 18, 42, 47, 49

[KSt] Kolmogorov-Smirnov test. Encyclopedia of Mathematics. http://www.encyclopediaofmath.org/index.php?title=Kolmogorov-Smirnov_test&oldid=22659. 51
[KYDN11] Roger King, Nicolas Younan, Mihai Datcu, and Ion Nedelcu. Innovative data mining techniques in support of GEOSS: A workshop's findings. In Space Technology (ICST), 2011 2nd International Conference on, pages 1-4. IEEE, 2011. 31

[LCS11] S. Leutenegger, M. Chli, and R.Y. Siegwart. BRISK: Binary robust invariant scalable keypoints. In Computer Vision (ICCV), 2011 IEEE International Conference on, pages 2548-2555. IEEE, 2011. 64

[Lev13] Hector J. Levesque. On our best behaviour. International Joint Conference on Artificial Intelligence, IJCAI, 2013. 3

[LK10] Ping Li and Christian König. b-bit minwise hashing. In Proceedings of the 19th international conference on World Wide Web, WWW '10, pages 671-680, New York, NY, USA, 2010. ACM. 37

[LLL10] Shuyang Lin, Shengrui Li, and Cuihua Li. A fast electronic components orientation and identify method via Radon transform. In Proc. IEEE Int Systems Man and Cybernetics (SMC) Conf, pages 3902-3908, 2010. 41

[LM98] H. Liu and H. Motoda. Feature Selection for Knowledge Discovery and Data Mining. Kluwer Academic Publishers, 1998. 52

[Low99] David G. Lowe. Object recognition from local Scale-Invariant features. Computer Vision, IEEE International Conference on, 2:1150-1157, August 1999. 27, 34, 64, 65

[LW07] Nan Liu and Han Wang. Recognition of human faces using discrete cosine transform filtered trace features. In Proc. 6th Int Information, Communications & Signal Processing Conf, pages 1-5, 2007. 35

[LW09] Nan Liu and Han Wang. Modeling images with multiple trace transforms for pattern analysis. IEEE Signal Processing Letters, 16(5):394-397, 2009. 35

[LZC09] Jian Li, Shaohua Kevin Zhou, and Rama Chellappa. Appearance modeling using a geometric transform. IEEE Trans Image Process, 18(4):889-902, Apr 2009. 36
[MAMD09] M. R. Mustaffa, F. Ahmad, R. Mahmod, and S. Doraisamy. Generalized Ridgelet-Fourier for MxN images: Determining the normalization criteria. In Proc. IEEE Int Signal and Image Processing Applications (ICSIPA) Conf, pages 380-384, 2009. 35

[MAMD10] M. R. Mustaffa, F. Ahmad, R. Mahmod, and S. Doraisamy. Invariant generalised Ridgelet-Fourier for shape-based image retrieval. In Proc. Int Information Retrieval & Knowledge Management (CAMP) Conf, pages 79-84, 2010. 35

[Mar11] Gorka Marcos. A Semantic Middleware to enhance current Multimedia Retrieval Systems with Content-based functionalities. PhD thesis, University of the Basque Country, Computer Science Faculty, Computer Languages and Systems Department, Donostia - San Sebastian, 2011. 5, 22

[MIOF11] Gorka Marcos, Arantza Illarramendi, Igor G. Olaizola, and Julian Florez. A middleware to enhance current multimedia retrieval systems with content-based functionalities. Multimedia Systems, 17(2):149-164, 2011. 22

[Mit97] T.M. Mitchell. Machine Learning. McGraw Hill, 1997. 51

[MLH03] D. Meyer, F. Leisch, and K. Hornik. The support vector machine under test. Neurocomputing, 55:169-186, 2003. 53

[MPE04] MPEG-7 overview, October 2004. 35

[MPL11] M. Meena, K. Pramod, and K. Linganagouda. Optimized trace transform based feature extraction architecture for CBIR. In Ajith Abraham, Jaime Lloret Mauri, John F. Buford, Junichi Suzuki, and Sabu M. Thampi, editors, Advances in Computing and Communications, volume 192 of Communications in Computer and Information Science, pages 444-451. Springer Berlin Heidelberg, 2011. 63

[MS02] K. Mikolajczyk and C. Schmid. An affine invariant interest point detector. Computer Vision - ECCV 2002, pages 128-142, 2002. 72

[NC11] H. Nemmour and Y. Chibani. Handwritten Arabic word recognition based on ridgelet transform and support vector machines. In High Performance Computing and Simulation (HPCS), 2011 International Conference on, pages 357-361, July 2011. 35

[NPK10] M. F. Nasrudin, M. Petrou, and L. Kotoulas. Jawi character recognition using the trace transform. In Proc. Seventh Int Computer Graphics, Imaging and Visualization (CGIV) Conf, pages 151-156, 2010. 35

[OAL09] I. G. Olaizola, N. Aginako, and M. Labayen. Image analysis platform for data management in the meteorological domain. In Proc. 4th Int. Workshop Semantic Media Adaptation and Personalization SMAP '09, pages 89-94, 2009. 34

[OBO08] R. O'Callaghan, M. Bober, R. Oami, and P. Brasnett. Information technology - multimedia content description interface - part 3: Visual, amendment 3: Image signature tools, January 2008. 35

[OMK+09] I. G. Olaizola, G. Marcos, P. Kramer, J. Florez, and B. Sierra. Architecture for semi-automatic multimedia analysis by hypothesis reinforcement. In Proc. IEEE Int. Symp. Broadband Multimedia Systems and Broadcasting BMSB '09, pages 1-6, 2009. 14, 23, 24, 35

[OT01] Aude Oliva and Antonio Torralba. Modeling the shape of the scene: A holistic representation of the spatial envelope. International Journal of Computer Vision, 42:145-175, 2001. 27, 35

[PG92] F. Peyrin and R. Goutte. Image invariant via the Radon transform. In Proc. Int Image Processing and its Applications Conf, pages 458-461, 1992. 41

[PK04] M. Petrou and A. Kadyrov. Affine invariant features from the trace transform. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(1):30-44, 2004. 41, 47

[Poy96] C.A. Poynton. A technical introduction to digital video. J. Wiley, 1996. 40, 55

[Poy03] Charles Poynton. Digital video and HDTV, algorithms and interfaces. Morgan Kaufmann, 2003. 55

[QGO13] Marco Quartulli and Igor G. Olaizola. A review of EO image information mining. ISPRS Journal of Photogrammetry and Remote Sensing, 75:11-28, 2013. 18
[Ric02] Iain E. Richardson. Video Codec Design: Developing Image and Video Compression Systems. John Wiley & Sons, Inc., New York, NY, USA, 2002. 47

[RRKB11] E. Rublee, V. Rabaud, K. Konolige, and G. Bradski. ORB: An efficient alternative to SIFT or SURF. In Computer Vision (ICCV), 2011 IEEE International Conference on, pages 2564-2571. IEEE, 2011. 65

[RVG+07] A. Rabinovich, A. Vedaldi, C. Galleguillos, E. Wiewiora, and S. Belongie. Objects in context. In Proc. IEEE 11th Int. Conf. Computer Vision ICCV 2007, pages 1-8, 2007. 35

[SASK08] N. Simou, Th. Athanasiadis, G. Stoilos, and S. Kollias. Image indexing and retrieval using expressive fuzzy description logics. Signal, Image and Video Processing, 2:321-335, 2008. 23

[SDH10] Zhan Shi, Minghui Du, and Rongbing Huang. A trace transform based on subspace method for face recognition. In Proc. Int Computer Application and System Modeling (ICCASM) Conf, volume 13, 2010. 42

[SF91] Thomas M. Strat and Martin A. Fischler. Context-based vision: recognizing objects using information from both 2-D and 3-D imagery. IEEE Transactions on Pattern Analysis and Machine Intelligence, 13(10):1050-1065, 1991. 29

[SHKY04] Jin S. Seo, Jaap Haitsma, Ton Kalker, and Chang Dong Yoo. A robust image fingerprinting system using the Radon transform. Sig. Proc.: Image Comm., 19(4):325-339, 2004. 41

[SI07] E. Shechtman and M. Irani. Matching local self-similarities across images and videos. In Proc. IEEE Conf. Computer Vision and Pattern Recognition CVPR '07, pages 1-8, 2007. 35

[SKBB12] W.J. Scheirer, N. Kumar, P.N. Belhumeur, and T.E. Boult. Multiattribute spaces: Calibration for attribute fusion and similarity search. In Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, pages 2933-2940, 2012. 35
[SLJI09] B. Sierra, E. Lazkano, E. Jauregi, and I. Irigoien. Histogram distance-based Bayesian network structure learning: A supervised classification specific approach. Decision Support Systems, 48(1):180-190, 2009. 53

[SPKK03] S. Srisuk, M. Petrou, W. Kurutach, and A. Kadyrov. Face authentication using the trace transform. In Proc. IEEE Computer Society Conf. Computer Vision and Pattern Recognition, volume 1, 2003. 35, 42

[SS10] Cees G. M. Snoek and Arnold W. M. Smeulders. Visual-concept search solved? Computer, 43(6):76-78, 2010. 34

[SWS+00] Arnold W. M. Smeulders, Marcel Worring, Simone Santini, Amarnath Gupta, and Ramesh Jain. Content-based image retrieval at the end of the early years. IEEE Trans. Pattern Anal. Mach. Intell., 22(12):1349-1380, December 2000. 29, 31

[TBFO05] J. Turan, Z. Bojkovic, P. Filo, and L. Ovsenik. Invariant image recognition experiment with trace transform. In Proc. 7th Int Telecommunications in Modern Satellite, Cable and Broadcasting Services Conf, volume 1, pages 189-192, 2005. 35, 41, 42

[TFW08] A. Torralba, R. Fergus, and Y. Weiss. Small codes and large image databases for recognition. In Proc. IEEE Conf. Computer Vision and Pattern Recognition CVPR 2008, pages 1-8, 2008. 35

[TMF10] A. Torralba, K. P. Murphy, and W. T. Freeman. Using the forest to see the trees: exploiting context for visual object detection and localization. Commun. ACM, 53(3):107-114, March 2010. 29

[TS01] Antonio Torralba and Pawan Sinha. Statistical context priming for object detection. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2001. 29

[Ver06] David Vernon. The space of cognitive vision. In Cognitive Vision Systems, pages 7-24. Springer, 2006. 22

[vGVSG10] J. C. van Gemert, C. J. Veenman, A. W. M. Smeulders, and J.-M. Geusebroek. Visual word ambiguity. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(7):1271-1283, 2010. 34

[Wan11] Fei-Yue Wang. A question for AAAI: Does AI need a reboot? Intelligent Systems, IEEE, 26(4):2-4, 2011. 3

[Wat] Watson. IBM http://www-03.ibm.com/innovation/us/watson/. 28

[WSS02] T. Watanabe, K. Sugawara, and H. Sugihara. A new pattern representation scheme using data compression. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(5):579-590, 2002. 17, 35

[ZKRR08] G. Zajic, N. Kojic, N. Reljin, and B. Reljin. Experiment with reduced feature vector in CBIR system with relevance feedback. IET Conference Publications, 2008(CP543):176-181, 2008. 56