PDF Semantics in Adaptive and Personalised Systems Methods Tools and Applications Pasquale Lops Ebook Full Chapter
https://textbookfull.com/product/crop-breeding-genetic-
improvement-methods-pasquale-tripodi/
https://textbookfull.com/product/production-management-advanced-
models-tools-and-applications-for-pull-systems-yacob-khojasteh/
https://textbookfull.com/product/complex-dynamical-systems-in-
education-concepts-methods-and-applications-1st-edition-matthijs-
koopmans/
https://textbookfull.com/product/artificial-adaptive-systems-
using-auto-contractive-maps-theory-applications-and-
extensions-1st-edition-paolo-massimo-buscema/
Multicomponent and Multiscale Systems Theory Methods
and Applications in Engineering 1st Edition Juergen
Geiser (Auth.)
https://textbookfull.com/product/multicomponent-and-multiscale-
systems-theory-methods-and-applications-in-engineering-1st-
edition-juergen-geiser-auth/
https://textbookfull.com/product/quantum-systems-in-chemistry-
and-physics-progress-in-methods-and-applications-1st-edition-
erkki-j-brandas-auth/
https://textbookfull.com/product/advances-in-hybridization-of-
intelligent-methods-models-systems-and-applications-1st-edition-
ioannis-hatzilygeroudis/
https://textbookfull.com/product/systems-modeling-methodologies-
and-tools-antonio-puliafito/
https://textbookfull.com/product/adaptive-resonance-theory-in-
social-media-data-clustering-roles-methodologies-and-
applications-lei-meng/
Pasquale Lops
Cataldo Musto
Fedelucio Narducci
Giovanni Semeraro
Semantics in
Adaptive and
Personalised
Systems
Methods, Tools and Applications
Semantics in Adaptive and Personalised Systems
Pasquale Lops • Cataldo Musto • Fedelucio Narducci • Giovanni Semeraro
Semantics in Adaptive
and Personalised Systems
Methods, Tools and Applications
Pasquale Lops
Dipartimento di Informatica
Università di Bari Aldo Moro
Bari, Italy

Cataldo Musto
Dipartimento di Informatica
Università di Bari Aldo Moro
Bari, Italy
This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
To my kids Giuseppe and Annapaola,
love of my life.
Pasquale Lops
Web search engines and recommender systems are among today’s most visible and
successful applications of Artificial Intelligence technology in practice. We rely on
such systems every day when we search for information, when we shop online, or
when we stream videos. Without such systems, it almost seems impossible to find
things that interest us within the huge amounts of information that are offered online
today.
Research in the area of Information Filtering and Information Retrieval dates
back to the 1970s or even earlier. One central task of such systems then and today is
to estimate to what extent a given document or web page is relevant for a given
query by the user. Although the final ranking of the relevant documents is often
influenced by other factors, e.g., the general popularity of a web page, any search
system at some stage makes inferences about what each indexed web page is about,
i.e., about its content.
Over the decades, these forms of reasoning have become more and more
sophisticated. On one hand, different internal document representations were
developed, from simple term-counting approaches, through latent semantic approaches, to embedding models, which implicitly encode semantic relationships between terms.
At the same time, more and more structured or unstructured external knowledge
sources have become available, e.g., in the form of Linked Data, which allow search
and information filtering systems to make inferences using explicitly given semantic
relations between the concepts that appear in queries and in documents.
The same set of techniques can also be applied in the field of recommender
systems, which is the main focus of this book. Here, the input to the ranking task is not
an individual query, but a user profile that the system has learned from past user
interactions over time. Accordingly, such content-based or semantics-based recommenders are able to personalize the ranking on the basis of the assumed user interests.
In the traditional categorization of recommendation techniques, content-based
methods (here, the term content also covers metadata and other side information)
are often considered as an alternative to collaborative filtering approaches. This
latter class of systems, which base their recommendations on behavioral patterns of
a larger user community, dominates the research landscape. However, pure col-
laborative approaches can have a number of limitations. It is, for example, difficult
to ensure that a set of recommendations is diverse when we do not know anything
about the similarity of two items. Likewise, explaining recommendations to users
can be a challenge when we cannot inform users how the attributes of a recom-
mended item relate to their preferences. In a number of application domains, it is
therefore favorable to design a hybrid system that combines knowledge about the
items with collaborative information.
The literature on semantics-based or hybrid recommender systems is actually
quite rich, but unfortunately also scattered. Today, relevant works appear in pub-
lication outlets of different communities, e.g., Information Retrieval, Semantic
Web, or Recommender Systems. This book therefore fills an existing gap in the
literature. It first provides an introduction to the basic concepts of content repre-
sentations, then discusses approaches for semantic analysis, and covers today’s
external knowledge sources that can be leveraged for information filtering and
recommendation. Based on these foundations, it then reviews use cases of how rich
content information can be used to build better recommenders for the future.
The human desire to make machines ever smarter has been the driving force behind all research in the Artificial Intelligence (AI) area.
Generally speaking, what makes a system intelligent is the capability of
understanding signals coming from the environment and of correctly adapting its
behavior accordingly. Such a capability is strictly related to the definition and the
design of specific techniques for interpreting messages generated by the users.
Some years ago, when we typed the query How tall is the Eiffel Tower? into Google, the system answered with a set of documents, some of them including the information we were seeking, but without a precise identification of the correct answer. Today, this is no longer the case, since intelligent assistants like Siri, Alexa, or Google Assistant, and the Google search engine itself, are able to provide the exact answer the user is looking for; in the case of the Eiffel Tower, 300 m.
Without any doubt, we can state that semantics represents the theoretical foundation for implementing models and technologies that allow machines to interpret and understand information provided in natural language. Indeed, thanks to semantics, it is possible to give meaning to documents, sentences, and questions expressed in natural language and to create a bridge between the information needs of a user and the answers to those needs.
Such an intuition is currently implemented in several tools and platforms, such as search engines, recommender systems, and digital assistants, and contributes to the tangible improvement in accuracy and effectiveness we have recently been witnessing.
We hope this book can become a reference point in the panorama of adaptive and personalized systems exploiting semantics. The book is organized into three
main parts. First, we motivate the need to exploit textual content in intelligent
information access systems, and then we give an overview of the basic method-
ologies to process and represent content-based features. Next, we thoroughly
describe state-of-the-art methodologies and techniques to enrich textual content
representation by introducing semantics. Finally, the last part of the book provides a
more practical perspective and discusses several applications that exploit the
techniques introduced and described in the previous chapters.
ix
x Preface
We would like to sincerely thank everyone who contributed to this book, and the
various people who provided us with comments and suggestions and encouraged us
to summarize years of work in a single book. We thank, in particular, Nancy
Wade-Jones from Springer, who supported us throughout the editorial process.
We are very grateful to the people of the Semantic Web Access and
Personalization—SWAP research group,1 who contributed to most of the work
cited and described in this book. We would like to thank Marco de Gemmis, who started to investigate how Natural Language Processing techniques could be adopted to devise a new generation of content-based recommender systems; Pierpaolo Basile, who made available his great expertise in Word Sense Disambiguation and Distributional Semantics Models, which were successfully used in complex recommendation environments; and Annalina Caputo, a former member of the research group, who worked on semantic information retrieval methods.
We would also like to thank all the other collaborators, Ph.D. students, and research
fellows of the SWAP research group, in particular, Leo Iaquinta, Andrea Iovine,
Piero Molino, Marco Polignano, Gaetano Rossiello, Lucia Siciliani, and Vincenzo
Tamburrano, each giving a specific contribution to the ideas, systems, and research
presented in this book.
1 http://www.di.uniba.it/%7Eswap/.
Contents
1 Introduction
  1.1 Data Explosion and Information Overload
  1.2 Intelligent Information Access
    1.2.1 Information Retrieval and Information Filtering
    1.2.2 Recommender Systems
  1.3 Why Do We Need Content?
    1.3.1 Tackling the Issues of Collaborative Filtering
    1.3.2 Feed and Follow Recent Trends
  1.4 Why Do We Need Semantics?
  References
2 Basics of Content Representation
  2.1 Pipeline for Text Processing
    2.1.1 Lexical Analysis
    2.1.2 Syntactic Analysis
  2.2 Vector Space Model
  2.3 Semantics-Aware Content Representation
  References
3 Encoding Endogenous Semantics
  3.1 Distributional Semantics Models
  3.2 Word Embedding Techniques
    3.2.1 Latent Semantic Analysis
    3.2.2 Random Indexing
    3.2.3 Word2Vec
  3.3 Explicit Semantic Analysis
  References
Acronyms
AI Artificial Intelligence
BNC British National Corpus
BOW Bag of Words
CBOW Continuous Bag of Words
CBRS Content-Based Recommender System
CF Collaborative Filtering
CL-ESA Cross-Language Explicit Semantic Analysis
CoRS Conversational Recommender System
DM Dialog Manager
DSM Distributional Semantics Model
EL Entity Linking
EPG Electronic Program Guides
ER Entity Recognition
ESA Explicit Semantic Analysis
GDPR General Data Protection Regulation
IDF Inverse Document Frequency
IF Information Filtering
IMDB Internet Movie Database
IR Information Retrieval
KB Knowledge Base
LDA Latent Dirichlet Allocation
LOD Linked Open Data
LSA Latent Semantic Analysis
LSI Latent Semantic Indexing
MSS Most Specific Subsumer
MUC Message Understanding Conference
NER Named Entity Recognition
NLP Natural Language Processing
NMF Nonnegative Matrix Factorization
NNDB Notable Names Database
NP Noun Phrases
OWL Ontology Web Language
PCA Principal Component Analysis
pLSA Probabilistic Latent Semantic Analysis
PMI Pointwise Mutual Information
POS Part of Speech
QA Question Answering
RDF Resource Description Framework
RI Random Indexing
RP Random Projection
RS Recommender Systems
SA Sentiment Analyzer
SG Skip-Gram
SPARQL SPARQL Protocol and RDF Query Language
SUN Social Urban Network
SVD Singular Value Decomposition
TF Term Frequency
TR-ESA Translation-based Explicit Semantic Analysis
URI Uniform Resource Identifier
VP Verb Phrases
VSM Vector Space Model
WSD Word Sense Disambiguation
Adaptive and personalized systems play an increasingly important role in our daily lives, since we rely more and more on systems that tailor their behavior to our preferences and needs, and support us in a broad range of heterogeneous decision-making tasks.
As an example, we rely on Spotify to get the best music playlist for our workout, we ask Netflix to suggest a movie to watch on a rainy night at home, and we use Amazon to get recommendations about items to buy. Even complex tasks, such as identifying the best location for our summer holidays or tailoring financial investments to our needs and plans, are now often tackled through personalized and adaptive systems.
The rise of such a user-centric vision made technologies such as personalized search engines, recommender systems, and intelligent personal assistants very popular, even essential. However, these technologies could never have existed without the main fuel that feeds them: data. These platforms critically need data to carry out a broad set of tasks, ranging from modeling users' needs and preferences to training complex machine learning models that make inferences and predictions. Without such data, most of the intelligent systems we discuss in this book would never have become so popular.
Accordingly, it is straightforward to imagine that the recent growth of such tech-
nologies goes hand in hand with the recent growth of online (personal) data that
are spread through social networks, collaborative platforms, and personal devices.
The more data that are available about a person, the more effective the personalization and adaptation process can be. In turn, this scenario fueled two different research
trends. On one side, such a unique availability of data, typically referred to as Data
Explosion [31], emphasizes the so-called problem of Information Overload and en-
courages the development and the design of systems able to support the users in
sifting through this huge flow of data, such as Information Retrieval and Information
Filtering systems. On the other side, all the personal data that are now available on
the Web and on social networks (what we like, who are our friends, which places
we often visit, etc.) also fostered research in the area of user modeling, since the acquisition and the processing of these data contribute to the definition of a very precise representation of the person, which in turn enables accurate personalization and adaptation mechanisms.

© Springer Nature Switzerland AG 2019
P. Lops et al., Semantics in Adaptive and Personalised Systems, https://doi.org/10.1007/978-3-030-05618-6_1
In this chapter, we present and discuss both aspects, since they represent the main motivations that led us to write this book.
First, we discuss how we can effectively cope with the surplus of information
by developing technologies for intelligent information access. Next, we deepen the
discussion and we show how the available data can be used to feed intelligent in-
formation access systems by providing a very precise and fine-grained modeling of
users’ interests and needs. Specifically, we will pay particular attention to investigating and discussing the importance of content-based and textual information sources in such a scenario.
To sum up, this chapter is organized as follows: First, we introduce the concepts
of Data Explosion and Information Overload. Next, we focus our attention on the
available strategies to effectively tackle this issue, such as the exploitation of Information
Retrieval and Information Filtering methodologies to develop tools for intelligent
information access. Finally, we discuss the role of data in such a scenario, by em-
phasizing the importance of gathering and modeling content-based information and
by showing that the injection of semantics in content representation can lead to even
more precise user models and more effective personalization algorithms.
The concept of Data Explosion (or Information Explosion) was recently introduced
to refer to the growth of the information spread through the Web and through the
Internet of Things. A primary cause of this uncontrolled increase of the available data
is represented by the recent rise of collaborative platforms and social networks [13] such as Wikipedia, YouTube, Facebook, and Instagram, which made the authoring of content easier and easier.
As shown by several studies, Web users are making the most of this opportunity,
since a tremendous amount of information is produced and generated through these
platforms:1 As an example, 481,000 tweets and 46,000 posts are published every minute on Twitter and Instagram, respectively. Similarly, 973,000 people access Facebook every minute to produce information by posting material or by leaving comments on public pages. Messaging services such as WhatsApp are involved in this
scenario as well, since 38 million messages are sent through the app every minute.
Such systems upset long-stable Web dynamics, since they replaced the original dichotomy between producers and consumers of information, typical of the “old”
1 https://www.visualcapitalist.com/internet-minute-2018/.
Web, with a new and more “democratic” vision where each user can act at the same
time as both producer and consumer of information.
This phenomenon, already prophesied by Alvin Toffler in the early 80s,2 has two
main consequences: on one side, it gives great opportunities to the users, since content
can be authored and published more easily than 5 or 10 years ago. On the other, it is also, unfortunately, leading to the diffusion of an amount of information that is objectively unmanageable, with no control over the quality and reliability of the produced content. As stated by recent analyses,3 every day 2.5 quintillion bytes of
data are produced, and the pace is further accelerating with the growth of the Internet
of Things.
Two main questions arise from this scenario:
1. Can we effectively deal with such a huge amount of information?
2. Is there any opportunity resulting from this surplus of (personal) data?
The answer to the first question is equally simple and straightforward: no, we
can’t, and some data immediately confirm this intuitive idea.
Given that 300 hours of videos are uploaded on YouTube every minute,4 it would
take around 800 years of nonstop watching to watch each and every video uploaded
in the last year. Moreover, the spread of mobile devices makes it even more difficult to follow the flow of information (there are currently 1.3 billion gigabytes of data traffic on mobile networks alone5), even though we spend about 22% of our navigation time on social networks, as shown by an analysis carried out by Nielsen.6
The inability to deal with all the available online information is also confirmed by the studies carried out by Adrian Ott. As shown in [24], our brain has a physiological
limit since it can process 126 bits of information per second. Unfortunately, the
amount of information we have to deal with in our daily navigation on the Web is
equal to 393 bits per second, and thus the information we should process every day
is three times the amount of information we can process in an effective way.7 Such
a state of things is typically referred to as Information Overload.
Even if the amount of available information has grown significantly in the last few years, the concept of Information Overload has older origins and is not even strictly related to the Web. Indeed, this term was first mentioned in 1852 by the Secretary
of the Smithsonian Institute in Washington. Later, during the 1948 Royal Society’s
Influential Scientific Information Conference, Information Overload began to be labeled as a “problem” [3]. In the literature [37], Alvin Toffler used this term to describe
2 Alvin Toffler first proposed the portmanteau “prosumers” in the book “Third Wave” in 1980.
3 https://www.forbes.com/sites/bernardmarr/2018/05/21/how-much-data-do-we-create-every-
day-the-mind-blowing-stats-everyone-should-read.
4 https://merchdope.com/youtube-stats/.
5 http://www.mobithinking.com/mobile-marketing-tools/latest-mobile-stats.
6 http://blog.nielsen.com/nielsenwire/social/.
7 The study was carried out in 2010. It is likely that the amount of information available today would be even higher.
a prophetic scenario where the rapid technological growth of society (he called it the super-industrial society) caused stress and confusion in individuals.
As established by some research [38], this dystopian scenario is close to coming true, since several works showed that Information Overload is a problem that can decrease both the productivity and the quality of life of individuals, leading in the worst case to attention deficits, anxiety, cyberchondria, and so on [11].
Currently, there is no single, universally accepted definition of this issue. In general, Information Overload describes a state of affairs in which efficiency is jeopardized by the amount of available information [8]. More precisely, humans are
placed at overload when the information is presented at a rate too fast to be processed
by a single person [33]. Nowadays, we can experience the Information Overload in
several daily activities we perform on the Web: scrolling the huge number of search
results returned by a search engine, browsing a large set of items in a catalogue, or
just filtering a news feed to drop out things we are not interested in.
Unfortunately, as we previously stated, data spread quickly through the Web, at a pace that is only going to increase. This is an irreversible process that we need to harness in order to create opportunities out of the huge flow of data people have to face every day.
A possible direction to effectively tackle the problem of Information Overload
is proposed by Shirky [34], who emphasized that the abundance of data does not
represent a problem by itself. According to Shirky, the main issue regards the absence
of appropriate filters that support the physiological deficits of our brain and help us
in selecting the most important pieces of information among the available ones.
In other words, humans have to develop effective strategies to filter information properly, rather than simply reducing or avoiding the production of data. Shirky's view fostered a huge research effort, since the uncontrolled growth of information—beyond being considered a problem—also creates many opportunities for researchers and practitioners aiming to tame the flow of information through the development of new and better filters.
Indeed, most of the data available nowadays are first of all personal data, since they in some way concern the person who produced them. What we write, the places where we have been, the people we follow, our emotions: all these signals provide heterogeneous and important information about our preferences and needs that a personalized filter has to take into account.
As we will show throughout this book, the development of proper filters is a very
effective way to tackle the problem of Data Explosion and Information Overload.
This is confirmed by several success stories in the area of search engines as Google,
or in the area of personalized systems as Amazon, Netflix, and YouTube.
Generally speaking, the development of such systems has two requirements: first,
a precise and fine-grained description of all the aspects that describe the target user,
who is supposed to exploit the filter (typically referred to as user model); and second,
a precise description of what the filter should do. Accordingly, one of the goals of
this book is to provide an overview of the most effective methodologies to address
both the requirements and to develop very effective intelligent information access
systems.
1.2 Intelligent Information Access
Information Retrieval (IR) concerns the finding of relevant information from a col-
lection of data (usually unstructured text) [30]. Search engines, such as Google and
Bing, are typical examples of IR applications. A formal characterization of an IR
model is given by Baeza-Yates and Ribeiro-Neto [1]. Generally speaking, the goal
of IR systems is to tackle the problem of information overload by driving the user to
those documents that will satisfy her own information needs. User information needs
are usually represented by means of a query, expressed in a language understood by
the system.
In the typical workflow of an IR system, the query is submitted to the search
engine, whose goal is to understand the meaning of user request and to identify the
most relevant pieces of information among those that are stored in the collection.
Next, a ranking function orders the documents by descending relevance, and the top entries are finally returned to the user. An example of such a workflow
is provided in Fig. 1.1.
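The workflow above can be sketched in a few lines of Python. The snippet below is a minimal, illustrative ranker (not the internals of any real search engine): it weighs terms with TF-IDF, scores each document of a toy collection against the query by cosine similarity, and returns the indices of the top entries in descending relevance order. The documents and the query are assumptions made for the example.

```python
import math
from collections import Counter

def tf_idf_vectors(docs):
    """Build a TF-IDF vector for each tokenized document in the collection."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log(n / df[t]) for t in df}
    vectors = [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]
    return vectors, idf

def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0

def search(query, docs, k=2):
    """Rank document indices by descending similarity to the query."""
    vectors, idf = tf_idf_vectors(docs)
    q_tf = Counter(query.lower().split())
    q_vec = {t: tf * idf.get(t, 0.0) for t, tf in q_tf.items()}
    scores = [(cosine(q_vec, v), i) for i, v in enumerate(vectors)]
    return [i for _, i in sorted(scores, reverse=True)[:k]]

docs = [
    "the eiffel tower is in paris".split(),
    "the leaning tower of pisa".split(),
    "paris is the capital of france".split(),
]
print(search("eiffel tower paris", docs))  # the Eiffel Tower document ranks first
```

Real engines refine this scheme with many additional ranking signals (popularity, link structure, freshness), but the core "score, sort descending, return top entries" loop is the same.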
In 2009, Google enhanced the original paradigm of IR by introducing8 mechanisms to take personal and contextual information into account in its search algorithms.
8 http://googleblog.blogspot.com/2009/12/personalized-search-for-everyone.html.
As an example, by storing users’ clicks in previous searches, the algorithm can better
identify users’ preferred topics, and this information can be used to better rank search
results by moving up the pages the user is more likely to be interested in. In this way,
the plethora of personal information about the users can be exploited to improve the
ranking of the returned results.
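A minimal sketch of this click-based re-ranking idea follows. The result triples and topic labels are hypothetical structures invented for the example (real search engines use far richer signals): each result's base relevance score is boosted in proportion to how often the user previously clicked pages on the same topic, so pages on preferred topics move up.

```python
from collections import Counter

def personalized_rerank(results, clicked_topics):
    """Re-rank results by boosting topics the user clicked in past searches.

    `results` holds (url, base_score, topic) triples; `clicked_topics` is the
    list of topics of previously clicked pages. Both are illustrative."""
    counts = Counter(clicked_topics)
    total = sum(counts.values()) or 1

    def personalized_score(result):
        url, base_score, topic = result
        boost = 1.0 + counts[topic] / total  # up to 2x for a dominant topic
        return base_score * boost

    return sorted(results, key=personalized_score, reverse=True)

results = [("news.example/f1", 0.80, "sports"),
           ("wiki.example/f1", 0.78, "history"),
           ("shop.example/f1", 0.75, "shopping")]
clicks = ["history", "history", "sports"]  # topics of previously clicked pages
reranked = personalized_rerank(results, clicks)
print([url for url, _, _ in reranked])  # the history page moves to the top
```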
However, even if this choice started the evolution of the classical “search” paradigm toward its personalized and contextual variants, the current paradigm still requires an explicit query that expresses and describes the information needs of the user. As stated by Ingwersen and Willett [14], formulating such a query is a very challenging and problematic task, since users have to model their needs through a limited, keyword-based vocabulary that is far from the one they typically use.
To effectively tackle this issue, alternative methodologies for Intelligent Information Access emerged. As an example, Information Filtering (IF) techniques were introduced to provide users with the information they want without the need for an explicit query to trigger the whole process [12].
Even if both IR and IF have the goal of optimizing access to unstructured information sources, there is a clear methodological difference between them: First, IF does not exploit an explicit query from the user, but relies on some filtering criterion that triggers the whole process. Moreover, IF systems are not designed to find relevant pieces of information, but rather to filter out the noise from a generic information flow according to such a criterion.
Information Filtering tool is reported in Fig. 1.2. Such a strict relationship between
the areas was already discussed by Belkin and Croft [4], who defined IF and IR as
two sides of the same coin.
As already pointed out by O’Brien, the development of these systems is a first step
to shift the paradigm of classical search toward discovery,9 that is to say, a scenario
where the information is automatically pushed to the users instead of being pulled
according to an explicit query.
According to Malone et al. [22], the approaches to Information Filtering can
be classified into three different categories, according to the filtering criterion they
implement:
• Cognitive filtering: It is based on content analysis;
• Social filtering: It is based on individual judgments of quality, communicated
through personal relationships;
• Economic filtering: It is based on estimated search cost and benefits.
Typically, cognitive filtering is carried out by simply analyzing the content associated with each informative item. If it contains (or, alternatively, does not contain) specific features, it is filtered out. A typical scenario where this filtering approach is applied is spam detection in e-mail clients: If an e-mail contains specific terms, it is labeled as “spam” and filtered out.
also implemented to identify relevant articles or posts from a complete news feed:
The greater the overlap between the keywords describing an article (e.g., sports, politics, etc.) and those appearing in the articles the target user was previously interested in, the higher the likelihood that the user will be interested in reading that news.
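This keyword-overlap criterion can be sketched as follows. The profile keywords, article keywords, and the 0.3 threshold are all illustrative assumptions, and Jaccard similarity stands in for whatever overlap measure a real system might use.

```python
def keyword_overlap(item_keywords, profile_keywords):
    """Cognitive filtering: score an item by the Jaccard overlap between
    its keywords and those of items the user previously liked."""
    a, b = set(item_keywords), set(profile_keywords)
    return len(a & b) / len(a | b) if a | b else 0.0

# Hypothetical profile and articles, for illustration only.
profile = {"sports", "football", "championship"}
articles = {
    "match report":   {"sports", "football", "goal"},
    "election recap": {"politics", "election"},
}

# Keep only articles whose overlap with the profile exceeds a threshold.
relevant = [title for title, kws in articles.items()
            if keyword_overlap(kws, profile) > 0.3]
print(relevant)  # → ['match report']
```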
Conversely, social filtering complements the cognitive approach by focusing on
the characteristics of the users. As an example, some features describing the user
or some explicit relationships (e.g., an e-mail message received from the supervisor
has to be considered as relevant) can be used as a signal to filter the information
flow. Similarly, users’ behavior can be analyzed to bring out similarities or patterns that can be exploited to forecast their future behavior (e.g., if all the users in my group—the so-called neighborhood—liked a specific song or movie, it might be relevant for me, too).
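A toy sketch of this neighborhood idea, under the simplifying assumptions that preferences are binary "likes" and that user similarity is just the number of items liked in common (the users and songs are invented for the example):

```python
def neighborhood_relevance(item, target_user, likes, k=2):
    """Social filtering sketch: an item is relevant to the target user if
    her most similar users (the 'neighborhood') liked it. `likes` maps
    each user to the set of items she liked."""
    others = [u for u in likes if u != target_user]
    # The k users sharing the most likes with the target form the neighborhood.
    neighbors = sorted(others,
                       key=lambda u: len(likes[u] & likes[target_user]),
                       reverse=True)[:k]
    # Fraction of the neighborhood that liked the item.
    return sum(1 for u in neighbors if item in likes[u]) / k

likes = {
    "alice": {"song_a", "song_b"},
    "bob":   {"song_a", "song_b", "song_c"},
    "carol": {"song_b", "song_c"},
    "dave":  {"song_d"},
}
# Alice's neighborhood (bob, carol) both liked song_c, so it scores highly.
print(neighborhood_relevance("song_c", "alice", likes))  # → 1.0
```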
Finally, the economic filtering approach relies on various types of cost–benefit
assessments and explicit or implicit pricing mechanisms. A simple cost-versus-value
heuristic is the length of an e-mail message.
The set of techniques that can be exploited to filter the information flow is very wide,
ranging from simple heuristics to complex statistical models and machine learning
methodologies. When a filtering system also takes into account, as a filtering criterion, some information about a specific user (namely, a profile), it is common to refer to it as a personalized system [23].
9 http://archive.fortune.com/magazines/fortune/fortune_archive/2006/11/27/8394347/index.htm.
10 In most of the scenarios that will be discussed in this book, documents and items can be considered synonyms, since we will always describe the items by providing them with some descriptive features. However, it is necessary to state that this is not a constraint, and RS can also work without exploiting content.
engine,11 and many companies frequently report claims that RS contribute from
10% to 30% of total revenues [10].
Regardless of the specific methodology adopted to generate the recommendations,
an RS basically carries out the following three steps:
1. Training: First, the system needs to acquire information about a target user
(what she knows, what she likes, the task to be accomplished, demographic or
contextual information, and so on). This step can be accomplished in either an
explicit or an implicit way. In the former case, the user explicitly expresses her
preferences (by means of a numeric scale, for example) on randomly chosen items,
while in the latter her preferences are gathered by analyzing her transactional or
behavioral data (for example, clicking a link or reading a news article could be
considered as a clue of user interest in that item).
2. User Modeling: In general, the concept of personalization implies the presence
of something describing and identifying the user that interacts with the system.
So, the information extracted is usually modeled and stored in a user profile.
Modeling the user profile is a central step of the pipeline since it is the compo-
nent that triggers the whole recommendation process. The choices about which
information has to be stored and the way the user profile is built, updated, and
maintained are generally strictly related to the specific filtering model imple-
mented by the system.
3. Filtering and Recommendation: Finally, the information flow is filtered by
exploiting the data stored in the user profile. The goal of this step is to rank
the items according to a relevance criterion and to provide the user with a list
of the most relevant ones, so that she can express her own feedback on the
proposed items. Formally, at the end of the filtering step, the system returns a
subset of items ranked in descending order of relevance.
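The three steps above can be condensed into a minimal Python sketch. All data, feature names, and function names here are invented for illustration; this is not code from the book, only a toy instance of the train/model/filter structure:

```python
# Toy three-step recommendation pipeline:
# training (collect explicit feedback), user modeling (build a profile),
# filtering (rank items by relevance, return the top ones in descending order).

def train(feedback):
    """Training: collect explicit ratings as (item, rating) pairs."""
    return list(feedback)

def build_profile(observations):
    """User modeling: store the average rating given to each item feature."""
    profile = {}
    for item, rating in observations:
        for feature in item["features"]:
            profile.setdefault(feature, []).append(rating)
    return {f: sum(r) / len(r) for f, r in profile.items()}

def recommend(profile, catalog, k=2):
    """Filtering: score each item by summing the profile weights of its
    features, then return the top-k items in descending relevance order."""
    def relevance(item):
        return sum(profile.get(f, 0.0) for f in item["features"])
    return sorted(catalog, key=relevance, reverse=True)[:k]

movies = [
    {"title": "Heat", "features": ["crime", "thriller"]},
    {"title": "Up", "features": ["animation", "family"]},
    {"title": "Se7en", "features": ["crime", "thriller", "mystery"]},
]
observations = train([(movies[0], 5), (movies[1], 1)])
profile = build_profile(observations)
print([m["title"] for m in recommend(profile, movies)])  # -> ['Heat', 'Se7en']
```

Here the profile is just a feature-to-weight dictionary built from explicit ratings; real systems use far richer user models, but the three-step structure is the same.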
As reported in [6], recommender system techniques can be classified on the basis
of the different approaches they adopt for the user modeling step as well as for the
filtering and recommendation one.
1. Content-Based Recommender Systems (CBRS): This class of RS suggests
items similar to those preferred in the past by the user.
2. Collaborative Recommender Systems: This class of RS suggests items pre-
ferred by users with similar needs or preferences.
3. Demographic Recommender Systems: This class of RS suggests items on the
basis of the demographic profile of the user.
4. Knowledge-Based Recommender Systems: This class of RS suggests items
whose features meet user needs and preferences according to specific domain
knowledge.
5. Community-Based Recommender Systems: This type of system recommends
items based on the preferences of the user's friends.
11 http://www.mckinsey.com/industries/retail/our-insights/how-retailers-can-keep-up-with-
consumers.
As previously explained, this book introduces and discusses several strategies to
effectively exploit content-based information and textual data in intelligent
information access platforms.
The first question that may arise in such a context is simple: Why do we need
content? Why is it so important to handle and process textual information to develop
effective filters and provide users with intelligent information access?
As for some specific tools, the need for textual information is self-evident. As
an example, search engines simply cannot work in the absence of content-based
information. As previously shown in Fig. 1.1, the typical pipeline implemented in
search engines relies on a query and a set of textual documents: in the absence of
content, no keyword will describe the available documents, so every query that is
run will return no results. As a consequence, a proper design of a search engine
cannot disregard a proper modeling of content-based and textual
information.
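This dependence of search engines on textual content can be made concrete with a toy inverted index. The sketch below is purely illustrative (the documents and function names are invented), but it shows why an engine with no content can never answer a query:

```python
from collections import defaultdict

def build_index(documents):
    """Build a toy inverted index mapping each keyword to the ids
    of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index

def search(index, query):
    """Return the ids of the documents matching every query keyword."""
    results = None
    for token in query.lower().split():
        hits = index.get(token, set())
        results = hits if results is None else results & hits
    return results or set()

docs = {1: "semantic analysis of text", 2: "adaptive personalised systems"}
index = build_index(docs)
print(search(index, "semantic text"))       # doc 1 contains both keywords
print(search(build_index({}), "semantic"))  # no content: nothing can match
```

With an empty document collection the index contains no keywords, so the second query necessarily returns the empty set, which is exactly the point made above.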
12 In collaborative filtering methodologies, two users sharing similar preferences are labeled as
neighbors.
Fig. 1.4 Toy example of a data model for a collaborative recommender system, based on the
user–item matrix
(the second column). Once the list of the neighbors has been built,13 collaborative
recommendations are generated by looking for items the neighbors already liked
and the target user has not yet enjoyed. In this case, the recommendation would have
been Cloud Atlas (the fourth column of the matrix), since u1, who shares the same
preferences as the target user, liked that movie.
As shown in this example, collaborative filtering can provide suggestions by
only relying on user ratings (or user preferences, generally speaking), and thus the
usefulness of exploiting content-based data in this scenario is not that straightforward.
Moreover, some work [26] further emphasized these aspects by showing that in some
particular scenarios even a few ratings are more valuable than item metadata, and thus
the usage of textual information can even be counterproductive.
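The neighborhood-based reasoning behind the Fig. 1.4 example can be condensed into a short Python sketch. The ratings, movie titles, and the "liked" threshold below are invented for illustration; a real system would use a full neighborhood rather than a single most-similar user:

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two rating dictionaries (item -> rating),
    computed over the items both users have rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = sqrt(sum(u[i] ** 2 for i in common))
    norm_v = sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)

def collaborative_recommend(target, others):
    """Pick the most similar user (the neighbor) and suggest the items
    she liked (rating >= 4) that the target has not rated yet."""
    neighbor = max(others.values(), key=lambda u: cosine(target, u))
    return [i for i, r in neighbor.items() if r >= 4 and i not in target]

ratings = {
    "u1": {"Matrix": 5, "Titanic": 1, "Cloud Atlas": 5},
    "u2": {"Matrix": 1, "Titanic": 5},
}
target = {"Matrix": 5, "Titanic": 1}
print(collaborative_recommend(target, ratings))  # -> ['Cloud Atlas']
```

With these toy ratings, u1 agrees with the target user on both co-rated movies, so u1 becomes the neighbor and Cloud Atlas is suggested, mirroring the example above. Note that no content-based feature is used anywhere: ratings alone drive the suggestion.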
However, collaborative filtering algorithms are not free from problems. One of the
main issues that affect collaborative recommendation algorithms is typically referred
to as “sparsity”. These algorithms suffer from sparsity when the number of available
ratings is very small and most of the cells in the user–item matrix are empty. Why
is sparsity a relevant problem?
In the worst case, sparsity can make the recommendation algorithm unable to
generate suggestions. This scenario always occurs when a collaborative
filtering algorithm is first deployed: the user–item matrix is completely empty,
since no interaction has been stored in the
data model.14 In this case, no recommendation can be returned.
Similarly, when just a few ratings are available, it is not easy to calculate the neigh-
borhood of the target user. An example of such a problem is reported in Fig. 1.5. Who
13 In a real collaborative filtering scenario, a neighborhood typically consists of tens or hundreds of users.
14 This problem is typically referred to as cold start.