1. Introduction
Over the last few years, collaborative tagging on the web has grown rapidly. Various collaborative platforms have emerged to allow members of a community to share their expertise. Collaborative tagging describes the process by which many users add metadata, in the form of keywords, to shared content. A set of categories commonly referred to as “folksonomies” [1] is used to assign one or several labels (or tags) to a resource. This approach to organizing on-line information is usually contrasted with formal ontologies, which are enforced by domain experts rather than by common users [2]. In collaborative tagging, users can assign to information or resources either uncontrolled keywords or controlled keywords originating from a pre-defined set. For example, controlled keywords can be used to assign a tag indicating an intensity or a level of confidence.
Collaborative tagging of websites now allows users to label a wide range of on-line documents (bookmarks, pictures, evaluations of tourist destinations, cooking recipes) dedicated to various categories of knowledge. Members of collaborative platforms use tags to transfer their knowledge, to find a solution to a given problem, or to obtain a recommendation on how to solve a task.
There are both benefits and drawbacks to the tagging approach. Tagging can be seen as a categorization process, in contrast to a pre-optimized classification process such as the one exemplified by expert-created semantic web ontologies. Tagging systems allow more malleability and adaptability in organizing information than formal classification systems do. Because of this flexibility, however, reusing collaboratively affixed tags to train a classifier and re-applying the learned tags to other documents can sometimes be difficult. The lack of guidelines that characterizes uncontrolled keywords induces significant variations in tag usage, since the many users who collaborate have different background knowledge. Consequently, reusing tags and the contextual results of collaborative tagging systems as a training corpus, despite these variations, is a key challenge.
To address the lack of a uniform evaluation of a collaborative resource by its contributors, we propose in this paper two different classification systems trained on a collaboratively tagged corpus. The presented methods are intended to leverage the collective contribution of web users to build machine learning systems. We specifically try to contribute to the definition of a set of desirable properties for robust and effective tagging systems. We present a generic method for training classifiers on ambiguous tags, such as those describing opinion, difficulty, or quality. We use an application corpus composed of cooking recipes, with tags related to the opinion or culture of users. These tags are very different in nature from those usually found in categorization (e.g., Wikipedia category tags affixed by users) or description (e.g., descriptive tags affixed by users on pictures in the Flickr or Del.icio.us services). We demonstrate that it is possible to reproduce collaborative annotations by carefully selecting an appropriate learning algorithm.
The paper is organized as follows. Section 2 describes existing work related to tagging and the uniformity of classes in the context of collaboratively built corpora. In Section 3, we detail the experimental context of this study. The collaborative corpus of cooking recipes we used is described in Section 4. Section 5 explains the experimental protocol and defines the metrics associated with the evaluation campaign. We then present our systems in Sections 6 and 7, where we report on experiments conducted with various classifiers and describe the systems we officially submitted to the DÉfi Fouille de Textes 2013 (DEFT 2013) evaluation campaign, an annual French-speaking text mining challenge [3]. These systems obtained the best overall results, ranking first on task 1 and second on task 2, and defined the state-of-the-art results for the 2013 campaign. Finally, the results we obtained are discussed in Section 8.
2. Related Work
Text classification allows one to automatically organize a set of documents into one or more predefined categories [4]. Each category associates a document with a meaningful semantic label. A typical example of a text classification task is document filtering: given a set of folders, a system has to assign each document of the corpus to the proper folder. Text classification with machine-learning-based systems is mainly performed using supervised methods. Usually, a reference corpus is built by collecting text documents associated with various labels. These labeled documents are used to generate a training corpus that is fed to classification software. According to a given algorithm (for example, a support vector machine, naive Bayes, or a tree model), the classification software produces a model that predicts classes for a non-labeled set of documents. Most often, the reference corpus is built by a user or a group of annotators who affix the class labels on each document. Each label is pre-defined in a taxonomy and affixed according to a set of rules.
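To make this workflow concrete, the following minimal sketch (using the scikit-learn library, with invented placeholder documents and labels; it is not one of the systems submitted to DEFT 2013) vectorizes a small labeled reference corpus, trains a linear support vector machine, and applies the resulting model to an unlabeled document.

```python
# Minimal supervised text classification sketch (illustrative only):
# a labeled reference corpus is vectorized, a model is trained,
# and the model predicts a class for an unlabeled document.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Hypothetical labeled reference corpus (placeholder data).
train_docs = [
    "Melt the butter and whisk in the flour.",
    "Assemble the cabinet using the supplied screws.",
]
train_labels = ["recipe", "manual"]

# Turn raw text into bag-of-words features weighted by TF-IDF.
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train_docs)

# Train a linear support vector machine on the labeled documents.
classifier = LinearSVC()
classifier.fit(X_train, train_labels)

# Predict a class for a new, unlabeled document.
X_new = vectorizer.transform(["Whisk the eggs and melt the butter."])
print(classifier.predict(X_new))  # e.g., ['recipe']
```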
Proponents of collaborative tagging often contrast tagging-based systems with taxonomies. Familiar models in classic taxonomy, applied to living things or objects, usually provide largely unambiguous categories. For example, in geopolitical classification systems, classes for the various levels of administrative regions are easy to associate with a description document (a town, a region, or a country are relatively easy to differentiate). For a classification system such as the Wikipedia category system [5], it is easy for a user to apply an accurate category tag, and for a machine-learning system to reuse these tags to train a classifier dedicated to automatically reproducing the user’s tagging process [6].
In collaborative tagging systems, since categories are defined by users, the human labeling process introduces subjectivity. For procedural documents, such as cooking recipes, users’ knowledge and past experience differ; consequently, their choices for a given category might differ as well. In this context, the classification task differs from traditional tasks, where categories are defined by objective (and therefore more discriminant) criteria.
Other examples of procedural texts include user manuals and construction procedures. Much research has been done on processing procedural texts to extract domain knowledge [7,8]. Frequently, the classification of procedural texts associated with less discriminant criteria can be very similar to an opinion mining problem.
Recommender systems, which suggest items of interest based on a user’s explicit and implicit preferences, the preferences of other users, and user and item attributes, are studied in [9]. To solve the problem of classification according to recommendation tags, the authors developed a method that combines content and collaborative data in a single probabilistic framework. They systematically explore three testing methodologies using a publicly available dataset. Another study [10] describes a tool for sifting through and synthesizing product reviews. The system uses structured reviews for training and testing, identifying appropriate features and scoring methods from information retrieval to determine whether reviews are positive or negative. This approach performs as well as traditional machine-learning methods. However, using machine-learning methods to identify and classify review sentences taken from the web makes classification harder to achieve. The authors conclude that, with such data, a simple technique for identifying the relevant attributes of a product produces better results.
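The general term-scoring idea can be illustrated with a generic sketch. The code below is not the actual method of [10]: it estimates a smoothed log-odds polarity score per term from a placeholder set of labeled reviews and classifies a new review by the sign of its summed term scores.

```python
# Generic illustration of term-scoring polarity classification
# (a sketch in the spirit of, but not identical to, the method of [10]).
import math
from collections import Counter

# Hypothetical labeled reviews (placeholder data).
positive = ["great camera excellent photos", "excellent battery life"]
negative = ["poor battery terrible screen", "terrible support poor build"]

pos_counts = Counter(w for doc in positive for w in doc.split())
neg_counts = Counter(w for doc in negative for w in doc.split())
vocab = set(pos_counts) | set(neg_counts)

def term_score(word):
    # Smoothed log-odds: positive score suggests positive polarity.
    p = (pos_counts[word] + 1) / (sum(pos_counts.values()) + len(vocab))
    n = (neg_counts[word] + 1) / (sum(neg_counts.values()) + len(vocab))
    return math.log(p / n)

def classify(review):
    # Sum the per-term scores of known words and take the sign.
    score = sum(term_score(w) for w in review.split() if w in vocab)
    return "positive" if score > 0 else "negative"

print(classify("excellent photos poor screen"))  # e.g., positive
```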
In the DEFT2007 text mining evaluation campaign [11], four collaboratively built French corpora were used in the context of classification challenges:
- a set of documents related to movies and theater, where users expressed their opinion in three classes of recommendation;
- another set related to the evaluation of video games, also with three classes;
- a peer review corpus, where scientific articles were evaluated with three classes of acceptance (from accept to reject);
- and a set of interventions by French members of Parliament in a French assembly, with the opinion expressed in these interventions labeled in two classes (vote for or against).
The results of this evaluation campaign showed that various methods and classifiers could be applied and that they achieved very similar results.
Making a distinction between opinion labeling and text classification tasks when class labels are subjectively affixed appears to be complex.
Indeed, it is difficult to clearly define a uniform terminology for this recent and very specific field of research, as explained in [12]: “motivated by different real-world applications, researchers have considered a wide range of problems over a variety of different types of corpora”. For example, classifying news articles into good or bad news has been defined as a sentiment classification task in the literature [13].
Finally, the body of work that deals with the computational treatment of opinion, sentiment, and subjectivity in text, notably in the particular case of collaborative, non-taxonomic tags used for document classification, does not draw a clear distinction between opinion mining and text classification in this context.