Word-Class Embeddings for Multiclass Text Classification

Moreo, Alejandro; Esuli, Andrea; Sebastiani, Fabrizio

arXiv:1911.11506 (cs)

[Submitted on 26 Nov 2019]

Abstract: Pre-trained word embeddings encode general word semantics and lexical regularities of natural language, and have proven useful across many NLP tasks, including word sense disambiguation, machine translation, and sentiment analysis. In supervised tasks such as multiclass text classification (the focus of this article), it seems appealing to enhance word representations with ad-hoc embeddings that encode task-specific information. We propose (supervised) word-class embeddings (WCEs), and show that, when concatenated to (unsupervised) pre-trained word embeddings, they substantially facilitate the training of deep-learning models in multiclass classification by topic. We show empirical evidence that WCEs yield a consistent improvement in multiclass classification accuracy, using four popular neural architectures and six widely used and publicly available datasets for multiclass text classification. Our code that implements WCEs is publicly available at this https URL
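
The abstract does not spell out how WCEs are computed, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that a word's WCE is a vector with one association score per class, obtained here by multiplying a standardized document-term matrix with a binary label matrix and averaging over documents. The function names, the scoring rule, and the toy data are all hypothetical and chosen for illustration.

```python
import numpy as np

def word_class_embeddings(X, Y):
    """Illustrative (assumed) WCE: one association score per (word, class).

    X: (n_docs, n_words) term-weight matrix (e.g. counts or tf-idf).
    Y: (n_docs, n_classes) binary label matrix.
    Returns an (n_words, n_classes) matrix.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    # Standardize each word column so scores are comparable across words.
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant columns
    Xz = (X - mean) / std
    # Average, over documents, of (standardized word weight * class membership).
    return Xz.T @ Y / X.shape[0]

def concat_embeddings(pretrained, wce):
    """Concatenate unsupervised and supervised vectors word by word."""
    return np.hstack([pretrained, wce])

# Toy corpus: 4 documents, 3 words, 2 classes (made-up numbers).
X = np.array([[2, 0, 1],
              [1, 0, 0],
              [0, 3, 1],
              [0, 1, 0]])
Y = np.array([[1, 0],
              [1, 0],
              [0, 1],
              [0, 1]])

# Stand-in for pre-trained embeddings (random here; dimension 4 is arbitrary).
rng = np.random.default_rng(0)
pretrained = rng.normal(size=(3, 4))

wce = word_class_embeddings(X, Y)
E = concat_embeddings(pretrained, wce)
print(E.shape)  # (3, 6): 4 pre-trained dimensions + 2 class dimensions
```

In this toy setup, the first word occurs only in documents of the first class, so its score for that class is positive and its score for the other class is negative. The concatenated matrix E would then serve as the embedding layer of a downstream neural classifier.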

Subjects:	Machine Learning (cs.LG); Computation and Language (cs.CL); Machine Learning (stat.ML)
Cite as:	arXiv:1911.11506 [cs.LG]
	(or arXiv:1911.11506v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1911.11506
Journal reference:	Final version published in Data Mining and Knowledge Discovery 35(3), 911-963, 2021

Submission history

From: Alejandro Moreo Fernández
[v1] Tue, 26 Nov 2019 13:11:00 UTC (7,351 KB)
