Improved Sentiment Detection via Label Transfer from Monolingual to Synthetic Code-Switched Text

Samanta, Bidisha; Ganguly, Niloy; Chakrabarti, Soumen

arXiv:1906.05725 (cs)

[Submitted on 13 Jun 2019]

Abstract: Multilingual writers and speakers often alternate between two languages in a single discourse, a practice called "code-switching". Existing sentiment detection methods are usually trained on sentiment-labeled monolingual text. Manually labeled code-switched text, especially involving minority languages, is extremely rare. Consequently, the best monolingual methods perform relatively poorly on code-switched text. We present an effective technique for synthesizing labeled code-switched text from labeled monolingual text, which is more readily available. The idea is to replace carefully selected subtrees of constituency parses of sentences in the resource-rich language with suitable token spans selected from automatic translations to the resource-poor language. By augmenting scarce human-labeled code-switched text with plentiful synthetic code-switched text, we achieve significant improvements in sentiment labeling accuracy (1.5%, 5.11%, 7.20%) for three different language pairs (English-Hindi, English-Spanish and English-Bengali). We also get significant gains for hate speech detection: 4% improvement using only synthetic text and 6% if augmented with real text.
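
The subtree-replacement idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the tuple-based tree encoding, the `phrase_table` lookup (standing in for spans taken from an automatic translation), the rule of switching only subtrees whose label is in `switch_labels`, and the romanized Hindi phrase are all assumptions made for illustration. The sentiment label of the monolingual sentence is simply carried over to the synthetic sentence.

```python
from typing import Dict, FrozenSet, List, Tuple, Union

# A constituency tree is either a leaf token (str) or a (label, children) pair.
Tree = Union[str, Tuple[str, list]]


def leaves(tree: Tree) -> List[str]:
    """Collect the tokens under a tree node, left to right."""
    if isinstance(tree, str):
        return [tree]
    _, children = tree
    tokens: List[str] = []
    for child in children:
        tokens.extend(leaves(child))
    return tokens


def code_switch(tree: Tree, phrase_table: Dict[str, str],
                switch_labels: FrozenSet[str]) -> List[str]:
    """Replace eligible subtrees with their translated token spans."""
    if isinstance(tree, str):
        return [tree]
    label, children = tree
    span = " ".join(leaves(tree))
    # Assumed selection rule: switch a whole subtree when its label is
    # eligible and a translated span is available for it.
    if label in switch_labels and span in phrase_table:
        return phrase_table[span].split()
    tokens: List[str] = []
    for child in children:
        tokens.extend(code_switch(child, phrase_table, switch_labels))
    return tokens


def synthesize(tree: Tree, sentiment: str, phrase_table: Dict[str, str],
               switch_labels: FrozenSet[str] = frozenset({"NP"})) -> Tuple[str, str]:
    """Build a synthetic code-switched sentence and transfer the label."""
    tokens = code_switch(tree, phrase_table, switch_labels)
    return " ".join(tokens), sentiment


# Hypothetical labeled English sentence with a hand-written parse.
parse = ("S", [("NP", ["the", "movie"]),
               ("VP", ["was", ("ADJP", ["really", "good"])])])
# Hypothetical span alignment; a real system would derive it from a translation.
phrase_table = {"the movie": "yeh film"}

print(synthesize(parse, "positive", phrase_table))
# ('yeh film was really good', 'positive')
```

In this sketch only the noun phrase is switched, so the synthetic sentence keeps the English sentence frame while inheriting its sentiment label unchanged.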

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:1906.05725 [cs.CL]
	(or arXiv:1906.05725v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.1906.05725

Submission history

From: Bidisha Samanta
[v1] Thu, 13 Jun 2019 14:41:00 UTC (63 KB)
