Multilingual Part-of-Speech Tagging: Two Unsupervised Approaches

Naseem, Tahira; Snyder, Benjamin; Eisenstein, Jacob; Barzilay, Regina

doi:10.1613/jair.2843

arXiv:1401.5695 (cs)

[Submitted on 15 Jan 2014]

Abstract: We demonstrate the effectiveness of multilingual learning for unsupervised part-of-speech tagging. The central assumption of our work is that by combining cues from multiple languages, the structure of each becomes more apparent. We consider two ways of applying this intuition to the problem of unsupervised part-of-speech tagging: a model that directly merges tag structures for a pair of languages into a single sequence, and a second model that instead incorporates multilingual context using latent variables. Both approaches are formulated as hierarchical Bayesian models, using Markov Chain Monte Carlo sampling techniques for inference. Our results demonstrate that by incorporating multilingual evidence we can achieve impressive performance gains across a range of scenarios. We also found that performance improves steadily as the number of available languages increases.

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:1401.5695 [cs.CL]
	(or arXiv:1401.5695v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.1401.5695
Journal reference:	Journal of Artificial Intelligence Research, Volume 36, pages 341-385, 2009
Related DOI:	https://doi.org/10.1613/jair.2843

Submission history

From: Tahira Naseem [via jair.org as proxy]
[v1] Wed, 15 Jan 2014 05:39:01 UTC (1,372 KB)
