XCOPA: A Multilingual Dataset for Causal Commonsense Reasoning

Ponti, Edoardo Maria; Glavaš, Goran; Majewska, Olga; Liu, Qianchu; Vulić, Ivan; Korhonen, Anna

arXiv:2005.00333 (cs)

[Submitted on 1 May 2020 (v1), last revised 26 Oct 2020 (this version, v2)]

Abstract: In order to simulate human language capacity, natural language processing systems must be able to reason about the dynamics of everyday situations, including their possible causes and effects. Moreover, they should be able to generalise the acquired world knowledge to new languages, modulo cultural differences. Advances in machine reasoning and cross-lingual transfer depend on the availability of challenging evaluation benchmarks. Motivated by both demands, we introduce Cross-lingual Choice of Plausible Alternatives (XCOPA), a typologically diverse multilingual dataset for causal commonsense reasoning in 11 languages, which includes resource-poor languages like Eastern Apurímac Quechua and Haitian Creole. We evaluate a range of state-of-the-art models on this novel dataset, revealing that the performance of current methods based on multilingual pretraining and zero-shot fine-tuning falls short compared to translation-based transfer. Finally, we propose strategies to adapt multilingual models to out-of-sample resource-lean languages where only a small corpus or a bilingual dictionary is available, and report substantial improvements over the random baseline. The XCOPA dataset is freely available at this http URL.

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2005.00333 [cs.CL]
	(or arXiv:2005.00333v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2005.00333

Submission history

From: Edoardo Maria Ponti
[v1] Fri, 1 May 2020 12:22:33 UTC (165 KB)
[v2] Mon, 26 Oct 2020 23:23:58 UTC (170 KB)
