Cross-Lingual Training for Automatic Question Generation

Kumar, Vishwajeet; Joshi, Nitish; Mukherjee, Arijit; Ramakrishnan, Ganesh; Jyothi, Preethi

arXiv:1906.02525 (cs)

[Submitted on 6 Jun 2019]

Abstract: Automatic question generation (QG) is a challenging problem in natural language understanding. QG systems are typically built assuming access to a large number of training instances, where each instance is a question and its corresponding answer. For a new language, such training instances are hard to obtain, making the QG problem even more challenging. With this as our motivation, we study the reuse of an available large QG dataset in a secondary language (e.g., English) to learn a QG model for a primary language of interest (e.g., Hindi). For the primary language, we assume access to a large amount of monolingual text but only a small QG dataset. We propose a cross-lingual QG model that uses the following training regime: (i) unsupervised pretraining of language models in both the primary and secondary languages, and (ii) joint supervised training for QG in both languages. We demonstrate the efficacy of our proposed approach using two different primary languages, Hindi and Chinese. We also create and release a new question answering dataset for Hindi consisting of 6555 sentences.
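
The two-stage regime described in the abstract can be pictured as a data schedule: monolingual batches for each language first, then QG batches that mix both languages. The following is only a minimal sketch of such a schedule, not the authors' implementation. The function names, the batch size, the `primary_fraction` mixing ratio, and the choice to sample the small primary set with replacement are all assumptions made for illustration; the abstract does not specify any of them, and no model or loss is shown.

```python
import random
from typing import Iterator, List, Sequence, Tuple

Tagged = Tuple[str, str]  # (language tag, sentence or serialized QG pair)


def monolingual_batches(text: Sequence[str], lang: str,
                        batch_size: int) -> Iterator[List[Tagged]]:
    """Stage (i): chunk unlabeled sentences of one language into batches."""
    for start in range(0, len(text), batch_size):
        yield [(lang, s) for s in text[start:start + batch_size]]


def joint_qg_batches(primary: Sequence[str], secondary: Sequence[str],
                     batch_size: int, steps: int,
                     primary_fraction: float = 0.5,
                     seed: int = 0) -> Iterator[List[Tagged]]:
    """Stage (ii): every batch mixes labeled examples from both languages.

    Assumption of this sketch: the small primary set is sampled with
    replacement so the much larger secondary set does not swamp it.
    """
    rng = random.Random(seed)
    n_primary = max(1, round(batch_size * primary_fraction))
    n_secondary = batch_size - n_primary
    for _ in range(steps):
        batch = [("primary", rng.choice(primary)) for _ in range(n_primary)]
        batch += [("secondary", rng.choice(secondary)) for _ in range(n_secondary)]
        rng.shuffle(batch)
        yield batch


def training_schedule(mono_primary: Sequence[str], mono_secondary: Sequence[str],
                      qg_primary: Sequence[str], qg_secondary: Sequence[str],
                      batch_size: int = 4,
                      qg_steps: int = 3) -> Iterator[Tuple[str, List[Tagged]]]:
    """Yield (stage, batch) pairs: all pretraining batches, then joint QG batches."""
    for lang, text in (("primary", mono_primary), ("secondary", mono_secondary)):
        for batch in monolingual_batches(text, lang, batch_size):
            yield "pretrain", batch
    for batch in joint_qg_batches(qg_primary, qg_secondary, batch_size, qg_steps):
        yield "qg", batch
```

A training loop would consume `training_schedule(...)` and dispatch each batch to a language-modeling update or a QG update depending on the stage label.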

Comments:	ACL 2019
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:1906.02525 [cs.CL]
	(or arXiv:1906.02525v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.1906.02525

Submission history

From: Vishwajeet Kumar
[v1] Thu, 6 Jun 2019 11:31:24 UTC (491 KB)
