Align then Summarize: Automatic Alignment Methods for Summarization Corpus Creation

Tardy, Paul; Janiszek, David; Estève, Yannick; Nguyen, Vincent

arXiv:2007.07841 (cs)

[Submitted on 15 Jul 2020]

Abstract: Summarizing texts is not a straightforward task. Before even considering text summarization, one should determine what kind of summary is expected. How much should the information be compressed? Is it relevant to reformulate, or should the summary stick to the original phrasing? State-of-the-art work on automatic text summarization mostly revolves around news articles. We suggest that considering a wider variety of tasks would lead to an improvement in the field, in terms of generalization and robustness. We explore meeting summarization: generating reports from automatic transcriptions. Our work consists of segmenting and aligning transcriptions with respect to reports, to get a suitable dataset for neural summarization. Using a bootstrapping approach, we provide pre-alignments that are corrected by human annotators, producing a validation set against which we evaluate automatic models. This consistently reduces annotators' efforts by providing iteratively better pre-alignments and maximizes the corpus size by using annotations from our automatic alignment models. Evaluation is conducted on public_meetings, a novel corpus of aligned public meetings. We report automatic alignment and summarization performance on this corpus and show that automatic alignment is relevant for data annotation, since it leads to a large improvement of almost +4 on all ROUGE scores on the summarization task.

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2007.07841 [cs.CL]
	(or arXiv:2007.07841v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2007.07841
Journal reference:	LREC 2020 -- Proceedings of The 12th Language Resources and Evaluation Conference, 2020, pp. 6718--6724

Submission history

From: Paul Tardy
[v1] Wed, 15 Jul 2020 17:03:34 UTC (350 KB)
