Billions of Parameters Are Worth More Than In-domain Training Data: A case study in the Legal Case Entailment Task

Rosa, Guilherme Moraes; Bonifacio, Luiz; Jeronymo, Vitor; Abonizio, Hugo; Lotufo, Roberto; Nogueira, Rodrigo

arXiv:2205.15172 (cs)

[Submitted on 30 May 2022]

Abstract: Recent work has shown that language models scaled to billions of parameters, such as GPT-3, perform remarkably well in zero-shot and few-shot scenarios. In this work, we experiment with zero-shot models in the legal case entailment task of the COLIEE 2022 competition. Our experiments show that scaling the number of parameters in a language model improves the F1 score of our previous zero-shot result by more than 6 points, suggesting that stronger zero-shot capability may be a characteristic of larger models, at least for this task. Our 3B-parameter zero-shot model outperforms all models, including ensembles, in the COLIEE 2021 test set and also achieves the best performance of a single model in the COLIEE 2022 competition, second only to the ensemble composed of the 3B model itself and a smaller version of the same model. Despite the challenges posed by large language models, mainly due to latency constraints in real-time applications, we provide a demonstration of our zero-shot monoT5-3b model being used in production as a search engine, including for legal documents. The code for our submission and the demo of our system are available at this https URL and this https URL, respectively.
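
The F1 comparison mentioned in the abstract can be illustrated with a minimal sketch. It assumes a micro-averaged F1 pooled over all queries, which may differ in detail from the official COLIEE scoring. The function name `micro_f1`, the query and paragraph identifiers, and the toy data are hypothetical and are not taken from the paper or its code release.

```python
from typing import Dict, Set, Tuple


def micro_f1(gold: Dict[str, Set[str]],
             pred: Dict[str, Set[str]]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) pooled over all queries.

    gold maps each query id to the set of paragraph ids that truly entail it;
    pred maps each query id to the set of paragraph ids the model selected.
    """
    # A true positive is a predicted paragraph that is also in the gold set.
    true_positives = sum(len(gold[q] & pred.get(q, set())) for q in gold)
    n_pred = sum(len(p) for p in pred.values())
    n_gold = sum(len(g) for g in gold.values())

    precision = true_positives / n_pred if n_pred else 0.0
    recall = true_positives / n_gold if n_gold else 0.0
    if precision + recall == 0.0:
        return 0.0, 0.0, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy data (hypothetical): two queries, three gold paragraphs in total.
gold = {"q1": {"p3"}, "q2": {"p1", "p4"}}
pred = {"q1": {"p3", "p5"}, "q2": {"p1"}}
precision, recall, f1 = micro_f1(gold, pred)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

In this toy case two of the three predictions are correct and two of the three gold paragraphs are found, so precision, recall, and F1 all equal 2/3.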

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2205.15172 [cs.CL]
	(or arXiv:2205.15172v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2205.15172

Submission history

From: Guilherme Moraes Rosa
[v1] Mon, 30 May 2022 15:21:26 UTC (138 KB)
