Improving generalization in large language models by learning prefix subspaces

Falissard, Louis; Guigue, Vincent; Soulier, Laure


arXiv:2310.15793 (cs)

[Submitted on 24 Oct 2023]


Abstract: This article focuses on fine-tuning large language models (LLMs) in the scarce data regime (also known as the "few-shot" learning setting). We propose a method to increase the generalization capabilities of LLMs based on neural network subspaces. This optimization method, recently introduced in computer vision, aims to improve model generalization by identifying wider local optima through the joint optimization of an entire simplex of models in parameter space. Its adaptation to massive, pretrained transformers, however, poses some challenges. First, their considerable number of parameters makes it difficult to train several models jointly; second, their deterministic parameter initialization schemes make them unfit for the subspace method as originally proposed. We show in this paper that "Parameter Efficient Fine-Tuning" (PEFT) methods are nonetheless perfectly compatible with this original approach, and propose to learn an entire simplex of continuous prefixes. We test our method on a variant of the GLUE benchmark adapted to the few-shot learning setting, and show that both our contributions jointly lead to a gain in average performance compared to state-of-the-art methods. The implementation can be found at the following link: this https URL
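
The idea of a "simplex of continuous prefixes" can be illustrated with a minimal sketch. It assumes that a training step draws a random point inside the simplex spanned by a few learnable vertex prefixes, and that evaluation uses the simplex centroid; the abstract does not spell out these details, so this is an illustration under those assumptions rather than the authors' implementation. The function names and toy values are hypothetical, and prefixes are represented as flat lists of floats for simplicity.

```python
import random


def sample_simplex_weights(n_vertices, rng):
    """Draw barycentric weights uniformly from the simplex.

    Normalizing i.i.d. exponential draws yields a Dirichlet(1, ..., 1) sample.
    """
    draws = [rng.expovariate(1.0) for _ in range(n_vertices)]
    total = sum(draws)
    return [d / total for d in draws]


def combine_prefixes(vertices, weights):
    """Return the convex combination of the vertex prefixes."""
    dim = len(vertices[0])
    return [sum(w * v[i] for w, v in zip(weights, vertices)) for i in range(dim)]


def centroid(vertices):
    """Return the simplex midpoint (assumed here to be the evaluation prefix)."""
    n = len(vertices)
    return combine_prefixes(vertices, [1.0 / n] * n)


rng = random.Random(0)

# Three hypothetical vertex prefixes of dimension 4 (toy values, not from the paper).
vertices = [
    [1.0, 0.0, 0.0, 2.0],
    [0.0, 1.0, 0.0, 2.0],
    [0.0, 0.0, 1.0, 2.0],
]

# Assumed training step: a random point inside the simplex is used as the prefix.
weights = sample_simplex_weights(len(vertices), rng)
train_prefix = combine_prefixes(vertices, weights)

# Assumed evaluation step: the centroid of the simplex is used as the prefix.
eval_prefix = centroid(vertices)
```

In an actual fine-tuning setup, each vertex would be a trainable prefix tensor and gradients would flow back to all vertices through the sampled weights, while the pretrained model's own parameters stay frozen.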

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Cite as:	arXiv:2310.15793 [cs.LG]
	(or arXiv:2310.15793v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2310.15793

Submission history

From: Louis Falissard
[v1] Tue, 24 Oct 2023 12:44:09 UTC (221 KB)
