Mixture of Soft Prompts for Controllable Data Generation

Chen, Derek; Lee, Celine; Lu, Yunan; Rosati, Domenic; Yu, Zhou

arXiv:2303.01580 (cs)

[Submitted on 2 Mar 2023 (v1), last revised 18 Oct 2023 (this version, v2)]

Abstract: Large language models (LLMs) effectively generate fluent text when the target output follows natural language patterns. However, structured prediction tasks confine the output format to a limited ontology, causing even very large models to struggle since they were never trained with such restrictions in mind. The difficulty of using LLMs for direct prediction is exacerbated in few-shot learning scenarios, which commonly arise due to domain shift and resource limitations. We flip the problem on its head by leveraging the LLM as a tool for data augmentation rather than direct prediction. Our proposed Mixture of Soft Prompts (MSP) serves as a parameter-efficient procedure for generating data in a controlled manner. Denoising mechanisms are further applied to improve the quality of synthesized data. Automatic metrics show our method is capable of producing diverse and natural text, while preserving label semantics. Moreover, MSP achieves state-of-the-art results on three benchmarks when compared against strong baselines. Our method offers an alternate data-centric approach for applying LLMs to complex prediction tasks.

Comments:	19 pages, 13 Tables, 2 Figures. Accepted at EMNLP 2023
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2303.01580 [cs.CL]
	(or arXiv:2303.01580v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2303.01580

Submission history

From: Derek Chen
[v1] Thu, 2 Mar 2023 21:13:56 UTC (7,030 KB)
[v2] Wed, 18 Oct 2023 03:31:02 UTC (7,036 KB)
