Google Scholar

Sketch2Sound: Controllable Audio Generation via Time-Varying Signals and Sonic Imitations

HF García, O Nieto, J Salamon, B Pardo… - arXiv preprint arXiv …, 2024 - arxiv.org

We present Sketch2Sound, a generative audio model capable of creating high-quality
sounds from a set of interpretable time-varying control signals: loudness, brightness, and
pitch, as well as text prompts. Sketch2Sound can synthesize arbitrary sounds from sonic
imitations (i.e., a vocal imitation or a reference sound-shape). Sketch2Sound can be
implemented on top of any text-to-audio latent diffusion transformer (DiT), and requires only
40k steps of fine-tuning and a single linear layer per control, making it more lightweight than …


Sketch2Sound: Controllable Audio Generation via Time-Varying Signals and Sonic Imitations

H Flores García, O Nieto, J Salamon, B Pardo… - arXiv e …, 2024 - ui.adsabs.harvard.edu

Abstract: We present Sketch2Sound, a generative audio model capable of creating
high-quality sounds from a set of interpretable time-varying control signals: loudness, brightness,
and pitch, as well as text prompts. Sketch2Sound can synthesize arbitrary sounds from sonic
imitations (i.e., a vocal imitation or a reference sound-shape). Sketch2Sound can be
implemented on top of any text-to-audio latent diffusion transformer (DiT), and requires only
40k steps of fine-tuning and a single linear layer per control, making it more lightweight than …

