Piper Voice Samples

Below are samples for Piper, a fast, local text-to-speech system. Samples were generated from the first paragraph of the Wikipedia entry for "rainbow".

Quality

Voices are trained at one of four "quality" levels:

x_low - 16 kHz audio, 5-7M params
low - 16 kHz audio, 15-20M params
medium - 22.05 kHz audio, 15-20M params
high - 22.05 kHz audio, 28-32M params
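
As a rough illustration, the levels above can be written as a lookup table, which is handy when converting a sample count to a duration. This is only a sketch: the names QualityLevel, QUALITY_LEVELS, and seconds_of_audio are hypothetical and not part of Piper, and the values are copied from the list above.

```python
from typing import NamedTuple


class QualityLevel(NamedTuple):
    """One Piper quality level, as listed above (hypothetical helper type)."""
    sample_rate_hz: int   # audio sample rate in Hz
    min_params_m: int     # approximate lower bound on model size, millions of params
    max_params_m: int     # approximate upper bound on model size, millions of params


# Values taken directly from the quality list; the dict itself is illustrative.
QUALITY_LEVELS = {
    "x_low": QualityLevel(16_000, 5, 7),
    "low": QualityLevel(16_000, 15, 20),
    "medium": QualityLevel(22_050, 15, 20),
    "high": QualityLevel(22_050, 28, 32),
}


def seconds_of_audio(num_samples: int, quality: str) -> float:
    """Convert a count of audio samples to seconds for the given quality level."""
    return num_samples / QUALITY_LEVELS[quality].sample_rate_hz
```

For example, seconds_of_audio(44_100, "medium") treats the input as 22.05 kHz audio and returns 2.0.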

Multi-Speaker

Some voices contain multiple speakers, capturing the styles of several people within a single model.
Multi-speaker models can quickly switch between speakers, but the quality of an individual speaker may be lower than that of a single-speaker model.
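
To make the idea of switching speakers concrete, here is a minimal sketch that lists the speakers a voice offers, given its configuration loaded as a dictionary. It assumes the configuration exposes a "num_speakers" count and an optional "speaker_id_map" from speaker name to numeric id; those key names are an assumption here and should be checked against the config file shipped with your voice. The function name speaker_ids is hypothetical.

```python
def speaker_ids(config: dict) -> dict:
    """Return a {speaker_name: speaker_id} mapping for a voice config.

    Assumes (not confirmed by this page) that the config has a
    "num_speakers" integer and an optional "speaker_id_map" dict.
    """
    num_speakers = config.get("num_speakers", 1)
    if num_speakers <= 1:
        # Single-speaker voice: there is nothing to choose between.
        return {}

    id_map = config.get("speaker_id_map") or {}
    if id_map:
        # Copy so callers cannot mutate the original config.
        return dict(id_map)

    # No names available: fall back to numeric ids as their own labels.
    return {str(i): i for i in range(num_speakers)}


# Hypothetical config for a two-speaker voice.
example_config = {"num_speakers": 2, "speaker_id_map": {"alice": 0, "bob": 1}}
print(speaker_ids(example_config))
```

A speaker id chosen from this mapping would then be passed to the synthesizer to select that speaker's style within the shared model.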