AutoVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss - Audio Demo
Kaizhi Qian*, Yang Zhang*, Shiyu Chang, Xuesong Yang, Mark Hasegawa-Johnson
Code
Our code is released here.
Traditional Many-to-Many Conversion
(Section 5.2 in the paper)
Traditional many-to-many conversion performs voice conversion from and to speakers that are present in the training set. Four systems are implemented:
- AutoVC - the proposed autoencoder-based conversion algorithm
- AutoVC-one-hot - the proposed autoencoder-based conversion algorithm conditioned on one-hot speaker embeddings
- StarGAN-VC - a voice conversion system that adopts the StarGAN paradigm
- Chou et al. - a voice conversion system that combines an autoencoder with a GAN and a speaker classifier
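To make the "one-hot speaker embeddings" used by AutoVC-one-hot concrete, here is a minimal sketch. It is not taken from the released code: the `SPEAKERS` list and the `one_hot` helper are hypothetical, and the inventory is assumed to be just the four speaker IDs that appear in the demo table, whereas the real training set is larger.

```python
# Hypothetical speaker inventory (assumption: only the four demo speakers).
SPEAKERS = ["p225", "p228", "p256", "p270"]


def one_hot(speaker_id, speakers=SPEAKERS):
    """Return a one-hot vector with 1.0 at the index of `speaker_id`.

    Raises KeyError for a speaker outside the inventory: a one-hot code
    has no slot for a speaker that was not enumerated in advance.
    """
    if speaker_id not in speakers:
        raise KeyError(f"unknown speaker: {speaker_id}")
    return [1.0 if s == speaker_id else 0.0 for s in speakers]


if __name__ == "__main__":
    print(one_hot("p256"))
```

Because each vector position is tied to one enumerated speaker, this kind of conditioning can only address speakers fixed ahead of time, which fits the traditional many-to-many setting described above.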
A few audio samples are given below.
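Source Speaker / Speech | Target Speaker / Speech | Conversion |
---|---|---|
p270 (Male) | p256 (Male) | AutoVC |
p270 (Male) | p256 (Male) | AutoVC-one-hot |
p270 (Male) | p256 (Male) | StarGAN-VC |
p270 (Male) | p256 (Male) | Chou et al. |
p270 (Male) | p228 (Female) | AutoVC |
p270 (Male) | p228 (Female) | AutoVC-one-hot |
p270 (Male) | p228 (Female) | StarGAN-VC |
p270 (Male) | p228 (Female) | Chou et al. |
p225 (Female) | p256 (Male) | AutoVC |
p225 (Female) | p256 (Male) | AutoVC-one-hot |
p225 (Female) | p256 (Male) | StarGAN-VC |
p225 (Female) | p256 (Male) | Chou et al. |
p225 (Female) | p228 (Female) | AutoVC |
p225 (Female) | p228 (Female) | AutoVC-one-hot |
p225 (Female) | p228 (Female) | StarGAN-VC |
p225 (Female) | p228 (Female) | Chou et al. |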
Zero-Shot Voice Conversion
(Section 5.3 in the paper)
Zero-shot voice conversion performs conversion from and/or to speakers that are unseen during training, based on only 20 seconds of audio of the speakers. Only AutoVC is implemented for zero-shot voice conversion.
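The "from and/or to" wording in this definition can be written as a small predicate. The sketch below is illustrative only: `SEEN`, `UNSEEN`, and `is_zero_shot` are hypothetical names, and the seen/unseen labels are assumed from the target-speaker headers of the tables in this section.

```python
# Assumed labels, taken from the target-speaker headers in this section.
SEEN = {"P227", "P225"}
UNSEEN = {"P252", "P261"}


def is_zero_shot(source, target, seen=SEEN):
    """A conversion is zero-shot if the source and/or the target is unseen."""
    return source not in seen or target not in seen


if __name__ == "__main__":
    speakers = sorted(SEEN | UNSEEN)
    # Enumerate every ordered source/target pair of distinct speakers.
    for src in speakers:
        for tgt in speakers:
            if src != tgt:
                kind = "zero-shot" if is_zero_shot(src, tgt) else "seen-to-seen"
                print(f"{src} -> {tgt}: {kind}")
```

Under these assumed labels, only the pairs between P227 and P225 are seen-to-seen; every pair involving P252 or P261 counts as zero-shot.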
The following table shows conversions to seen speakers.
Source Speaker / Speech | Target: P227 (Seen male) | Target: P225 (Seen female) |
---|---|---|
P227 (Seen male) | | |
P225 (Seen female) | | |
P252 (Unseen male) | | |
P261 (Unseen female) | | |
The following table shows conversions to unseen speakers.
Source Speaker / Speech | Target: P252 (Unseen male) | Target: P261 (Unseen female) |
---|---|---|
P227 (Seen male) | | |
P225 (Seen female) | | |
P252 (Unseen male) | | |
P261 (Unseen female) | | |