F0-Consistent Many-to-Many Non-Parallel Voice Conversion via Conditional Autoencoder - Audio Demo
Kaizhi Qian, Zeyu Jin, Mark Hasegawa-Johnson, Gautham Mysore
Paper
Our paper is here.
Qualitative Evaluation
(Section 3.2 in the paper)
Our main goal is to compare F0-AutoVC against the original AutoVC, but we also include two additional baselines for a broader comparison.
- F0-AutoVC - the proposed F0-conditioned autoencoder-based conversion algorithm
- AutoVC - the original autoencoder-based conversion algorithm
- StarGAN-VC - a voice conversion system that adopts the StarGAN paradigm
- Chou et al. - a voice conversion system combining an autoencoder with a GAN and a speaker classifier
Below are a few audio samples.
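Source Speaker / Speech | Target Speaker / Speech | Method | Converted Speech |
---|---|---|---|
p225 (Female) | p226 (Male) | F0-AutoVC | |
p225 (Female) | p226 (Male) | AutoVC | |
p225 (Female) | p226 (Male) | StarGAN-VC | |
p225 (Female) | p226 (Male) | Chou et al. | |
p225 (Female) | p270 (Male) | F0-AutoVC | |
p225 (Female) | p270 (Male) | AutoVC | |
p225 (Female) | p270 (Male) | StarGAN-VC | |
p225 (Female) | p270 (Male) | Chou et al. | |
p226 (Male) | p225 (Female) | F0-AutoVC | |
p226 (Male) | p225 (Female) | AutoVC | |
p226 (Male) | p225 (Female) | StarGAN-VC | |
p226 (Male) | p225 (Female) | Chou et al. | |
p227 (Male) | p233 (Female) | F0-AutoVC | |
p227 (Male) | p233 (Female) | AutoVC | |
p227 (Male) | p233 (Female) | StarGAN-VC | |
p227 (Male) | p233 (Female) | Chou et al. | |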
F0-Control
(Section 3.1.3 in the paper)
We can control the F0 of the converted speech by modifying the F0 that the model is conditioned on.
For demonstration purposes, we simply set this conditioning F0 to a constant value.
Below are a few audio samples.
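As a rough illustration of what setting the conditioning F0 to a constant could look like, here is a minimal sketch. It is not the authors' code. It assumes a frame-level F0 contour in Hz in which unvoiced frames are marked with 0, and the helper name `flatten_f0` is hypothetical. How the flattened contour is then normalized and fed to the model is described in the paper and is not reproduced here.

```python
import numpy as np


def flatten_f0(f0_hz, constant_hz):
    """Replace every voiced frame of an F0 contour with one constant value.

    Assumption (not from the demo page): unvoiced frames are encoded as 0
    and should stay unvoiced after the modification.
    """
    f0_hz = np.asarray(f0_hz, dtype=float)
    voiced = f0_hz > 0  # boolean mask of voiced frames
    return np.where(voiced, float(constant_hz), 0.0)


# Hypothetical five-frame contour: frames 0 and 3 are unvoiced.
contour = np.array([0.0, 210.0, 215.0, 0.0, 190.0])
flat = flatten_f0(contour, 150.0)
```

Keeping the voiced/unvoiced pattern intact while flattening only the voiced frames is one reasonable design choice; it changes the pitch but leaves the timing of voiced segments as in the source utterance.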