F0-Consistent Many-to-Many Non-Parallel Voice Conversion via Conditional Autoencoder - Audio Demo

Kaizhi Qian, Zeyu Jin, Mark Hasegawa-Johnson, Gautham Mysore



Paper

Our paper is here.

Qualitative Evaluation

(Section 3.2 in the paper)

Our main goal is to compare F0-AutoVC against the original AutoVC, but we also include two additional baselines for a broader comparison.

  • F0-AutoVC - the proposed F0-conditioned autoencoder-based conversion algorithm
  • AutoVC - the original autoencoder-based conversion algorithm
  • StarGAN-VC - a voice conversion system that adopts the StarGAN paradigm
  • Chou et al. - a voice conversion system that combines an autoencoder with a GAN and a speaker classifier

Below are a few audio samples.

Source Speaker / Speech   Target Speaker / Speech   Conversion
p225 (Female)             p226 (Male)               F0-AutoVC
                                                    AutoVC
                                                    StarGAN-VC
                                                    Chou et al.
p225 (Female)             p270 (Male)               F0-AutoVC
                                                    AutoVC
                                                    StarGAN-VC
                                                    Chou et al.
p226 (Male)               p225 (Female)             F0-AutoVC
                                                    AutoVC
                                                    StarGAN-VC
                                                    Chou et al.
p227 (Male)               p233 (Female)             F0-AutoVC
                                                    AutoVC
                                                    StarGAN-VC
                                                    Chou et al.



F0-Control

(Section 3.1.3 in the paper)

We can control the F0 of the converted speech by modifying the conditioned F0.
For demonstration purposes, we simply set the conditioned F0 to a constant value.
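
To make the constant-F0 modification concrete, here is a minimal sketch of flattening a frame-level F0 contour before it is passed to a decoder as the conditioning signal. It is an illustration, not the paper's implementation: the function name flatten_f0, the use of Hz values, and the convention that 0.0 marks an unvoiced frame are all assumptions made for this example, and the actual F0 representation used by F0-AutoVC may differ.

```python
from typing import List


def flatten_f0(f0_hz: List[float], target_hz: float) -> List[float]:
    """Replace every voiced frame of an F0 contour with a constant pitch.

    Assumption (not taken from the paper): unvoiced frames are marked
    with 0.0 and are left untouched, so the voiced/unvoiced pattern of
    the source utterance is preserved.
    """
    if target_hz <= 0:
        raise ValueError("target_hz must be positive")
    return [target_hz if f0 > 0 else 0.0 for f0 in f0_hz]


# Hypothetical frame-level contour in Hz; zeros mark unvoiced frames.
source_f0 = [0.0, 182.5, 190.1, 201.7, 0.0, 0.0, 176.3, 169.8]
constant_f0 = flatten_f0(source_f0, target_hz=150.0)
print(constant_f0)
```

In this sketch only the voiced frames change, which is one plausible way to obtain a monotone rendition while keeping the original timing of voiced and unvoiced regions.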

Below are a few audio samples.
