4.6 Article

Jointly Trained Conversion Model With LPCNet for Any-to-One Voice Conversion Using Speaker-Independent Linguistic Features

Related references

Note: only a subset of the references is listed.
Proceedings Paper Acoustics

End-to-end LPCNet: A Neural Vocoder With Fully-Differentiable LPC Estimation

Krishna Subramani et al.

Summary: This paper introduces an end-to-end version of the LPCNet neural vocoder aimed at reducing the complexity of speech synthesis. By learning to infer the LP coefficients from the input features in the frame-rate network, the proposed approach eliminates the need for explicit LP analysis and exceeds the quality of the original LPCNet model. The open-source end-to-end model retains LPCNet's low complexity while allowing any type of conditioning features.
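As background on what the LP coefficients contribute in LPCNet-style vocoders, here is a minimal NumPy sketch (an illustration, not code from the paper): linear prediction approximates each sample as a weighted sum of the previous samples, so the network only has to model the small residual. The toy signal and coefficient values below are chosen purely for demonstration.

```python
import numpy as np

def lpc_predict(signal, coeffs):
    """Predict each sample as a weighted sum of the previous len(coeffs) samples.

    signal: 1-D array of speech samples
    coeffs: LP coefficients a_1..a_p (illustrative values, not learned ones)
    """
    p = len(coeffs)
    pred = np.zeros_like(signal)
    for t in range(p, len(signal)):
        # s_t ~= sum_k a_k * s_{t-k}
        pred[t] = np.dot(coeffs, signal[t - p:t][::-1])
    return pred

# Toy example: a damped sinusoid is modeled exactly by a 2-tap predictor,
# so the residual (what the neural network would model) is near zero.
t = np.arange(200)
sig = np.sin(0.3 * t) * 0.99 ** t
a = np.array([2 * 0.99 * np.cos(0.3), -0.99 ** 2])
residual = sig - lpc_predict(sig, a)
```

Real speech is not exactly autoregressive, so in practice the residual is small but nonzero; modeling only that residual is what gives LPCNet its low complexity.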

INTERSPEECH 2022 (2022)

Proceedings Paper Audiology & Speech-Language Pathology

Cross-lingual Voice Conversion with Disentangled Universal Linguistic Representations

Zhenchuan Yang et al.

Summary: This paper proposes an any-to-many voice conversion system based on disentangled universal linguistic representations (ULRs), which effectively improves the quality of the converted speech and can convert from languages not seen during training.

INTERSPEECH 2021 (2021)

Proceedings Paper Audiology & Speech-Language Pathology

Hi-Fi Multi-Speaker English TTS Dataset

Evelina Bakhturina et al.

Summary: This paper introduces a new multi-speaker English dataset for training text-to-speech models, based on LibriVox audiobooks and Project Gutenberg texts in the public domain. The dataset includes 292 hours of speech from 10 speakers, with at least 17 hours per speaker, sampled at 44.1 kHz. High-quality speech samples were selected based on a signal bandwidth of at least 13 kHz and a signal-to-noise ratio (SNR) of at least 32 dB. The dataset is publicly available at http://www.openslr.org/109/.
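The selection criteria above can be expressed as a simple metadata filter. This is a hypothetical sketch (the field names and records are assumptions, not the authors' actual pipeline); only the 13 kHz bandwidth and 32 dB SNR thresholds come from the paper.

```python
# Hypothetical per-utterance metadata; field names are illustrative.
samples = [
    {"id": "a", "bandwidth_khz": 15.0, "snr_db": 40.0},
    {"id": "b", "bandwidth_khz": 12.0, "snr_db": 35.0},  # bandwidth below 13 kHz
    {"id": "c", "bandwidth_khz": 14.0, "snr_db": 30.0},  # SNR below 32 dB
]

# Keep only samples meeting the paper's thresholds:
# bandwidth >= 13 kHz and SNR >= 32 dB.
selected = [s["id"] for s in samples
            if s["bandwidth_khz"] >= 13.0 and s["snr_db"] >= 32.0]
```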

INTERSPEECH 2021 (2021)

Proceedings Paper Computer Science, Artificial Intelligence

Towards Fine-Grained Prosody Control for Voice Conversion

Zheng Lian et al.

Summary: This study proposes describing speech prosody with prosody embeddings learned in an unsupervised manner from the source speech. The approach improves the speech quality and speaker similarity of the converted speech, and even shows promising results under singing conditions.

2021 12TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP) (2021)

Article Acoustics

Many-to-Many Voice Transformer Network

Hirokazu Kameoka et al.

Summary: This paper proposes a voice conversion method based on a sequence-to-sequence (S2S) learning framework that can simultaneously convert voice characteristics among multiple speakers. By introducing an identity-mapping loss during training, the model's performance at test time is significantly improved.

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING (2021)

Article Acoustics

An Overview of Voice Conversion and Its Challenges: From Statistical Modeling to Deep Learning

Berrak Sisman et al.

Summary: Voice conversion is a technology that changes speaker identity while keeping linguistic content unchanged, involving various speech processing techniques. Recent advancements allow for producing human-like voice quality with high speaker similarity. This article provides an overview of voice conversion techniques, performance evaluation methods, and discusses their promise and limitations.

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING (2021)

Proceedings Paper Acoustics

Parallel WaveGAN: A Fast Waveform Generation Model Based on Generative Adversarial Networks with Multi-Resolution Spectrogram

Ryuichi Yamamoto et al.

2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (2020)

Proceedings Paper Acoustics

Gaussian LPCNet for Multisample Speech Synthesis

Vadim Popov et al.

2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (2020)

Article Acoustics

Non-Parallel Sequence-to-Sequence Voice Conversion With Disentangled Linguistic and Speaker Representations

Jing-Xuan Zhang et al.

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING (2020)

Article Acoustics

Sequence-to-Sequence Acoustic Modeling for Voice Conversion

Jing-Xuan Zhang et al.

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING (2019)

Article Acoustics

An overview of voice conversion systems

Seyed Hamidreza Mohammadi et al.

SPEECH COMMUNICATION (2017)

Proceedings Paper Computer Science, Artificial Intelligence

Statistical voice conversion with WaveNet-based waveform generation

Kazuhiro Kobayashi et al.

18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION (2017)

Proceedings Paper Computer Science, Artificial Intelligence

Voice Conversion from Unaligned Corpora using Variational Autoencoding Wasserstein Generative Adversarial Networks

Chin-Cheng Hsu et al.

18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION (2017)

Article Acoustics

Voice Conversion Using Partial Least Squares Regression

Elina Helander et al.

IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING (2010)

Article Acoustics

Voice conversion based on maximum-likelihood estimation of spectral parameter trajectory

Tomoki Toda et al.

IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING (2007)

Article Acoustics

Improving the intelligibility of dysarthric speech

Alexander B. Kain et al.

SPEECH COMMUNICATION (2007)