3.8 Proceedings Paper

PHONEMIC-LEVEL DURATION CONTROL USING ATTENTION ALIGNMENT FOR NATURAL SPEECH SYNTHESIS

Publisher

IEEE

Keywords

alignment loss; attention; deep learning; phonemic-level duration control; speech synthesis

Ask authors/readers for more resources

Recent attention-based end-to-end speech synthesis from text systems have achieved human-level performance. However, many approaches cause a sequence-to-sequence model to generate only averaged results of the input text, making it difficult to control the duration of utterance. In this study, we present a novel mechanism for phonemic-level duration control ( PDC) in a nearly end-to-end manner in order to solve this problem. We used a teacher attention alignment generated by an annotation speech analyzer program. Our method is inspired by the idea that the duration of a phoneme is highly related to its phonemic features. These phonemic features are saved on the attention alignment by adding duration embedding to it. This enables the model to learn and control the phonemic and rhythmic features of speech. We also show that providing alignment information as a teacher loss term improves training speed and notably, makes the model better at controlling the speed of dramatic change in phonemic-level duration with subjective demonstration. As a result, we show that our PDC speech synthesis with alignment loss outperforms other baseline methods without losing the ability to control the duration of phonemes in extremely adjusted environments with faster convergence.

Authors

I am an author on this paper
Click your name to claim this paper and add it to your profile.

Reviews

Primary Rating

3.8
Not enough ratings

Secondary Ratings

Novelty
-
Significance
-
Scientific rigor
-
Rate this paper

Recommended

No Data Available
No Data Available