3.8 Proceedings Paper

The Perception and Analysis of the Likeability and Human Likeness of Synthesized Speech

Publisher

ISCA-INT SPEECH COMMUNICATION ASSOC
DOI: 10.21437/Interspeech.2018-1093

Keywords

synthesized voices; human likeness; likeability

Funding

  1. Bavarian State Ministry of Education, Science and the Arts
  2. European Union's Seventh Framework and Horizon 2020 Programmes [338164]

The synthesized voice has become an ever-present aspect of daily life. It is heard through our smart devices and in public announcements, and engineers continue to pursue naturalness in such voices. Yet the degree to which these methods can produce likeable, human-like voices has not been fully evaluated. With recent advancements in synthetic speech technology suggesting that human-like imitation is more attainable, this study asked 25 listeners to evaluate both the likeability and human likeness of a corpus of 13 German male voices, produced via 5 synthesis approaches (from formant to hybrid unit selection, deep neural network systems), and 1 human control. Results show that, unlike visual artificially intelligent elements, as posed by the concept of the Uncanny Valley, likeability consistently improves along with human likeness for the synthesized voice, with recent methods achieving results substantially closer to human speech than older methods. A small-scale acoustic analysis shows that the F0 of hybrid systems correlates less closely to human speech, with a higher standard deviation for F0. This analysis suggests that limited variance in F0 is linked to a reduction in human likeness, resulting in lower likeability for conventional synthetic speech methods.
