Proceedings Paper

Siri On-Device Deep Learning-Guided Unit Selection Text-to-Speech System

18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION (2017)

Volume -, Issue -, Pages 4011-4015

Publisher

ISCA-INT SPEECH COMMUNICATION ASSOC

DOI: 10.21437/Interspeech.2017-1798

Keywords

Speech synthesis; unit selection; hybrid; recurrent mixture density network; on-device

Abstract

This paper describes Apple's hybrid unit selection speech synthesis system, which provides the voices for Siri and must meet requirements of naturalness, personality, and expressivity. It has been deployed on hundreds of millions of desktop and mobile devices (e.g., iPhone, iPad, Mac) via iOS and macOS in multiple languages. The system follows the classical unit selection framework and uses deep learning techniques to boost performance. In particular, deep and recurrent mixture density networks are used to predict the target and concatenation reference distributions for the respective costs during unit selection. In this paper, we present an overview of the run-time TTS engine and the voice building process. We also describe various techniques that enable on-device capability, such as preselection optimization, caching for low latency, and unit pruning for low footprint, as well as techniques that improve the naturalness and expressivity of the voice, such as the use of long units.
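
The abstract's central idea, scoring candidate units against predicted distributions and searching for the cheapest sequence, can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes a single scalar feature per unit, a single Gaussian per target position in place of a mixture density network output, and a fixed join variance. The names `Unit`, `gaussian_nll`, `select_units`, and `concat_var` are hypothetical and chosen only for this example.

```python
import math
from typing import List, NamedTuple, Tuple


class Unit(NamedTuple):
    """A candidate speech unit (hypothetical, simplified to scalar features)."""
    name: str
    feature: float     # feature compared against the predicted target distribution
    left_edge: float   # feature value at the unit's left boundary
    right_edge: float  # feature value at the unit's right boundary


def gaussian_nll(x: float, mean: float, var: float) -> float:
    """Negative log-likelihood of scalar x under a Gaussian N(mean, var)."""
    return 0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def select_units(
    candidates: List[List[Unit]],
    target_dists: List[Tuple[float, float]],
    concat_var: float = 1.0,
) -> Tuple[List[str], float]:
    """Viterbi search for the unit sequence with the lowest total cost.

    Target cost: NLL of a unit's feature under the (mean, var) predicted for
    its position. Concatenation cost (an assumption of this sketch): NLL of the
    next unit's left edge under a Gaussian centred on the previous unit's
    right edge with variance concat_var.
    """
    mean0, var0 = target_dists[0]
    prev_costs = [gaussian_nll(u.feature, mean0, var0) for u in candidates[0]]
    backptrs: List[List[int]] = []

    for i in range(1, len(candidates)):
        mean, var = target_dists[i]
        costs, ptrs = [], []
        for u in candidates[i]:
            # Accumulated cost of reaching u from each predecessor candidate.
            via = [
                prev_costs[k] + gaussian_nll(u.left_edge, p.right_edge, concat_var)
                for k, p in enumerate(candidates[i - 1])
            ]
            k_best = min(range(len(via)), key=via.__getitem__)
            costs.append(via[k_best] + gaussian_nll(u.feature, mean, var))
            ptrs.append(k_best)
        prev_costs = costs
        backptrs.append(ptrs)

    # Trace back from the cheapest final candidate.
    j = min(range(len(prev_costs)), key=prev_costs.__getitem__)
    total = prev_costs[j]
    path = [j]
    for ptrs in reversed(backptrs):
        j = ptrs[j]
        path.append(j)
    path.reverse()
    return [candidates[i][j].name for i, j in enumerate(path)], total
```

In this toy setting, a unit that matches its target perfectly can still lose to a slightly worse match that joins smoothly with its neighbour, which is the trade-off the target and concatenation costs are meant to balance.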
