Journal
Interspeech 2016
Publisher
ISCA-INT SPEECH COMMUNICATION ASSOC
DOI: 10.21437/Interspeech.2016-595
Keywords
neural networks; sequence discriminative training
Funding
- NSF CRI Grant [1513128]
- DARPA LORELEI Contract [HR0011-15-2-0024]
- IARPA BABEL Contract [2012-12050800010]
- NSF Directorate for Computer & Information Science & Engineering, Division of Computer and Network Systems [1513128]
Abstract
In this paper we describe a method for sequence-discriminative training of neural network acoustic models without the need for frame-level cross-entropy pre-training. We use the lattice-free version of the maximum mutual information (MMI) criterion: LF-MMI. To make its computation feasible, we use a phone n-gram language model in place of the word language model. To further reduce its space and time complexity, we compute the objective function using neural network outputs at one third the standard frame rate. These changes enable us to perform the forward-backward computation on GPUs. Further, the reduced output frame rate also provides a significant speed-up during decoding. We present results on 5 different LVCSR tasks with training data ranging from 100 to 2100 hours. Models trained with LF-MMI provide a relative word error rate reduction of ~11.5% over those trained with the cross-entropy objective function, and ~8% over those trained with the cross-entropy and sMBR objective functions. A further relative reduction of ~2.5% can be obtained by fine-tuning these models with the word-lattice-based sMBR objective function.
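The MMI criterion contrasts a numerator graph (encoding the reference transcription) against a denominator graph (here built from the phone n-gram language model): the objective is the log-probability of the acoustics under the numerator graph minus that under the denominator graph, each computed with the forward algorithm. The following is a minimal log-space sketch of that computation, not the paper's GPU implementation; the graph encoding, state names, and function names are illustrative assumptions.

```python
import math

def logsumexp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    if not xs:
        return float("-inf")
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def forward_logprob(init, trans, loglikes):
    """Total log-probability of the frames under a graph (forward algorithm).

    init:     {state: initial log-probability}
    trans:    {state: [(next_state, transition log-probability), ...]}
    loglikes: per-frame {state: acoustic log-likelihood} from the network
    """
    alpha = {s: lp + loglikes[0][s] for s, lp in init.items() if s in loglikes[0]}
    for frame in loglikes[1:]:
        scores = {}
        for s, a in alpha.items():
            for ns, lp in trans.get(s, []):
                if ns in frame:
                    scores.setdefault(ns, []).append(a + lp + frame[ns])
        alpha = {s: logsumexp(v) for s, v in scores.items()}
    return logsumexp(list(alpha.values()))

def mmi_objective(num_graph, den_graph, loglikes):
    """F_MMI = log p(O | numerator graph) - log p(O | denominator graph)."""
    return forward_logprob(*num_graph, loglikes) - forward_logprob(*den_graph, loglikes)

# Toy illustration: a forced single-phone numerator vs. a two-phone denominator.
num = ({0: 0.0}, {})                                  # reference phone only
den = ({0: math.log(0.5), 1: math.log(0.5)}, {})      # uniform phone LM
ll = [{0: math.log(0.4), 1: math.log(0.6)}]           # one frame of net outputs
print(mmi_objective(num, den, ll))                    # log(0.4) - log(0.5) = log 0.8
```

In the paper's setting the denominator forward-backward dominates the cost, which is why replacing word lattices with a compact phone n-gram graph, and evaluating the network at one third the frame rate, makes the whole computation tractable on a GPU.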