Journal
COMPUTER SPEECH AND LANGUAGE
Volume 15, Issue 4, Pages 403-434
Publisher
ACADEMIC PRESS LTD - ELSEVIER SCIENCE LTD
DOI: 10.1006/csla.2001.0174
Abstract
In the past several years, a number of different language modeling improvements over simple trigram models have been found, including caching, higher-order n-grams, skipping, interpolated Kneser-Ney smoothing, and clustering. We present explorations of variations on, or of the limits of, each of these techniques, including showing that sentence mixture models may have more potential. While all of these techniques have been studied separately, they have rarely been studied in combination. We compare a combination of all techniques together to a Katz smoothed trigram model with no count cutoffs. We achieve perplexity reductions between 38 and 50% (1 bit of entropy), depending on training data size, as well as a word error rate reduction of 8.9%. Our perplexity reductions are perhaps the highest reported compared to a fair baseline. (C) 2001 Academic Press.
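The abstract names interpolated Kneser-Ney smoothing among the combined techniques. The Python sketch below is only a minimal bigram-level illustration of that one component, not the paper's system; the function names, the single fixed discount of 0.75, and the bigram-only scope are assumptions made for the example.

# Illustrative sketch only: interpolated Kneser-Ney smoothing for a bigram model.
# The paper combines this (at higher orders) with caching, skipping, clustering,
# and sentence mixture models; nothing below reproduces that full system.
from collections import Counter
import math

def train_interpolated_kneser_ney(bigrams, discount=0.75):
    # bigrams: iterable of (previous_word, word) pairs from the training text.
    bigram_counts = Counter(bigrams)
    context_counts = Counter()      # c(w_prev): token count of each context
    fanout_types = Counter()        # N1+(w_prev, *): distinct words following w_prev
    continuation_types = Counter()  # N1+(*, w): distinct contexts preceding w
    for (w_prev, w), c in bigram_counts.items():
        context_counts[w_prev] += c
        fanout_types[w_prev] += 1
        continuation_types[w] += 1
    total_bigram_types = len(bigram_counts)

    def prob(w, w_prev):
        # Lower-order "continuation" distribution: counts contexts, not tokens.
        p_cont = continuation_types[w] / total_bigram_types
        c_ctx = context_counts[w_prev]
        if c_ctx == 0:
            return p_cont  # unseen context: fall back to the continuation estimate
        lam = discount * fanout_types[w_prev] / c_ctx  # interpolation weight
        discounted = max(bigram_counts[(w_prev, w)] - discount, 0) / c_ctx
        return discounted + lam * p_cont

    return prob

# Usage sketch: perplexity over held-out bigrams (assumes every held-out word
# occurred in training). Since perplexity is 2 raised to the cross-entropy, the
# abstract's "1 bit of entropy" is the same statement as a 50% perplexity reduction.
def perplexity(prob, heldout_bigrams):
    log2_sum = sum(math.log2(prob(w, w_prev)) for w_prev, w in heldout_bigrams)
    return 2 ** (-log2_sum / len(heldout_bigrams))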
Authors
Joshua T. Goodman