Article Proceedings Paper

Text summarization using a trainable summarizer and latent semantic analysis

INFORMATION PROCESSING & MANAGEMENT (2005)

Journal

INFORMATION PROCESSING & MANAGEMENT

Volume 41, Issue 1, Pages 75-95

Publisher

ELSEVIER SCI LTD

DOI: 10.1016/j.ipm.2004.04.003

Keywords

text summarization; corpus-based approach; latent semantic analysis; text relationship map

Abstract

This paper proposes two approaches to text summarization: a modified corpus-based approach (MCBA) and an LSA-based text relationship map approach (LSA + T.R.M.). The first is a trainable summarizer that scores sentences using several features, including position, positive keywords, negative keywords, centrality, and resemblance to the title. It exploits two new ideas: (1) sentence positions are ranked to emphasize the differing significance of each position, and (2) the score function is trained by a genetic algorithm (GA) to obtain a suitable combination of feature weights. The second approach uses latent semantic analysis (LSA) to derive the semantic matrix of a document or a corpus, and uses the resulting semantic sentence representations to construct a semantic text relationship map. We evaluate LSA + T.R.M. both on single documents and at the corpus level to investigate how well LSA supports text summarization. Both approaches were evaluated at several compression rates on a corpus of 100 political articles. At a compression rate of 30%, the average F-measures were 49% for MCBA, 52% for MCBA + GA, and 44% and 40% for LSA + T.R.M. at the single-document and corpus levels, respectively. (C) 2004 Elsevier Ltd. All rights reserved.
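The abstract describes the MCBA score function and its GA training only at a high level. The following is a minimal Python sketch of that idea under stated assumptions. It assumes the score is a linear weighted sum of the five named features, with the ranked sentence position already folded into a precomputed `position` value. It also assumes a negative-keyword feature that is pre-negated, so it lowers the score. The weights are tuned by a generic elitist GA (one-point crossover, Gaussian mutation) that maximizes the mean F-measure of the extracted sentences on training documents. The feature values, GA operators and parameters, and every function name (`score`, `summarize`, `f_measure`, `fitness`, `train_weights_ga`) are hypothetical and are not taken from the paper. The LSA + T.R.M. approach is not covered here.

```python
import random
from typing import List, Sequence, Set

# Feature order follows the five features named in the abstract; the values
# themselves are assumed to be precomputed and normalized elsewhere.
FEATURES = ("position", "positive_keyword", "negative_keyword",
            "centrality", "title_resemblance")

Doc = List[List[float]]  # one feature vector per sentence


def score(features: Sequence[float], weights: Sequence[float]) -> float:
    """Assumed score function: a weighted linear sum of feature values."""
    return sum(w * f for w, f in zip(weights, features))


def summarize(doc: Doc, weights: Sequence[float], rate: float) -> Set[int]:
    """Extract the top-scoring sentences; `rate` is the compression rate."""
    k = max(1, round(rate * len(doc)))
    ranked = sorted(range(len(doc)),
                    key=lambda i: score(doc[i], weights), reverse=True)
    return set(ranked[:k])


def f_measure(selected: Set[int], reference: Set[int]) -> float:
    """Harmonic mean of sentence-level precision and recall."""
    overlap = len(selected & reference)
    if overlap == 0:
        return 0.0
    precision = overlap / len(selected)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)


def fitness(weights: Sequence[float], corpus: List[Doc],
            references: List[Set[int]], rate: float) -> float:
    """Hypothetical GA fitness: mean F-measure over the training corpus."""
    scores = [f_measure(summarize(d, weights, rate), r)
              for d, r in zip(corpus, references)]
    return sum(scores) / len(scores)


def train_weights_ga(corpus: List[Doc], references: List[Set[int]],
                     rate: float = 0.3, pop_size: int = 20,
                     generations: int = 30, seed: int = 0) -> List[float]:
    """Generic elitist GA with one-point crossover and Gaussian mutation."""
    rng = random.Random(seed)
    n = len(FEATURES)
    population = [[rng.random() for _ in range(n)] for _ in range(pop_size)]

    def fit(w: List[float]) -> float:
        return fitness(w, corpus, references, rate)

    for _ in range(generations):
        population.sort(key=fit, reverse=True)
        parents = population[: pop_size // 2]  # keep the better half unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)
            child = a[:cut] + b[cut:]  # one-point crossover
            # Mutate each gene with probability 0.2, clamped to [0, 1].
            child = [min(1.0, max(0.0, g + rng.gauss(0.0, 0.1)))
                     if rng.random() < 0.2 else g for g in child]
            children.append(child)
        population = parents + children
    return max(population, key=fit)


# Toy training data: feature values and reference extracts are invented.
corpus = [
    [[1.0, 0.2, 0.0, 0.5, 0.9], [0.5, 0.0, -0.5, 0.1, 0.1],
     [0.3, 0.1, 0.0, 0.2, 0.2], [0.2, 0.8, 0.0, 0.7, 0.6],
     [0.1, 0.0, -0.2, 0.1, 0.0]],
    [[0.9, 0.0, 0.0, 0.3, 0.4], [0.6, 0.7, 0.0, 0.8, 0.9],
     [0.4, 0.0, -0.8, 0.2, 0.1], [0.2, 0.1, 0.0, 0.3, 0.2],
     [0.1, 0.6, 0.0, 0.5, 0.7]],
]
references = [{0, 3}, {1, 4}]

best = train_weights_ga(corpus, references)
print(dict(zip(FEATURES, (round(w, 2) for w in best))))
print("mean F-measure:", fitness(best, corpus, references, 0.3))
```

Keeping the better half of each generation unchanged makes the GA elitist, so the best mean F-measure found never decreases across generations. A real system would evaluate fitness on held-out documents rather than on the training set.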