A new hybrid stemming method for Persian language

DIGITAL SCHOLARSHIP IN THE HUMANITIES (2017)

Journal

DIGITAL SCHOLARSHIP IN THE HUMANITIES

Volume 32, Issue 1, Pages 209-221

Publisher

OXFORD UNIV PRESS

DOI: 10.1093/llc/fqv053

Abstract

Automatic extraction of a word's stem is an important problem in natural language processing and information retrieval. Statistical and rule-based approaches to stemming each have their own advantages and limitations. Statistical stemmers are not accurate and fail to exploit language phenomena that can be easily expressed by simple rules. Handcrafting the stemming rules for a rule-based stemmer, on the other hand, is a time-consuming, tedious, and impractical task. We therefore propose a new hybrid stemming method for the Persian language that combines affix stripping with statistical techniques. The proposed method uses cues from orthography, word frequency, and syntactic distributions to induce the stemming rules. It is divided into two main parts. In the first part, all words of an annotated text corpus are used to induce the stemming rules automatically; in the second part, a rule-based stemmer applies the induced rules to find a word's stem. We test the performance of the proposed scheme on two different data sets. The encouraging results indicate that the proposed method outperforms its counterparts.
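The two-part structure described in the abstract (induce rules from a corpus, then apply them by affix stripping) can be illustrated with a minimal sketch. This is not the authors' algorithm: it uses only orthography and type counts, ignores syntactic distributions, handles suffixes only, and works on a toy list of Latin-transliterated Persian words. The function names `induce_suffix_rules` and `stem`, the thresholds `min_support`, `max_suffix_len`, and `min_stem_len`, and the word list are all assumptions made for illustration.

```python
from collections import Counter


def induce_suffix_rules(vocabulary, min_support=2, max_suffix_len=4):
    """Part 1 (illustrative): keep suffix s if at least `min_support`
    vocabulary words w + s also have their base w in the vocabulary."""
    vocab = set(vocabulary)
    support = Counter()
    for word in vocab:
        # Try every split of the word into candidate stem + candidate suffix.
        for cut in range(1, len(word)):
            base, suffix = word[:cut], word[cut:]
            if len(suffix) <= max_suffix_len and base in vocab:
                support[suffix] += 1
    return {s for s, n in support.items() if n >= min_support}


def stem(word, rules, min_stem_len=2):
    """Part 2 (illustrative): strip the longest induced suffix that
    leaves a stem of at least `min_stem_len` characters."""
    for suffix in sorted(rules, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    return word


# Hypothetical toy corpus: transliterations of "book", "tree", "flower"
# with the plural suffix "-ha" and the indefinite suffix "-i".
corpus = ["ketab", "ketabha", "ketabi",
          "derakht", "derakhtha", "derakhti",
          "gol", "golha"]

rules = induce_suffix_rules(corpus)
print(sorted(rules))          # ['ha', 'i']
print(stem("mizha", rules))   # unseen word, still stripped to 'miz'
```

A sketch like this shows why induced rules generalize to unseen words, and also why extra cues such as frequency and syntactic distribution are needed in practice: a purely orthographic rule would also strip a final "i" or "ha" that belongs to the stem itself.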
