Article

A new hybrid stemming method for Persian language

Journal

DIGITAL SCHOLARSHIP IN THE HUMANITIES
Volume 32, Issue 1, Pages 209-221

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/llc/fqv053

One of the important issues in natural language processing and information retrieval is the automatic extraction of word stems. Statistical and rule-based approaches to stemming each have their own advantages and limitations. Statistical stemmers are not accurate and fail to take advantage of language phenomena that can be easily expressed by simple rules. On the other hand, handcrafting the stemming rules for a rule-based stemmer is a time-consuming, tedious, and impractical task. In this regard, we propose a new hybrid stemming method for the Persian language based on a combination of affix stripping and statistical techniques. The proposed method combines cues from orthography, word frequency, and syntactic distributions to induce the stemming rules. The method is divided into two main parts. In the first part, all words of the annotated text corpus are used to automatically induce the stemming rules, while in the second part, the rule-based stemmer uses the induced rules to discover each word's stem. We test the performance of the proposed scheme on two different data sets. The encouraging results indicate the superior performance of the proposed method compared with its counterparts.
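The two-part structure described in the abstract (induce rules from a corpus, then apply them with a rule-based stemmer) can be illustrated with a minimal sketch. This is not the authors' algorithm: it uses only an orthographic cue (a suffix counts as a rule if removing it from several words yields another attested word) and ignores the word-frequency and syntactic cues the abstract mentions. The function names, thresholds, and the transliterated toy corpus are all assumptions made for illustration.

```python
def induce_suffix_rules(vocabulary, min_support=2, max_suffix_len=4, min_stem_len=2):
    """Part 1 (hypothetical): keep suffixes whose removal yields an attested word."""
    support = {}
    for word in vocabulary:
        for k in range(1, max_suffix_len + 1):
            stem_candidate, suffix = word[:-k], word[-k:]
            # Count the suffix only if the remainder is itself a known word.
            if len(stem_candidate) >= min_stem_len and stem_candidate in vocabulary:
                support[suffix] = support.get(suffix, 0) + 1
    return {s for s, n in support.items() if n >= min_support}


def stem(word, rules, min_stem_len=2):
    """Part 2 (hypothetical): strip the longest induced suffix that fits."""
    for suffix in sorted(rules, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[:-len(suffix)]
    return word


# Toy corpus in Latin transliteration (assumed data, not from the paper);
# "ha" stands in for the Persian plural suffix.
corpus = "ketab ketabha derakht derakhtha gol golha".split()
rules = induce_suffix_rules(set(corpus))

print(rules)
print(stem("ketabha", rules))    # word seen during induction
print(stem("madreseha", rules))  # unseen word, handled by the induced rule
```

Because the rules are induced once and then applied independently, the second part can stem words that never appeared in the corpus, which is the practical benefit of separating rule induction from rule application.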
