Proceedings Paper

Towards an Optimal Solution to Lemmatization in Arabic

Journal

ARABIC COMPUTATIONAL LINGUISTICS
Volume 142, Pages 132-140

Publisher

ELSEVIER SCIENCE BV
DOI: 10.1016/j.procs.2018.10.468

Keywords

lemmatization; Arabic; natural language processing; machine-learning-based lemmatization; dictionary-based lemmatization

Lemmatization, the task of computing the canonical forms of words in running text, is an important component of any NLP system and a key preprocessing step for most applications that rely on natural language understanding. In the case of Arabic, lemmatization is a complex task because of the language's rich morphology, agglutinative aspects, and lexical ambiguity due to the absence of short vowels in writing. In this paper, we introduce a new lemmatizer tool that combines a machine-learning-based approach with a lemmatization dictionary, the latter providing increased accuracy, robustness, and flexibility to the former. Our evaluations yield a performance of over 98% for the entire lemmatization pipeline. The lemmatizer tools are freely downloadable for private and research purposes. © 2018 The Authors. Published by Elsevier B.V.
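
To make the idea of pairing a learned lemmatizer with a dictionary concrete, here is a minimal sketch. It is not the authors' tool. It assumes, purely for illustration, that dictionary entries keyed by surface form and part of speech take precedence and that a model is consulted only when no entry exists; the abstract does not say how the two components are ordered. The names `LEMMA_DICT`, `strip_article_fallback`, and `lemmatize` are hypothetical, and the fallback function is a trivial stand-in for a trained model.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy dictionary: (surface form, coarse POS tag) -> lemma.
# Keying on the POS tag is an assumption made here to show how an
# unvowelled form can map to different lemmas.
LEMMA_DICT: Dict[Tuple[str, str], str] = {
    ("كتب", "NOUN"): "كتاب",  # "books" -> "book"
    ("كتب", "VERB"): "كتب",   # "he wrote" is already the citation form
}


def strip_article_fallback(token: str, pos: str) -> str:
    """Stand-in for a trained model (assumption): drop a leading 'ال'."""
    if token.startswith("ال") and len(token) > 3:
        return token[2:]
    return token


def lemmatize(
    tagged_tokens: List[Tuple[str, str]],
    dictionary: Dict[Tuple[str, str], str],
    model: Callable[[str, str], str],
) -> List[str]:
    """Return one lemma per (token, POS) pair.

    A dictionary hit wins; otherwise the model's prediction is used.
    """
    lemmas = []
    for token, pos in tagged_tokens:
        lemma = dictionary.get((token, pos))
        if lemma is None:
            lemma = model(token, pos)
        lemmas.append(lemma)
    return lemmas


if __name__ == "__main__":
    sentence = [("كتب", "NOUN"), ("البيت", "NOUN")]
    print(lemmatize(sentence, LEMMA_DICT, strip_article_fallback))
```

Under this assumed ordering, a wrong prediction can be corrected by adding a dictionary entry, with no retraining of the model, which is one way to read the flexibility the abstract attributes to the dictionary.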
