Proceedings Paper

Towards an Optimal Solution to Lemmatization in Arabic

Journal

ARABIC COMPUTATIONAL LINGUISTICS
Volume 142, Pages 132-140

Publisher

ELSEVIER SCIENCE BV
DOI: 10.1016/j.procs.2018.10.468

Keywords

lemmatization; Arabic; natural language processing; machine-learning-based lemmatization; dictionary-based lemmatization

Lemmatization, the task of computing the canonical forms of words in running text, is an important component of any NLP system and a key preprocessing step for most applications that rely on natural language understanding. In Arabic, lemmatization is a complex task because of the language's rich morphology, its agglutinative aspects, and the lexical ambiguity caused by the absence of short vowels in writing. In this paper, we introduce a new lemmatizer tool that combines a machine-learning-based approach with a lemmatization dictionary, the latter providing increased accuracy, robustness, and flexibility to the former. Our evaluations yield a performance of over 98% for the entire lemmatization pipeline. The lemmatizer tools are freely downloadable for private and research purposes. © 2018 The Authors. Published by Elsevier B.V.
