Journal
ARABIC COMPUTATIONAL LINGUISTICS
Volume 142, Issue -, Pages 132-140
Publisher
ELSEVIER SCIENCE BV
DOI: 10.1016/j.procs.2018.10.468
Keywords
lemmatization; Arabic; natural language processing; machine-learning-based lemmatization; dictionary-based lemmatization
Lemmatization, the computation of the canonical forms of words in running text, is an important component of any NLP system and a key preprocessing step for most applications that rely on natural language understanding. In the case of Arabic, lemmatization is a complex task because of the language's rich morphology, its agglutinative aspects, and the lexical ambiguity caused by the absence of short vowels in writing. In this paper, we introduce a new lemmatizer tool that combines a machine-learning-based approach with a lemmatization dictionary, the latter providing increased accuracy, robustness, and flexibility to the former. Our evaluations yield a performance of over 98% for the entire lemmatization pipeline. The lemmatizer tools are freely downloadable for private and research purposes. (C) 2018 The Authors. Published by Elsevier B.V.
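The combination described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the lookup order (dictionary first, model as fallback), the `(token, POS)` key, the sample entries, and every name below are assumptions made for illustration, and `fallback_model` is a trivial rule standing in for a trained model.

```python
from typing import Callable, Dict, Tuple

# Hypothetical lemmatization dictionary keyed by (surface form, POS tag).
# The two entries show how one unvowelled spelling can map to different lemmas.
LEMMA_DICT: Dict[Tuple[str, str], str] = {
    ("كتب", "NOUN"): "كتاب",  # "books" -> "book"
    ("كتب", "VERB"): "كتب",   # "he wrote" -> same citation form
}


def fallback_model(token: str, pos: str) -> str:
    """Placeholder for a trained lemmatization model (assumption).

    Here it only strips the definite article prefix; a real system
    would predict the lemma with a learned classifier.
    """
    if token.startswith("ال") and len(token) > 3:
        return token[2:]
    return token


def lemmatize(
    token: str,
    pos: str,
    dictionary: Dict[Tuple[str, str], str] = LEMMA_DICT,
    model: Callable[[str, str], str] = fallback_model,
) -> str:
    """Return the dictionary lemma if one exists, else ask the model."""
    lemma = dictionary.get((token, pos))
    if lemma is not None:
        return lemma
    return model(token, pos)


if __name__ == "__main__":
    for tok, tag in [("كتب", "NOUN"), ("كتب", "VERB"), ("الولد", "NOUN")]:
        print(tok, tag, lemmatize(tok, tag))
```

Keeping the dictionary as a plain mapping means known errors can be corrected by adding entries, without retraining the model, which is one plausible reading of the flexibility the abstract attributes to the dictionary.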