BenLem (A Bengali Lemmatizer) and Its Role in WSD

ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING (2016)

Journal

ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING

Volume 15, Issue 3

Publisher

ASSOC COMPUTING MACHINERY

DOI: 10.1145/2835494

Keywords

Bengali; evaluation; Indic languages; lemmatizer; word sense disambiguation (WSD)

Abstract

A lemmatization algorithm for Bengali has been developed and evaluated, and its effectiveness for word sense disambiguation (WSD) is investigated. A key challenge in the computer processing of highly inflected languages is handling the frequent morphological variations of root words that appear in text; a lemmatizer is therefore essential for developing natural language processing (NLP) tools for such languages. In this experiment, Bengali, the national language of Bangladesh and the second most popular language in the Indian subcontinent, is taken as a reference. To design the Bengali lemmatizer (named BenLem), the possible transformations through which surface words are formed from lemmas are studied, so that appropriate reverse transformations can be applied to a surface word to recover the corresponding lemma. BenLem is found to be capable of handling both inflectional and derivational morphology in Bengali. It is evaluated on a set of 18 news articles taken from the FIRE Bengali News Corpus, consisting of 3,342 surface words (excluding proper nouns), and is found to be 81.95% accurate. The role of the lemmatizer is then investigated for Bengali WSD. Ten highly polysemous Bengali words are considered for sense disambiguation, and the WSD dataset is created from the FIRE corpus and a collection of Tagore's short stories. Several WSD systems are considered for this experiment; BenLem improves the performance of all of them, and the improvements are statistically significant.
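
The idea of applying reverse transformations to a surface word can be illustrated with a minimal sketch. This is not BenLem's actual rule set or algorithm, which the abstract does not specify; the suffix rules, the tiny lexicon, and the `lemmatize` function below are hypothetical and chosen only to show the general shape of such an approach: strip a surface suffix, optionally restore a lemma ending, and accept the candidate only if it is a known lemma.

```python
# Illustrative only: hypothetical reverse-transformation rules, NOT BenLem's.
# Each rule is (surface suffix to strip, string to append in its place).
RULES = [
    ("গুলো", ""),   # plural marker, e.g. বইগুলো -> বই
    ("ের", ""),     # genitive marker, e.g. মানুষের -> মানুষ
    ("রা", ""),     # plural marker, e.g. ছেলেরা -> ছেলে
    ("ছে", "া"),    # verb ending, e.g. করছে -> করা
    ("ে", ""),      # locative marker, e.g. ঘরে -> ঘর
]

# Hypothetical toy lexicon of valid lemmas.
LEXICON = {"ছেলে", "বই", "মানুষ", "করা", "ঘর"}


def lemmatize(word, rules=RULES, lexicon=LEXICON):
    """Return a lemma for `word`, or `word` itself if no rule applies."""
    if word in lexicon:
        return word
    # Try longer suffixes first so that a short suffix does not pre-empt
    # a more specific one.
    for suffix, replacement in sorted(rules, key=lambda r: len(r[0]), reverse=True):
        if word.endswith(suffix) and len(word) > len(suffix):
            candidate = word[: -len(suffix)] + replacement
            # Accept a candidate only if the lexicon confirms it is a lemma.
            if candidate in lexicon:
                return candidate
    # No reverse transformation yields a known lemma: leave the word as is.
    return word
```

In this sketch the lexicon check is what keeps a rule from over-stripping a word that merely happens to end in a suffix-like string. A real lemmatizer for Bengali would need a far larger rule inventory and lexicon, and some strategy for words outside the lexicon.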
