4.6 Article

TOP-Rank: A Novel Unsupervised Approach for Topic Prediction Using Keyphrase Extraction for Urdu Documents

Journal

IEEE ACCESS
Volume 8, Issue -, Pages 212675-212686

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/ACCESS.2020.3039548

Keywords

Data mining; Clustering algorithms; Syntactics; Probabilistic logic; Prediction algorithms; Feature extraction; Computational modeling; Topic extraction; top-rank; keyphrase extraction; topic prediction; Urdu positional ranking

Ask authors/readers for more resources

In Natural Language Processing (NLP), topic modeling is the technique to extract abstract information from documents with huge amount of text. This abstract information leads towards the identification of the topics in the document. One way to retrieve topics from documents is keyphrase extraction. Keyphrases are a set of terms which represent high level description of a document. Different techniques of keyphrase extraction for topic prediction have been proposed for multiple languages i.e. English, Arabic, etc. However, this area needs to be explored for other languages e.g. Urdu. Therefore, in this paper, a novel unsupervised approach for topic prediction for Urdu language has been introduced which is able to extract more significant information from the documents. For this purpose, the proposed TOP-Rank system extracts keywords from the document and ranks them according to their position in a sentence. These keywords along with their ranking scores are utilized to generate keyphrases by applying syntactic rules to extracts more meaningful topics. These keyphrases are ranked according to the keywords scores and re-ranked with respect to their positions in the document. Finally, our proposed model identifies top-ranked keyphrases as topical significance and keyphrase with the highest score is selected as the topic of the document. Experiments are performed on two different datasets and performance of the proposed system is compared with existing state-of-the-art techniques. Results have shown that our proposed system outperforms existing techniques and holds the ability to produce more meaningful topics.

Authors

I am an author on this paper
Click your name to claim this paper and add it to your profile.

Reviews

Primary Rating

4.6
Not enough ratings

Secondary Ratings

Novelty
-
Significance
-
Scientific rigor
-
Rate this paper

Recommended

No Data Available
No Data Available