Article

An in-text citation classification predictive model for a scholarly search system

Journal

SCIENTOMETRICS
Volume 126, Issue 7, Pages 5509-5529

Publisher

SPRINGER
DOI: 10.1007/s11192-021-03986-z

Keywords

Citation classification; Machine learning; In-text citations; Scholarly search systems; Bibliometric-enhanced information retrieval

Funding

  1. Deanship of Scientific Research (DSR), King Abdulaziz University, Jeddah [RG-14-611-40]

Citations in scholarly documents do not always have equivalent functions or importance. By using machine learning models and feature engineering, researchers were able to classify and predict the importance of citations in academic literature. The Random Forest model showed superior performance in predicting citation importance compared to other models.
We argue that citations in scholarly documents do not always perform equivalent functions or possess equal importance. To address this problem, we worked with a corpus of over 21,000 citations from the Association for Computational Linguistics, of which 465 randomly selected citations were annotated by experts as either important or unimportant. We trained three machine-learning models on these annotated citations: Random Forest (RF), Support Vector Machine (SVM), and Decision Tree (DT). For the classification task, the selected models employed 15 novel features of three kinds: contextual, quantitative, and qualitative. We show that the RF model outperformed the comparative model by 9.52%, achieving an area under the precision-recall curve of 92%. We present a prototype of a scientific publication search system based on the RF prediction model for feature engineering. The prototype was applied to a dataset of 4138 full-text articles indexed by PLOS ONE, containing 31,839 unique references. The empirical evaluation shows that the proposed search system improves the visibility of a given scientific document by indexing it not only under its own index terms but also under terms from the cited works that are predicted to be important. Overall, this yields improved search results for user queries.
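The indexing idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`tokenize`, `build_index`, `search`), the toy documents, and the `is_important` predicate are all hypothetical, and the predicate is a hand-written stand-in for the trained RF classifier. The sketch only shows how a document can become findable through terms drawn from the cited works that are predicted to be important.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and keep purely alphabetic tokens."""
    return [t for t in text.lower().split() if t.isalpha()]


def build_index(docs, citations, is_important):
    """Build an inverted index (term -> set of document ids).

    docs:         dict mapping document id -> text
    citations:    dict mapping citing id -> list of cited ids
    is_important: callable (citing_id, cited_id) -> bool; a hypothetical
                  stand-in for the citation-importance classifier
    """
    index = defaultdict(set)
    for doc_id, text in docs.items():
        terms = set(tokenize(text))
        # Add terms from cited works that are predicted to be important.
        for cited_id in citations.get(doc_id, []):
            if cited_id in docs and is_important(doc_id, cited_id):
                terms |= set(tokenize(docs[cited_id]))
        for term in terms:
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents indexed under every query term."""
    postings = [index.get(t, set()) for t in tokenize(query)]
    return set.intersection(*postings) if postings else set()


# Toy data (assumed for illustration only).
docs = {
    "A": "citation classification with random forest",
    "B": "support vector machines for text",
    "C": "neural parsing of sentences",
}
citations = {"A": ["B", "C"]}
important_pairs = {("A", "B")}  # pretend the classifier flagged A -> B

index = build_index(docs, citations, lambda s, t: (s, t) in important_pairs)
vector_hits = search(index, "vector")
parsing_hits = search(index, "parsing")
combined_hits = search(index, "citation vector")
```

In this toy setup, document A is retrievable for "vector" because it cites B and that citation is marked important, while the unimportant citation to C contributes no terms to A.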
