Article

An automatic and association-based procedure for hierarchical publication subject categorization

Journal

JOURNAL OF INFORMETRICS
Volume 18, Issue 1, Pages -

Publisher

ELSEVIER
DOI: 10.1016/j.joi.2023.101466

Keywords

Scientific publication subject categorization; Journal studies; Association rules


Subject categorization of scientific publications is important for evaluating paper quality. Traditional mechanisms for categorization have been questioned, and a new method based on association rules is proposed. The method automatically defines publication categories based on the repetition or absence of relevant descriptors. The empirical study in the field of Physical Sciences and Engineering shows that the proposed method produces consistent and suitable categorization results.
Subject categorization of scientific publications, i.e., journals, book series or conference proceedings, has become a major concern in academia, as publication impact and ranking are considered a basic criterion for evaluating paper quality. Publishers usually propose their own categorizations, but these often include only their own publications, and their categories might not be coherent with other proposals. Also, due to the dynamic nature of science, new categories may appear frequently. As traditional categorization mechanisms have been questioned by many authors, a new research line has emerged to improve the category assignment process. Approaches usually rely on assessing publication similarity in terms of topics, co-citation, editorial boards, and/or shared author profiles. In this work, we propose a novel procedure for the hierarchical categorization of scientific publications based on the repetition or absence of relevant descriptors in association rules among publications. The key idea is that publication categories can be automatically defined by strong associations of nuclear topics. In addition, some very specific subcategories can be defined by their exclusion from any set of rules. This process can be used to construct a data-driven hierarchy of scientific publication categories from scratch or to improve any existing categorization by discovering new fields. In this paper, the proposed algorithm uses the SJR descriptors of all journals in the SCImago dataset and the three-level classification in the Scopus dataset (covering only 35% of the publications in the SCImago dataset) to discover new categories and to assign every journal to the resulting enhanced hierarchy. We have focused on the field of Physical Sciences and Engineering, using the SCImago and Scopus datasets from 2019 (30,883 scientific publications). Our procedure combines data engineering techniques with association rules and generates potential new categories and outlier subcategories as a result.
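The key idea above can be illustrated with a minimal sketch. Strong associations among descriptors define categories, and descriptors excluded from every rule point to specific subcategories. The sketch rests on several assumptions:

- Each journal is treated as a transaction holding its subject descriptors.
- Only pairwise rules A -> B are mined, using support and confidence thresholds.
- Descriptors linked by strong rules are grouped into candidate categories with a union-find structure.

The toy descriptors, the thresholds, and the names `mine_pair_rules` and `candidate_categories` are hypothetical. This is not the authors' exact algorithm.

```python
from collections import Counter
from itertools import combinations


def mine_pair_rules(transactions, min_support, min_confidence):
    """Mine pairwise rules A -> B whose support and confidence pass the thresholds.

    Each transaction is the set of descriptors assigned to one journal.
    Returns a list of (antecedent, consequent, support, confidence) tuples.
    Pairwise-only mining is a simplifying assumption of this sketch.
    """
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for items in transactions:
        unique = sorted(set(items))
        item_counts.update(unique)
        pair_counts.update(combinations(unique, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Check both rule directions; confidence = P(consequent | antecedent).
        for antecedent, consequent in ((a, b), (b, a)):
            confidence = count / item_counts[antecedent]
            if confidence >= min_confidence:
                rules.append((antecedent, consequent, support, confidence))
    return rules


def candidate_categories(rules, all_items):
    """Group descriptors linked by strong rules (union-find) and list the rest.

    Descriptors that appear in no strong rule are returned as candidate
    outlier subcategories, i.e. defined by exclusion from every rule.
    """
    parent = {item: item for item in all_items}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for antecedent, consequent, _, _ in rules:
        parent[find(antecedent)] = find(consequent)

    groups = {}
    for item in all_items:
        groups.setdefault(find(item), set()).add(item)

    in_rules = {x for a, b, _, _ in rules for x in (a, b)}
    categories = [group for group in groups.values() if len(group) > 1]
    outliers = sorted(set(all_items) - in_rules)
    return categories, outliers


# Hypothetical toy data: one descriptor set per journal.
transactions = [
    {"Mechanics", "Materials"},
    {"Mechanics", "Materials"},
    {"Mechanics", "Materials", "Acoustics"},
    {"Optics", "Photonics"},
    {"Optics", "Photonics"},
    {"Cryogenics"},
]
all_items = sorted(set().union(*transactions))
rules = mine_pair_rules(transactions, min_support=0.25, min_confidence=0.8)
categories, outliers = candidate_categories(rules, all_items)
print(sorted(sorted(group) for group in categories))  # [['Materials', 'Mechanics'], ['Optics', 'Photonics']]
print(outliers)  # ['Acoustics', 'Cryogenics']
```

In this toy run, "Acoustics" co-occurs too rarely to pass the support threshold, so it ends up with "Cryogenics" among the exclusion-defined outliers. A real pipeline would likely mine larger itemsets (e.g. with Apriori or FP-Growth) and tune the thresholds on the data.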
To evaluate the suitability of our proposal, we have analyzed classification results based on the original category list and on our extended two-level categorization, using the Jensen-Shannon divergence and supervised machine-learning techniques. Results reveal the consistency and suitability of our categorization procedure.
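The evaluation metric can be illustrated with a hedged sketch of the Jensen-Shannon divergence between two discrete distributions. Using base-2 logarithms bounds the result to [0, 1]. The abstract does not state which distributions the authors compare, so the inputs below are assumptions. They stand in for hypothetical counts of journals per descriptor in two categories.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p_counts, q_counts):
    """Jensen-Shannon divergence (base 2, so the result lies in [0, 1]).

    Inputs are non-negative counts over the same ordered support; they are
    normalized to probability distributions before comparison.
    """
    if len(p_counts) != len(q_counts):
        raise ValueError("distributions must have the same support")
    p_total, q_total = sum(p_counts), sum(q_counts)
    p = [x / p_total for x in p_counts]
    q = [x / q_total for x in q_counts]
    # Mixture distribution: m_i > 0 whenever p_i > 0 or q_i > 0,
    # so the KL terms never divide by zero.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical journal counts per descriptor for two categories.
print(js_divergence([3, 1, 0], [6, 2, 0]))  # 0.0 (same shape after normalization)
print(js_divergence([5, 0], [0, 5]))        # 1.0 (disjoint supports)
print(js_divergence([10, 5, 1], [2, 6, 8]))
```

The metric is symmetric and always finite, unlike the plain Kullback-Leibler divergence. This makes it a convenient choice for comparing category profiles that may not share every descriptor.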
