Article

Examining linguistic shifts between preprints and publications

Journal

PLOS BIOLOGY
Volume 20, Issue 2, Pages -

Publisher

PUBLIC LIBRARY SCIENCE
DOI: 10.1371/journal.pbio.3001470

Keywords

-

Funding

  1. Gordon and Betty Moore Foundation [GBMF4552]
  2. National Institutes of Health's National Human Genome Research Institute (NHGRI) [R01 HG010067]
  3. National Institutes of Health's NHGRI [T32 HG00046]

Preprints let researchers share their findings before peer review. This study examines the linguistic features of preprints in the bioRxiv repository and compares them with published biomedical text. The features that change most after peer review relate to typesetting and to mentions of supporting information. The study also uses document embeddings to distinguish scientific approaches, link preprints with their peer-reviewed articles, and identify journals that publish similar papers. Preprints with more posted versions and more textual changes took longer to publish. Finally, the study introduces a web application that helps users find the journals and articles most linguistically similar to a given preprint.
Preprints allow researchers to make their findings available to the scientific community before they have undergone peer review. Studies on preprints within bioRxiv have largely focused on article metadata and on how often these preprints are downloaded, cited, published, and discussed online. The language contained within the bioRxiv preprint repository has yet to be examined. We compared linguistic features of bioRxiv preprints with those of published biomedical text as a whole, which offers an opportunity to examine how peer review changes these documents. The most prevalent features that changed appear to be associated with typesetting and with mentions of supporting information sections or additional files. In addition to text comparison, we created document embeddings derived from a preprint-trained word2vec model. We found that these embeddings are able to parse out different scientific approaches and concepts, link preprints to their peer-reviewed articles when that pairing is unannotated, and identify journals that publish papers linguistically similar to a given preprint. We also used these embeddings to examine factors associated with the time elapsed between the posting of a first preprint and the appearance of a peer-reviewed publication. We found that preprints with more versions posted and more textual changes took longer to publish. Lastly, we constructed a web application (https://greenelab.github.io/preprint-similarity-search/) that allows users to identify which journals and articles are most linguistically similar to a bioRxiv or medRxiv preprint, as well as to observe where the preprint would be positioned within a published article landscape.
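The embedding-based journal matching described above can be illustrated with a small sketch. It is not the authors' implementation: it assumes a document embedding is the mean of its word vectors and that each journal is represented by a centroid vector, and it uses hypothetical 3-dimensional toy vectors in place of a trained word2vec model. The names `WORD_VECTORS`, `JOURNAL_CENTROIDS`, `embed_document`, and `rank_journals` are invented for illustration.

```python
from math import sqrt

# Hypothetical toy word vectors standing in for a trained word2vec model.
WORD_VECTORS = {
    "gene": [0.9, 0.1, 0.0],
    "expression": [0.8, 0.2, 0.1],
    "neuron": [0.1, 0.9, 0.1],
    "cortex": [0.0, 0.8, 0.2],
    "algorithm": [0.1, 0.1, 0.9],
}


def embed_document(tokens, word_vectors=WORD_VECTORS):
    """Return the mean of the vectors of known tokens (assumed embedding scheme)."""
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        raise ValueError("no token has a word vector")
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank_journals(preprint_tokens, journal_centroids):
    """Sort journals by similarity between their centroid and the preprint embedding."""
    query = embed_document(preprint_tokens)
    scored = [
        (name, cosine_similarity(query, centroid))
        for name, centroid in journal_centroids.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical journal centroids, built here from two toy "papers".
JOURNAL_CENTROIDS = {
    "Journal A (genomics)": embed_document(["gene", "expression"]),
    "Journal B (neuroscience)": embed_document(["neuron", "cortex"]),
}

ranking = rank_journals(["gene", "expression", "algorithm"], JOURNAL_CENTROIDS)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

With these toy vectors, the genomics-flavored preprint ranks closer to Journal A than to Journal B. A real pipeline would substitute vectors from a trained model and centroids computed over each journal's published articles.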
