4.6 Article

Embeddings Evaluation Using a Novel Measure of Semantic Similarity

Journal

COGNITIVE COMPUTATION
Volume 14, Issue 2, Pages 749-763

Publisher

SPRINGER
DOI: 10.1007/s12559-021-09987-7

This paper proposes TaxoVec, a novel approach for selecting word embeddings based on their ability to preserve taxonomic similarity. Experimental results show that the Hierarchical Semantic Similarity (HSS) measure outperforms state-of-the-art measures of semantic similarity in a taxonomy, and that the embeddings selected through TaxoVec perform better on benchmark NLP tasks than embeddings selected by competing measures.
Lexical taxonomies and distributional representations are widely used to support a range of NLP applications, including semantic similarity measurement. Recently, several scholars have proposed new approaches that combine those resources into a unified representation preserving both distributional and knowledge-based lexical features. In this paper, we propose and implement TaxoVec, a novel approach to selecting word embeddings based on their ability to preserve taxonomic similarity. In TaxoVec, we first compute the pairwise semantic similarity between taxonomic words through a new measure we previously developed, the Hierarchical Semantic Similarity (HSS), which we show outperforms previous measures on several benchmark tasks. Then, we train several embedding models on a text corpus and select the best model, that is, the model that maximizes the correlation between the HSS and the cosine similarity of the pairs of words that appear in both the taxonomy and the corpus. To evaluate TaxoVec, we repeat the embedding selection process using three other benchmark semantic similarity measures. We use the vectors of the four selected embeddings as machine learning model features to perform several NLP tasks. The performance on those tasks constitutes an extrinsic evaluation of the criterion used to select the best embedding (i.e. the adopted semantic similarity measure). Experimental results show that (i) HSS outperforms state-of-the-art measures of semantic similarity in a taxonomy on a benchmark intrinsic evaluation and (ii) the embedding selected through TaxoVec clearly outperforms the embeddings selected by the competing measures on benchmark NLP tasks. We implemented the HSS, together with other benchmark measures of semantic similarity, as a full-fledged Python package called TaxoSS, whose documentation is available at https://pypi.org/project/TaxoSS.
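
The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy vectors and the toy similarity scores are all hypothetical, and Pearson correlation is assumed because the abstract does not say which correlation coefficient is used. In practice the reference scores would come from a taxonomy measure such as HSS rather than from hand-written numbers.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pearson(xs, ys):
    """Pearson correlation (an assumption; the abstract only says 'correlation')."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def score_model(vectors, taxonomy_sim):
    """Correlate taxonomy similarity with cosine similarity over shared word pairs."""
    # Keep only pairs whose two words are both in the embedding vocabulary.
    pairs = [(a, b) for (a, b) in taxonomy_sim if a in vectors and b in vectors]
    reference = [taxonomy_sim[p] for p in pairs]
    predicted = [cosine(vectors[a], vectors[b]) for a, b in pairs]
    return pearson(reference, predicted)


def select_best(models, taxonomy_sim):
    """Return the name of the model with the highest correlation score."""
    return max(models, key=lambda name: score_model(models[name], taxonomy_sim))


# Hypothetical taxonomy similarity scores standing in for HSS values.
taxonomy_sim = {
    ("cat", "dog"): 0.9,
    ("cat", "car"): 0.1,
    ("dog", "car"): 0.2,
    ("cat", "unicorn"): 0.5,  # skipped: "unicorn" is in no model's vocabulary
}

# Hypothetical 2-dimensional embeddings from two candidate models.
models = {
    "model_a": {"cat": [1.0, 0.0], "dog": [0.9, 0.1], "car": [0.0, 1.0]},
    "model_b": {"cat": [1.0, 0.0], "dog": [0.0, 1.0], "car": [0.9, 0.1]},
}

best = select_best(models, taxonomy_sim)
print(best)
```

In this toy setup `model_a` places "cat" and "dog" close together, as the reference scores do, so it is the one selected. Restricting the comparison to pairs present in both resources mirrors the abstract's condition that words be "in both the taxonomy and the corpus".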
