Article

Extracting semantic representations from word co-occurrence statistics: stop-lists, stemming, and SVD

BEHAVIOR RESEARCH METHODS (2012)

Journal

BEHAVIOR RESEARCH METHODS

Volume 44, Issue 3, Pages 890-907

Publisher

SPRINGER

DOI: 10.3758/s13428-011-0183-8

Keywords

Semantic representation; Corpus statistics; SVD

Categories

Psychology, Mathematical; Psychology, Experimental

Abstract

In a previous article, we presented a systematic computational study of the extraction of semantic representations from the word-word co-occurrence statistics of large text corpora. The conclusion was that semantic vectors of pointwise mutual information values from very small co-occurrence windows, together with a cosine distance measure, consistently resulted in the best representations across a range of psychologically relevant semantic tasks. This article extends that study by investigating three further factors that have been used to improve performance elsewhere: the application of stop-lists, word stemming, and dimensionality reduction using singular value decomposition (SVD). It also introduces an additional semantic task and explores the advantages of using a much larger corpus. This leads to the discovery and analysis of improved SVD-based methods for generating semantic representations, which provide new state-of-the-art performance on a standard TOEFL task, and to the identification and discussion of problems and misleading results that can arise without a full systematic study.
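To make the baseline representation summarized above more concrete, here is a minimal Python sketch. It builds word-word co-occurrence counts from a small symmetric window, applies an optional stop-list, converts the counts to pointwise mutual information (PMI) vectors, and compares words by cosine similarity. This is not the authors' implementation. The window of one word on each side, the clipping of negative PMI values to zero (positive PMI), the removal of stop-words before windowing, and all function names are assumptions made for illustration. Stemming and SVD are not shown.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_counts(tokens, window=1, stop_words=frozenset()):
    """Count context words within +/- `window` positions of each target word.

    Assumption: stop-listed words are deleted before windowing, so the
    window spans the remaining words only.
    """
    words = [w for w in tokens if w not in stop_words]
    counts = defaultdict(Counter)
    for i, target in enumerate(words):
        lo, hi = max(0, i - window), min(len(words), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[target][words[j]] += 1
    return counts


def ppmi_vectors(counts):
    """Turn raw counts into positive PMI: max(0, log p(w,c) / (p(w) p(c)))."""
    total = sum(sum(ctx.values()) for ctx in counts.values())
    word_totals = {w: sum(ctx.values()) for w, ctx in counts.items()}
    context_totals = Counter()
    for ctx in counts.values():
        context_totals.update(ctx)
    vectors = {}
    for w, ctx in counts.items():
        vec = {}
        for c, n in ctx.items():
            pmi = math.log(n * total / (word_totals[w] * context_totals[c]))
            if pmi > 0:  # keep only positive associations (assumption)
                vec[c] = pmi
        vectors[w] = vec
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(x * v.get(k, 0.0) for k, x in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Toy usage with a hypothetical corpus and stop-list.
tokens = "the cat sat on the mat the dog sat on the rug".split()
counts = cooccurrence_counts(tokens, window=1, stop_words={"the", "on"})
vectors = ppmi_vectors(counts)
print(cosine(vectors["cat"], vectors["rug"]))  # both have only "sat" as context
print(cosine(vectors["mat"], vectors["dog"]))
```

In this toy corpus, "cat" and "rug" share their only context word, "sat", so their cosine is maximal. An SVD step would typically be applied to the full word-by-context matrix built from such vectors, keeping only the leading dimensions. That step is omitted here so the sketch stays dependency-free.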