Article; Proceedings Paper

Semantic hashing

INTERNATIONAL JOURNAL OF APPROXIMATE REASONING (2009)

Journal

INTERNATIONAL JOURNAL OF APPROXIMATE REASONING

Volume 50, Issue 7, Pages 969-978

Publisher

ELSEVIER SCIENCE INC

DOI: 10.1016/j.ijar.2008.11.006

Keywords

Information retrieval; Graphical models; Unsupervised learning

Categories

Computer Science, Artificial Intelligence

Abstract

We show how to learn a deep graphical model of the word-count vectors obtained from a large set of documents. The values of the latent variables in the deepest layer are easy to infer and give a much better representation of each document than Latent Semantic Analysis. When the deepest layer is forced to use a small number of binary variables (e.g. 32), the graphical model performs semantic hashing: Documents are mapped to memory addresses in such a way that semantically similar documents are located at nearby addresses. Documents similar to a query document can then be found by simply accessing all the addresses that differ by only a few bits from the address of the query document. This way of extending the efficiency of hash-coding to approximate matching is much faster than locality-sensitive hashing, which is the fastest current method. By using semantic hashing to filter the documents given to TF-IDF, we achieve higher accuracy than applying TF-IDF to the entire document set. (C) 2008 Elsevier Inc. All rights reserved.
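The address-based lookup described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each document has already been assigned a binary code (in the paper, by the deepest layer of the learned model), represents codes as Python integers, and uses a dictionary in place of real memory addresses. The function names, the toy codes, and the default radius are hypothetical choices made for illustration.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterator, List

# Hypothetical helpers for illustration only. Binary codes are assumed to be
# produced elsewhere (e.g. by thresholding the deepest layer of a trained model).

def build_table(codes: Dict[str, int]) -> Dict[int, List[str]]:
    """Store each document under its binary code, treated as a memory address."""
    table: Dict[int, List[str]] = defaultdict(list)
    for doc_id, code in codes.items():
        table[code].append(doc_id)
    return dict(table)

def hamming_ball(address: int, n_bits: int, radius: int) -> Iterator[int]:
    """Yield every address that differs from `address` in at most `radius` bits."""
    for r in range(radius + 1):
        for positions in combinations(range(n_bits), r):
            neighbour = address
            for p in positions:
                neighbour ^= 1 << p  # flip bit p
            yield neighbour

def semantic_lookup(table: Dict[int, List[str]], query_code: int,
                    n_bits: int = 32, radius: int = 2) -> List[str]:
    """Return documents whose codes lie within `radius` bits of the query code."""
    hits: List[str] = []
    for addr in hamming_ball(query_code, n_bits, radius):
        hits.extend(table.get(addr, []))
    return hits

# Usage with toy 8-bit codes (made-up values, not data from the paper).
codes = {"a": 0b00001111, "b": 0b00001110, "c": 0b00000011, "d": 0b11110000}
table = build_table(codes)
print(semantic_lookup(table, 0b00001111, n_bits=8, radius=1))  # ['a', 'b']
```

For 32-bit codes and a radius of 2, the probe visits 1 + 32 + 496 = 529 addresses, and that number does not depend on how many documents are stored. The count grows combinatorially with the radius, which is why the lookup only makes sense for addresses that differ by a few bits.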
