☆ 3.8 Proceedings Paper

A Word Embedding based Generalized Language Model for Information Retrieval

SIGIR 2015: PROCEEDINGS OF THE 38TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (2015)

Journal

SIGIR 2015: PROCEEDINGS OF THE 38TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL

Volume -, Issue -, Pages 795-798

Publisher

ASSOC COMPUTING MACHINERY

DOI: 10.1145/2766462.2767780

Keywords

Generalized Language model; Word Embedding; Word2Vec

Ask authors/readers for more resources

Protocol

Community support

Reagent

Community support

Abstract

Word2vec, a word embedding technique, has gained significant interest among researchers in natural language processing (NLP) in recent years. The embedding of the word vectors helps to identify a list of words that are used in similar contexts with respect to a given word. In this paper, we focus on the use of word embeddings for enhancing retrieval effectiveness. In particular, we construct a generalized language model, where the mutual independence between a pair of words (say t and t') no longer holds. Instead, we make use of the vector embeddings of the words to derive the transformation probabilities between words. Specifically, the event of observing a term t in the query from a document d is modeled by two distinct events, that of generating a different term t', either from the document itself or from the collection, respectively, and then eventually transforming it to the observed query term t. The first event of generating an intermediate term from the document intends to capture how well a term fits contextually within a document, whereas the second one of generating it from the collection aims to address the vocabulary mismatch problem by taking into account other related terms in the collection. Our experiments, conducted on the standard TREC 6-8 ad hoc and Robust tasks, show that our proposed method yields significant improvements over language model (LM) and LDA-smoothed LM baselines.

A Word Embedding based Generalized Language Model for Information Retrieval

Journal

SIGIR 2015: PROCEEDINGS OF THE 38TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL

Publisher

ASSOC COMPUTING MACHINERY

Keywords

Categories

Ask authors/readers for more resources

Protocol

Reagent

Authors

I am an author on this paper

Reviews

Primary Rating

Secondary Ratings

Novelty

Significance

Scientific rigor

Rate this paper

Recommended

A Word Embedding based Generalized Language Model for Information Retrieval

Journal

SIGIR 2015: PROCEEDINGS OF THE 38TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL

Publisher

ASSOC COMPUTING MACHINERY

Keywords

Categories

Ask authors/readers for more resources

Protocol

Reagent

Authors

I am an author on this paper

Reviews

Primary Rating

Secondary Ratings

Novelty

Significance

Scientific rigor

Rate this paper

Recommended

Export Citation

Share Paper