Article

HyperLex: lexical cartography for information retrieval

Journal

COMPUTER SPEECH AND LANGUAGE
Volume 18, Issue 3, Pages 223-252

Publisher

ACADEMIC PRESS LTD- ELSEVIER SCIENCE LTD
DOI: 10.1016/j.csl.2004.05.002


This article describes an algorithm called HyperLex that is capable of automatically determining word uses in a textbase without recourse to a dictionary. The algorithm makes use of the specific properties of word cooccurrence graphs, which are shown to have small-world properties. Unlike earlier dictionary-free methods based on word vectors, it can isolate highly infrequent uses (as rare as 1% of all occurrences) by detecting hubs and high-density components in the cooccurrence graphs. The algorithm is applied here to information retrieval on the Web, using a set of highly ambiguous test words. An evaluation of the algorithm showed that it omitted only a very small number of relevant uses. In addition, HyperLex offers automatic tagging of word uses in context with excellent precision (97%, compared to 73% for baseline tagging, with an 82% recall rate). Remarkably good precision (96%) was also achieved on a selection of the 25 most relevant pages for each use (including highly infrequent ones). Finally, HyperLex is combined with a graphic display technique that allows the user to navigate visually through the lexicon and explore the various domains detected for each word use. © 2004 Elsevier Ltd. All rights reserved.
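The idea of detecting hubs in a cooccurrence graph can be illustrated with a small sketch. The abstract does not spell out the procedure, so everything below is an assumption made for illustration: the toy contexts for an ambiguous word such as "bank", the function names `build_cooccurrence_graph` and `find_hubs`, the `min_degree` threshold, and the greedy strategy of taking the highest-degree node and then discarding its neighbours. It is not the article's implementation.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence_graph(contexts):
    """Return an undirected graph mapping each word to the set of words
    it shares at least one context with."""
    graph = defaultdict(set)
    for context in contexts:
        for a, b in combinations(sorted(set(context)), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def find_hubs(graph, min_degree=2):
    """Greedily select hubs (assumed heuristic, not the article's procedure).

    Repeatedly take the node with the most neighbours among the nodes still
    remaining, record it as a hub, then remove it and its neighbours.
    Stop when no remaining node has at least `min_degree` neighbours.
    """
    remaining = set(graph)
    hubs = []
    while remaining:
        # Sorting makes tie-breaking deterministic (alphabetical order).
        hub = max(sorted(remaining), key=lambda w: len(graph[w] & remaining))
        if len(graph[hub] & remaining) < min_degree:
            break
        hubs.append(hub)
        remaining -= graph[hub] | {hub}
    return hubs


# Hypothetical contexts in which the target word "bank" occurred
# (the target word itself is left out of the graph).
contexts = [
    ["river", "water"],
    ["river", "shore"],
    ["river", "fishing"],
    ["money", "loan"],
    ["money", "account"],
    ["money", "interest"],
]

graph = build_cooccurrence_graph(contexts)
hubs = find_hubs(graph)
print(hubs)
```

In this toy graph, each selected hub stands for one candidate use of the target word, and the words attached to a hub form the neighbourhood associated with that use. A real textbase would need frequency filtering and edge weighting, which this sketch leaves out.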
