Article

Knowledge-enhanced document embeddings for text classification

Journal

KNOWLEDGE-BASED SYSTEMS
Volume 163, Pages 955-971

Publisher

ELSEVIER SCIENCE BV
DOI: 10.1016/j.knosys.2018.10.026

Keywords

Semantic representation; Document embeddings; Text classification; Text mining

Funding

  1. Sao Paulo Research Foundation (FAPESP) [2013/14757-6, 2016/07620-2, 2016/17078-0]
  2. Coordenacao de Aperfeicoamento de Pessoal de Nivel Superior - Brasil (CAPES) [001]
  3. Google PhD Fellowship in Natural Language Processing
  4. ERC [637277]
  5. ERC Starting Grant MultiJEDI [259234]
  6. ERC Consolidator Grant MOUSSE [726487]

Accurate semantic representation models are essential in text mining applications. For the text mining process to succeed, the adopted text representation must preserve the patterns to be discovered. Although competitive results for automatic text classification may be achieved with the traditional bag-of-words model, this representation cannot provide satisfactory classification performance in hard settings where richer text representations are required. In this paper, we present an approach to represent document collections based on embedded representations of words and word senses. We bring together the power of word sense disambiguation and the semantic richness of word and word-sense embedded vectors to construct embedded representations of document collections. Our approach results in semantically enhanced and low-dimensional representations. We overcome the lack of interpretability of embedded vectors, which is a drawback of this kind of representation, by using word-sense embedded vectors. Moreover, the experimental evaluation indicates that the proposed representations yield stable classifiers with strong quantitative results, especially in semantically complex classification scenarios. © 2018 Elsevier B.V. All rights reserved.
