Proceedings Paper

Designing a Better Data Representation for Deep Neural Networks and Text Classification

Publisher

IEEE
DOI: 10.1109/IRI.2016.61

Keywords

Deep Learning; Text Classification; Tweet Sentiment; Convolutional Neural Networks

Funding

  1. Directorate for Computer and Information Science and Engineering
  2. Division of Computer and Network Systems [1427536] (Funding source: National Science Foundation)

Abstract

Traditional machine learning requires data to be described by attributes prior to applying a learning algorithm. In text classification tasks, many feature engineering methodologies have been proposed to extract meaningful features; however, no best-practice approach has emerged. Traditional methods of feature engineering have inherent limitations due to loss of information and the limits of human design. An alternative is to use deep learning to automatically learn features from raw text data. One promising deep learning approach is to use convolutional neural networks. These networks can learn abstract text concepts from character representations and be trained to perform discriminative tasks, such as classification. In this paper, we propose a new approach to encoding text for use with convolutional neural networks that greatly reduces memory requirements and training time for learning from character-level text representations. Additionally, this approach scales well with alphabet size, allowing us to preserve more information from the original text and potentially enhancing classification performance. By training tweet sentiment classifiers, we demonstrate that our approach uses fewer computational resources, allows faster network training, and achieves similar or better performance compared with the previous method of character encoding.
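To make the memory argument concrete, the sketch below compares two ways of turning a tweet into a character-level input matrix. It is an illustration only: the abstract does not spell out either encoding, so the one-hot baseline, the compact binary code, the `ALPHABET` string, and the function names `one_hot_encode` and `binary_code_encode` are all assumptions, not the paper's method. The point is simply that a per-character vector whose width grows with the logarithm of the alphabet size yields a much smaller input than one whose width equals the alphabet size.

```python
# Hypothetical alphabet for illustration; the abstract does not list the one used.
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789 .,!?#@"

# Index 0 is reserved for padding and for characters outside the alphabet.
CHAR_INDEX = {ch: i + 1 for i, ch in enumerate(ALPHABET)}


def one_hot_encode(text, length):
    """Return `length` rows, each a one-hot vector of width len(ALPHABET).

    Assumed baseline: padding and unknown characters become all-zero rows.
    """
    rows = []
    for pos in range(length):
        row = [0] * len(ALPHABET)
        if pos < len(text):
            idx = CHAR_INDEX.get(text[pos].lower(), 0)
            if idx:
                row[idx - 1] = 1
        rows.append(row)
    return rows


def binary_code_encode(text, length):
    """Return `length` rows, each the fixed-width binary code of a character index.

    Assumed compact alternative: the width is the number of bits needed to
    write the largest index, so it grows logarithmically with alphabet size.
    """
    width = len(ALPHABET).bit_length()
    rows = []
    for pos in range(length):
        idx = CHAR_INDEX.get(text[pos].lower(), 0) if pos < len(text) else 0
        # Most significant bit first.
        rows.append([(idx >> bit) & 1 for bit in reversed(range(width))])
    return rows


if __name__ == "__main__":
    tweet = "Great day! #sunny"
    dense = one_hot_encode(tweet, 140)
    compact = binary_code_encode(tweet, 140)
    print("one-hot values per tweet:", len(dense) * len(dense[0]))
    print("binary-code values per tweet:", len(compact) * len(compact[0]))
```

With this 43-character alphabet and a fixed length of 140 characters, the one-hot matrix holds 140 x 43 values while the binary-code matrix holds 140 x 6. Doubling the alphabet would double the one-hot width but add only one column to the binary code, which is one way an encoding can scale well with alphabet size.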