Article

Making sense of kernel spaces in neural learning

Journal

COMPUTER SPEECH AND LANGUAGE
Volume 58, Pages 51-75

Publisher

ACADEMIC PRESS LTD - ELSEVIER SCIENCE LTD
DOI: 10.1016/j.csl.2019.03.006

Keywords

Kernel-based learning; Neural methods; Semantic spaces; Nyström embeddings

Abstract

Kernel-based and Deep Learning methods are two of the most popular approaches in Computational Natural Language Learning. Although these models are rather different, each with distinct strengths and weaknesses, both have had an impressive impact on the accuracy of complex Natural Language Processing tasks. An advantage of kernel-based methods is their ability to exploit structured information induced from examples. For instance, Sequence or Tree kernels operate over structures reflecting linguistic evidence, such as syntactic information encoded in syntactic parse trees. Deep Learning approaches are very effective as they can learn non-linear decision functions; however, general models require input instances to be explicitly modeled via vectors or tensors, and operating on structured data is possible only through ad-hoc architectures. In this work, we discuss a novel architecture that efficiently combines kernel methods and neural networks, in an attempt to get the best of both paradigms. The so-called Kernel-based Deep Architecture (KDA) adopts a Nyström-based projection function to approximate any valid kernel function and convert the structures it operates on (for instance, linguistic structures such as trees) into dense linear embeddings. These can be used as input to a Deep Feed-forward Neural Network that exploits such embeddings to learn non-linear classification functions. KDA is a mathematically justified integration of expressive kernel functions and deep neural architectures, with several advantages: it (i) directly operates over complex non-tensor structures, e.g., trees, without ad-hoc manual feature engineering or architectural design, (ii) achieves a drastic reduction of the computational cost w.r.t. pure kernel methods, and (iii) exploits the non-linearity of Deep Architectures to produce accurate models. We evaluated the KDA on three rather different semantic inference tasks: Semantic Parsing, Question Classification, and Community Question Answering. Results show that the KDA achieves state-of-the-art accuracy, at a computational cost much lower than that of training and testing a pure kernel-based method, such as an SVM. (C) 2019 Elsevier Ltd. All rights reserved.
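
Note: to make the projection step of the abstract concrete, below is a minimal sketch of a Nyström embedding: sample a set of landmark examples, eigendecompose their Gram matrix, and map every instance to a dense vector whose dot products approximate the kernel. This is illustrative only; it assumes a simple RBF kernel over plain vectors in place of the paper's structural (e.g., Tree) kernels, and the function names, landmark count, and eigenvalue threshold are assumptions, not the authors' implementation.

import numpy as np

def rbf_kernel(a, b, gamma=0.5):
    # Stand-in kernel for illustration; the paper applies structural
    # kernels (e.g., Tree kernels) over linguistic structures instead.
    return np.exp(-gamma * np.sum((a - b) ** 2))

def nystrom_embed(examples, landmarks, kernel):
    # Gram matrix over the l sampled landmarks.
    K_ll = np.array([[kernel(u, v) for v in landmarks] for u in landmarks])
    # Eigendecomposition K_ll = U diag(s) U^T (symmetric, PSD).
    s, U = np.linalg.eigh(K_ll)
    keep = s > 1e-10                       # drop numerically null directions
    proj = U[:, keep] * (s[keep] ** -0.5)  # columns of U scaled by s^{-1/2}
    # Kernel evaluations between each example and the landmarks,
    # projected into the (at most l-dimensional) Nystrom space.
    K_xl = np.array([[kernel(x, v) for v in landmarks] for x in examples])
    return K_xl @ proj

# Usage sketch with synthetic data: the resulting dense embeddings
# would feed a standard feed-forward network, replacing explicit
# kernel computations at training and test time.
rng = np.random.default_rng(0)
data = rng.normal(size=(100, 10))
landmarks = data[rng.choice(len(data), size=20, replace=False)]
embeddings = nystrom_embed(data, landmarks, rbf_kernel)
print(embeddings.shape)  # e.g., (100, 20)

Per the abstract, it is this replacement of per-pair kernel evaluations with fixed-size dense embeddings that yields the reduction in computational cost relative to a pure kernel machine such as an SVM.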
