Article

Scalability of knowledge distillation in incremental deep learning for fast object detection

Journal

APPLIED SOFT COMPUTING
Volume 129, Issue -, Pages -

Publisher

ELSEVIER
DOI: 10.1016/j.asoc.2022.109608

Keywords

Visual recognition; Object detection; Deep neural network; Incremental learning; Knowledge transfer

Funding

  1. Griffith University eResearch Services Team, Australia

This paper investigates the scalability of incremental deep learning for visual recognition, specifically for fast object detection. The experimental results show that incremental learning with knowledge transfer and distillation can reduce storage requirements compared to training-at-once, but it increases computational time. Adjusting key parameters plays an important role in balancing the accuracy of new and old classes and reducing computational cost.
Visual recognition requires incremental learning to scale its underlying deep learning models with continuous data growth. The existing scalability challenge is maintaining the balance between effectiveness (accuracy) and efficiency (computational requirements) due to the rapidly increasing storage demand, computational time, and memory usage for processing both the old and new data. This paper aims to investigate the scalability of the incremental deep learning approach for visual recognition, specifically for fast object detection applications. The experimental study compares the knowledge retention and computational expense of training-at-once against incremental learning with knowledge transfer and distillation. The experiment was based on a state-of-the-art object detector, extended to incorporate knowledge transfer and distillation, in order to benchmark three training approaches: training-at-once, transfer learning without distillation, and transfer learning with distillation. The experimental results and analysis examined the pros and cons of each training approach while adjusting some key parameters, focusing on comparing the accuracy on new classes, knowledge retention of old classes, data storage, computation time, and memory usage. Training-at-once (the baseline) yielded the highest accuracy on both new and old classes, at the expense of the largest storage and memory usage. Compared to the baseline, both transfer learning approaches reduced the storage requirement by 73% but increased computation time by 53%. Transfer learning with distillation was important for knowledge retention, maintaining 96% accuracy on old classes, which indicates its ability to handle long-term incremental learning. Compared to using distillation, transfer learning without distillation achieved slightly better accuracy on new classes (-53% versus -60% relative to the baseline) and lower memory usage (-65% versus +26%), but at the expense of completely forgetting the old classes (-100%). This study confirmed that a distillation loss can help balance the accuracy of old and new object classes while maintaining all the benefits of incremental learning. The experiments with varied key parameters across all training approaches confirmed that training batch size and the number of assigned classes play an important role in maintaining the accuracy of new classes, retaining the knowledge of old classes, and reducing the computational cost. (c) 2022 Elsevier B.V. All rights reserved.
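
The abstract reports that a distillation loss helps balance the accuracy of old and new classes during incremental training. As a rough illustration only, not the paper's actual formulation, the sketch below shows a common way such a loss is assembled: a frozen teacher detector (trained on the old classes) supervises the student's old-class logits through a temperature-scaled KL term, which is mixed with the ordinary detection loss on the new classes. The function name, tensor shapes, and the alpha/temperature parameters are assumptions made for this example.

```python
# Illustrative sketch of a generic distillation term for incremental object
# detection; NOT the exact loss used in the paper. Assumes the teacher and
# student produce class logits for the same set of region proposals/anchors.
import torch
import torch.nn.functional as F

def incremental_detection_loss(student_logits_old, teacher_logits_old,
                               student_loss_new, temperature=2.0, alpha=0.5):
    """Combine the detection loss on new classes with a distillation term
    that keeps the student's old-class predictions close to the teacher's.

    student_logits_old: [N, C_old] student logits for the old classes
    teacher_logits_old: [N, C_old] logits of the frozen teacher (no gradient)
    student_loss_new:   scalar detection loss computed on new-class annotations
    alpha:              weight trading off knowledge retention vs. new-class accuracy
    """
    # Soft targets from the frozen teacher; temperature smooths the distribution.
    soft_targets = F.softmax(teacher_logits_old.detach() / temperature, dim=1)
    log_probs = F.log_softmax(student_logits_old / temperature, dim=1)

    # KL divergence between student and teacher distributions over old classes,
    # rescaled by T^2 as is conventional for temperature-scaled distillation.
    distill_loss = F.kl_div(log_probs, soft_targets, reduction="batchmean")
    distill_loss = distill_loss * (temperature ** 2)

    # Larger alpha favors retaining old classes; smaller alpha favors new classes.
    return alpha * distill_loss + (1.0 - alpha) * student_loss_new
```

In this kind of formulation, setting alpha to zero recovers plain transfer learning without distillation, which matches the abstract's finding that the old classes are then forgotten almost entirely.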
