Multiobjective Reinforcement Learning-Based Neural Architecture Search for Efficient Portrait Parsing

Journal

IEEE TRANSACTIONS ON CYBERNETICS
Volume 53, Issue 2, Pages 1158-1169

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TCYB.2021.3104866

Keywords

Task analysis; Computer architecture; Computational modeling; Training; Labeling; Image segmentation; Faces; Face labeling; multiobjective; neural architecture search (NAS); portrait parsing; portrait segmentation; reinforcement learning (RL)

This article introduces an automatic exploration method for efficient portrait parsing models, which can be easily deployed in edge computing or terminal devices. By using multiobjective reinforcement learning and neural architecture search, a balance between resource cost and performance is achieved, resulting in a series of excellent portrait parsing models.
This article is dedicated to automatically exploring efficient portrait parsing models that are easy to deploy on edge computing or terminal devices. To trade off resource cost against performance, we design a multiobjective reinforcement learning (RL)-based neural architecture search (NAS) scheme that jointly balances accuracy, parameter count, FLOPs, and inference latency. Under varying hyperparameter configurations, the search procedure yields a set of objective-oriented architectures. Combining two-stage training with precomputed, memory-resident feature maps effectively reduces the time cost of the RL-based NAS method, so that we complete approximately 1000 search iterations in two GPU days. To accelerate the convergence of lightweight candidate architectures, we incorporate knowledge distillation into training during the search. This also provides a reasonable evaluation signal to the RL controller, which helps it converge well. We then fully train the best Pareto-optimal architectures and obtain a series of portrait parsing models with only approximately 0.3M parameters. Furthermore, we directly transfer the architectures searched on CelebAMask-HQ (portrait parsing) to other portrait and face segmentation tasks, achieving state-of-the-art performance of 96.5% MIOU on EG1800 (portrait segmentation) and a 91.6% overall F1-score on HELEN (face labeling). That is, our models significantly surpass manually designed networks in accuracy while consuming fewer resources and offering better real-time performance.
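
The abstract says Pareto-optimal architectures are selected from the search results across four objectives (accuracy, parameters, FLOPs, latency). The following is a minimal sketch of how such a Pareto filter could look, not the authors' implementation. The `Candidate` type, its field names, and all numeric values are hypothetical and chosen only for illustration.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    """One searched architecture with its measured objectives (hypothetical fields)."""
    name: str
    miou: float        # accuracy metric, higher is better
    params_m: float    # parameters in millions, lower is better
    flops_g: float     # FLOPs in billions, lower is better
    latency_ms: float  # inference latency in milliseconds, lower is better


def dominates(a: Candidate, b: Candidate) -> bool:
    """Return True if `a` is at least as good as `b` on every objective
    and strictly better on at least one."""
    no_worse = (
        a.miou >= b.miou
        and a.params_m <= b.params_m
        and a.flops_g <= b.flops_g
        and a.latency_ms <= b.latency_ms
    )
    strictly_better = (
        a.miou > b.miou
        or a.params_m < b.params_m
        or a.flops_g < b.flops_g
        or a.latency_ms < b.latency_ms
    )
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep every candidate that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]


if __name__ == "__main__":
    # Illustrative numbers only; they are not results from the paper.
    pool = [
        Candidate("A", 0.90, 0.30, 0.50, 10.0),
        Candidate("B", 0.88, 0.30, 0.50, 10.0),  # dominated by A (lower accuracy, same cost)
        Candidate("C", 0.92, 0.50, 0.80, 15.0),  # more accurate but costlier than A
        Candidate("D", 0.85, 0.20, 0.30, 8.0),   # cheaper but less accurate than A
    ]
    for c in pareto_front(pool):
        print(c.name)
```

The filter compares every pair of candidates, which is quadratic in the pool size; for a pool on the order of the roughly 1000 search iterations mentioned above, that cost is negligible next to training.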
