Journal
PATTERN RECOGNITION
Volume 124
Publisher
ELSEVIER SCI LTD
DOI: 10.1016/j.patcog.2021.108505
Keywords
Mask encoding; Instance segmentation; Video instance segmentation
Funding
- National Natural Science Foundation of China [62073244]
- Shanghai Innovation Action Plan [20511100500, 20511105802]
- Innovation Program of Shanghai Municipal Education Commission [202101070007E00098]
Instance segmentation is a challenging task in computer vision that requires separating each instance at the pixel level. The dominant representation for instance masks is currently a low-resolution binary mask. This work proposes an effective approach for encoding high-resolution structured masks into a compact representation that combines high quality with low complexity. The proposed representation can be easily integrated into existing pipelines and improves mask average precision (AP) on several datasets.
Instance segmentation is one of the most challenging tasks in computer vision, requiring each instance to be separated at the pixel level. To date, a low-resolution binary mask is the dominant paradigm for representing instance masks. For example, the predicted mask in Mask R-CNN is usually 28 × 28. Generally, a low-resolution mask cannot capture object details well, while a high-resolution mask dramatically increases training complexity. In this work, we propose a flexible and effective approach that encodes high-resolution structured masks into a compact representation combining high quality with low complexity. The proposed mask representation can be easily integrated into two-stage pipelines such as Mask R-CNN, improving mask AP by 0.9% on the COCO dataset, 1.4% on the LVIS dataset, and 2.1% on the Cityscapes dataset. Moreover, a novel single-shot instance segmentation framework can be constructed by extending an existing one-stage detector with a mask branch for this instance representation. Our model outperforms explicit contour-based pipelines in accuracy at similar computational complexity. We also evaluate our method on video instance segmentation, achieving promising results on the YouTube-VIS dataset. Code is available at: https://git.io/AdelaiDet. (c) 2021 Elsevier Ltd. All rights reserved.
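The abstract does not say how the high-resolution mask is encoded, so the sketch below is only an illustration of the general idea of a compact mask code. It assumes, purely for the example, a PCA-style linear projection fitted with NumPy on synthetic box-shaped masks; the function names, the 32 × 32 mask size, and the code length of 60 are all hypothetical and are not taken from the paper or from the AdelaiDet code.

```python
import numpy as np


def fit_linear_encoder(masks, dim):
    """Fit a PCA-style basis on binary masks of shape (N, H, W).

    Assumption for illustration: the compact code is a linear projection.
    """
    flat = masks.reshape(len(masks), -1).astype(np.float64)
    mean = flat.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(flat - mean, full_matrices=False)
    return mean, vt[:dim]


def encode(mask, mean, basis):
    """Project one (H, W) mask onto the basis to get a short code vector."""
    return basis @ (mask.reshape(-1).astype(np.float64) - mean)


def decode(code, mean, basis, shape):
    """Reconstruct a binary mask from its code by thresholding at 0.5."""
    recon = code @ basis + mean
    return (recon.reshape(shape) >= 0.5).astype(np.uint8)


def random_box_masks(n, size, rng):
    """Synthetic stand-in data: one filled rectangle per mask."""
    masks = np.zeros((n, size, size), dtype=np.uint8)
    for m in masks:
        y0, x0 = rng.integers(0, size // 2, size=2)
        h, w = rng.integers(size // 4, size // 2, size=2)
        m[y0:y0 + h, x0:x0 + w] = 1
    return masks


rng = np.random.default_rng(0)
train = random_box_masks(500, 32, rng)

# 1024 mask pixels are summarized by a 60-dimensional code.
mean, basis = fit_linear_encoder(train, dim=60)
test_mask = train[0]
code = encode(test_mask, mean, basis)
recon = decode(code, mean, basis, test_mask.shape)

# Intersection over union between the original and the decoded mask.
iou = (recon & test_mask).sum() / max((recon | test_mask).sum(), 1)
print(f"code length: {code.size}, reconstruction IoU: {iou:.3f}")
```

In a setup like this, a network would regress the short code vector instead of a dense pixel grid, and the mask would be recovered by the fixed decoder. This is the trade-off the abstract describes: high-resolution masks at a cost closer to that of a low-dimensional regression target.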