Article

Mask encoding: A general instance mask representation for object segmentation

Journal

PATTERN RECOGNITION
Volume 124

Publisher

ELSEVIER SCI LTD
DOI: 10.1016/j.patcog.2021.108505

Keywords

Mask encoding; Instance segmentation; Video instance segmentation

Funding

  1. National Natural Science Foundation of China [62073244]
  2. Shanghai Innovation Action Plan [20511100500, 20511105802]
  3. Innovation Program of Shanghai Municipal Education Commission [202101070007E00098]


Instance segmentation is a challenging computer vision task that requires separating each instance at the pixel level. The currently dominant representation for instance masks is a low-resolution binary mask. This work proposes an effective approach for encoding high-resolution structured masks into a compact representation that combines high quality with low complexity. The proposed method integrates easily into existing pipelines and improves mask average precision (AP) on several datasets.
Instance segmentation is one of the most challenging tasks in computer vision: it requires separating each instance at the pixel level. To date, the dominant paradigm for representing an instance mask is a low-resolution binary mask. For example, the predicted mask in Mask R-CNN is usually 28 × 28. Generally, a low-resolution mask cannot capture object details well, while a high-resolution mask dramatically increases training complexity. In this work, we propose a flexible and effective approach that encodes the high-resolution structured mask into a compact representation, which combines the advantages of high quality and low complexity. The proposed mask representation can be easily integrated into two-stage pipelines such as Mask R-CNN, improving mask AP by 0.9% on the COCO dataset, 1.4% on the LVIS dataset, and 2.1% on the Cityscapes dataset. Moreover, a novel single-shot instance segmentation framework can be constructed by extending an existing one-stage detector with a mask branch for this instance representation. Our model outperforms explicit contour-based pipelines in accuracy at similar computational complexity. We also evaluate our method on video instance segmentation, achieving promising results on the YouTube-VIS dataset. Code is available at: https://git.io/AdelaiDet
(c) 2021 Elsevier Ltd. All rights reserved.
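
To make the idea of a compact mask representation concrete, the following is a minimal sketch rather than the authors' implementation. The abstract does not say which encoder is used, so a PCA-style linear projection fitted with NumPy's SVD is assumed here purely for illustration. The function names (`make_disk_masks`, `fit_linear_encoder`, `encode`, `decode`, `mean_iou`), the synthetic disk masks, the 64 × 64 resolution, and the 60-dimensional code are all hypothetical choices.

```python
import numpy as np


def make_disk_masks(n, size=64, seed=0):
    """Synthetic stand-in data: n binary masks, each a filled disk."""
    rng = np.random.default_rng(seed)
    ys, xs = np.mgrid[0:size, 0:size]
    cy = rng.uniform(0.35 * size, 0.65 * size, n)[:, None, None]
    cx = rng.uniform(0.35 * size, 0.65 * size, n)[:, None, None]
    r = rng.uniform(0.15 * size, 0.30 * size, n)[:, None, None]
    inside = (ys[None] - cy) ** 2 + (xs[None] - cx) ** 2 <= r ** 2
    return inside.astype(np.float32)


def fit_linear_encoder(masks, dim):
    """Fit a PCA-style basis (assumed encoder, not taken from the abstract)."""
    flat = masks.reshape(len(masks), -1)
    mean = flat.mean(axis=0)
    # Rows of vt are principal directions, sorted by explained variance.
    _, _, vt = np.linalg.svd(flat - mean, full_matrices=False)
    return mean, vt[:dim]


def encode(masks, mean, basis):
    """Project flattened masks onto the basis: (N, H*W) -> (N, dim)."""
    return (masks.reshape(len(masks), -1) - mean) @ basis.T


def decode(codes, mean, basis, size):
    """Reconstruct masks from codes and binarize at 0.5."""
    flat = codes @ basis + mean
    return flat.reshape(len(codes), size, size) >= 0.5


def mean_iou(pred, target):
    """Mean intersection-over-union between two stacks of boolean masks."""
    inter = (pred & target).sum(axis=(1, 2))
    union = (pred | target).sum(axis=(1, 2))
    return float((inter / union).mean())


if __name__ == "__main__":
    size, dim = 64, 60
    train = make_disk_masks(500, size=size, seed=0)
    held_out = make_disk_masks(50, size=size, seed=1)

    mean, basis = fit_linear_encoder(train, dim)
    codes = encode(held_out, mean, basis)
    recon = decode(codes, mean, basis, size)

    print("values per mask:", size * size, "->", codes.shape[1])
    print("held-out mean IoU:", mean_iou(recon, held_out.astype(bool)))
```

In this sketch a network would regress the short code vector instead of a dense pixel grid, and the fixed decoder would map the code back to a high-resolution mask. Real object masks are far more varied than disks, so the reconstruction quality seen here says nothing about the results reported in the abstract.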
