Article

Panel-Page-Aware Comic Genre Understanding

IEEE TRANSACTIONS ON IMAGE PROCESSING (2023)

Journal

IEEE TRANSACTIONS ON IMAGE PROCESSING

Volume 32, Pages 2636-2648

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

DOI: 10.1109/TIP.2023.3270105

Keywords

Media; Task analysis; Videos; Visualization; Training; Testing; Feature extraction; Comic; multi-image classification; deep learning; attention mechanism

Categories

Computer Science, Artificial Intelligence; Engineering, Electrical & Electronic

Abstract

Using a sequence of discrete still images to tell a story or introduce a process has become a tradition in digital visual media. With the surge of such media and the demands of downstream tasks, there is an urgent need to identify their main topics or genres quickly. Comics, a representative form of this media, have boomed since going digital. Unlike natural images, however, comic pages are divided into panels, and their visual content is not consistent from page to page. As a result, existing methods tailored to natural images perform poorly on comics. Because a comic's genre is tied to its overall plot, identifying it requires a long-term understanding that fully exploits the semantic interactions among multi-level comic fragments. In this paper, we propose P²Comic, a Panel-Page-aware Comic genre classification model that takes page sequences of a comic as input and produces class-wise probabilities. P²Comic uses detected panel boxes to extract panel representations and applies self-attention to build panel-page understanding, assisted by interdependent classifiers that model label correlations. We also build the first comic dataset for genre classification with multi-genre labels. Experiments show that our approach outperforms state-of-the-art methods on related tasks. We further validate that our network extends to the multi-modal scenario. Finally, we demonstrate its practicality by producing effective genre predictions for whole comic books.
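The abstract describes a model that applies self-attention over panel representations drawn from a page sequence and outputs class-wise probabilities for multi-genre labels. The Python sketch below illustrates only that generic mechanism, under loudly stated assumptions: panel features arrive as plain float vectors (the panel detector and feature extractor are omitted), the query/key/value projections are identity maps, and the per-genre weights (`heads`, `biases`) are hypothetical, untrained placeholders. It uses independent sigmoid heads, so it does not reproduce the paper's interdependent classifiers; all function names are illustrative and not taken from the authors' code.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens: List[Vector]) -> List[Vector]:
    """Single-head scaled dot-product self-attention.

    ASSUMPTION: query, key and value projections are identity maps for
    brevity; a real model would learn them.
    """
    d = len(tokens[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for q in tokens:
        # Similarity of this panel to every panel, including itself.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in tokens]
        weights = softmax(scores)
        # Weighted sum of value vectors (values are the tokens themselves here).
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return out


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def genre_probabilities(pages: List[List[Vector]],
                        heads: List[Vector],
                        biases: Vector) -> Vector:
    """Return one independent probability per genre (multi-label).

    pages:  panel feature vectors grouped by page (hypothetical input format)
    heads:  one weight vector per genre (hypothetical, untrained)
    biases: one bias per genre (hypothetical, untrained)
    """
    # Flatten panels from all pages into one sequence so attention can
    # relate panels across page boundaries.
    panels = [p for page in pages for p in page]
    attended = self_attention(panels)
    d = len(attended[0])
    # Mean-pool the attended panel tokens into one comic-level vector.
    pooled = [sum(t[j] for t in attended) / len(attended) for j in range(d)]
    # Sigmoid rather than softmax, so several genres can be active at once.
    return [sigmoid(sum(w * x for w, x in zip(h, pooled)) + b)
            for h, b in zip(heads, biases)]


if __name__ == "__main__":
    # Two pages: the first has two panels, the second has one (toy 2-D features).
    pages = [[[1.0, 0.0], [0.8, 0.2]], [[0.0, 1.0]]]
    heads = [[2.0, -1.0], [-1.0, 2.0]]  # two hypothetical genres
    print(genre_probabilities(pages, heads, [0.0, 0.0]))
```

Because each genre has its own sigmoid, the returned probabilities need not sum to 1, which is what a multi-genre labeling calls for; a softmax head would instead force the genres to compete for a single label.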