Article

Automated multimodal sensemaking: Ontology-based integration of linguistic frames and visual data

Journal

COMPUTERS IN HUMAN BEHAVIOR
Volume 150

Publisher

PERGAMON-ELSEVIER SCIENCE LTD
DOI: 10.1016/j.chb.2023.107997

Keywords

Multimodal sensemaking; Ontology engineering; Knowledge graph construction; Frame-based reasoning; Visual and linguistic frames


We propose an explainable automated multimodal sensemaking approach by linking linguistic frames to physical visual occurrences. We analyze the Visual Genome image dataset and introduce the Visual Sense Ontology (VSO) to enhance the multimodal data. We establish a framal knowledge expansion pipeline and create the queryable Visual Sense Knowledge Graph (VSKG) to connect linguistic frames with images. Our work represents a significant advancement in frame evocation and multimodal sensemaking automation.
Frame evocation from visual data is an essential process for multimodal sensemaking, due to the multimodal abstraction provided by frame semantics. However, there is a scarcity of data-driven approaches and tools to automate it. We propose a novel approach for explainable automated multimodal sensemaking by linking linguistic frames to their physical visual occurrences, using ontology-based knowledge engineering techniques. We pair the evocation of linguistic frames from text with their counterparts in visual data, treated as framal visual manifestations. We present a deep ontological analysis of the implicit data model of the Visual Genome image dataset, and its formalization in the novel Visual Sense Ontology (VSO). To enhance the multimodal data from this dataset, we introduce a framal knowledge expansion pipeline that extracts and connects linguistic frames - including values and emotions - to images, using multiple linguistic resources for disambiguation. We then introduce the Visual Sense Knowledge Graph (VSKG), a novel resource. VSKG is a queryable knowledge graph that enhances the accessibility and comprehensibility of Visual Genome's multimodal data via SPARQL queries. VSKG includes frame visual evocation data, enabling more advanced forms of explicit reasoning, analysis, and sensemaking. Our work represents a significant advancement in the automation of frame evocation and multimodal sensemaking, performed in a fully interpretable and transparent way, with potential applications in various fields, including knowledge representation, computer vision, and natural language processing.
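The core step of the framal knowledge expansion pipeline - mapping words in an image's textual annotations to the linguistic frames they evoke, then emitting graph triples linking the image to those frames - can be sketched minimally as follows. This is an illustrative sketch only: the tiny hand-made lexicon, the function names, and the `evokesFrame` predicate are hypothetical stand-ins, not the paper's actual resources or VSO vocabulary (the real pipeline draws on FrameNet-style linguistic resources and performs disambiguation).

```python
# Toy "lexical unit -> frame" table standing in for FrameNet-style lookups.
# Frame names follow FrameNet conventions, but the table itself is invented.
LEXICON = {
    "run": "Self_motion",
    "give": "Giving",
    "happy": "Emotion_directed",
}

def evoke_frames(caption: str) -> set[str]:
    """Return the set of frames evoked by words in an image caption.

    Real systems would lemmatize and disambiguate each word; this sketch
    does a plain lowercase lookup for illustration.
    """
    return {LEXICON[w] for w in caption.lower().split() if w in LEXICON}

def build_triples(image_id: str, caption: str) -> list[tuple[str, str, str]]:
    """Emit (subject, predicate, object) triples linking an image to frames.

    The predicate name "evokesFrame" is a hypothetical placeholder for
    whatever property the actual VSO defines.
    """
    return [(image_id, "evokesFrame", f) for f in sorted(evoke_frames(caption))]

triples = build_triples("vg:image_1", "a happy dog starts to run")
print(triples)
# -> [('vg:image_1', 'evokesFrame', 'Emotion_directed'),
#     ('vg:image_1', 'evokesFrame', 'Self_motion')]
```

Once such triples are loaded into an RDF store, the frame evocations become queryable with ordinary SPARQL, which is what makes the resulting knowledge graph usable for explicit, inspectable reasoning rather than opaque end-to-end prediction.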
