Proceedings Paper

Is An Image Worth Five Sentences? A New Look into Semantics for Image-Text Matching

2022 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2022) (2022)

Probabilistic Embeddings for Cross-Modal Retrieval

Sanghyuk Chun et al.

Summary: Cross-modal retrieval methods aim to build a common representation space for samples from different modalities, such as vision and language. This paper introduces Probabilistic Cross-Modal Embedding (PCME), which represents samples as probability distributions in the common space rather than as single points. PCME improves retrieval performance and provides uncertainty estimates that make the embeddings more interpretable. Evaluated on the CUB dataset, which has exhaustive annotations, PCME captures one-to-many correspondences better than deterministic methods.

2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2021) (2021)

Proceedings Paper Computer Science, Artificial Intelligence

StacMR: Scene-Text Aware Cross-Modal Retrieval

Andres Mafla et al.

Summary: This paper introduces a new cross-modal retrieval dataset containing scene-text instances, proposes approaches that leverage scene text, and presents experiments confirming the benefits of using scene text for retrieval.

2021 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2021) (2021)

Proceedings Paper Computer Science, Artificial Intelligence

Hierarchical Multimodal LSTM for Dense Visual-Semantic Embedding

Zhenxing Niu et al.

2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV) (2017)

Proceedings Paper Computer Science, Artificial Intelligence

Self-supervised learning of visual features through embedding images into text topic spaces

Lluis Gomez et al.

30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017) (2017)

Proceedings Paper Computer Science, Artificial Intelligence

Beyond instance-level image retrieval: Leveraging captions to learn a global visual representation for semantic retrieval

Albert Gordo et al.

30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017) (2017)

Article Computer Science, Artificial Intelligence

Framing Image Description as a Ranking Task: Data, Models and Evaluation Metrics

Micah Hodosh et al.

JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH (2013)
