Related references
Note: Only some of the references are listed.
Article
Computer Science, Artificial Intelligence
Kai Han et al.
Summary: The transformer, a deep neural network with a self-attention mechanism, was initially used in natural language processing and is now gaining attention in computer vision tasks. Transformer-based models perform as well as or even better than convolutional and recurrent neural networks in various visual benchmarks. This paper reviews vision transformer models, categorizes them based on different tasks, and analyzes their advantages and disadvantages. The discussed categories include backbone network, high/mid-level vision, low-level vision, and video processing. Efficient methods for applying transformers in real device-based applications are also explored. The challenges and further research directions for vision transformers are discussed as well.
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
(2023)
Article
Computer Science, Artificial Intelligence
Fatima Ezzahra Benkirane et al.
Summary: This paper aims to integrate human knowledge and human-like reasoning used for monocular depth estimation within deep neural networks. The proposed approach involves directly integrating geometric, semantic, and contextual information into the monocular depth estimation process using an ontology model in a deep learning context. Monocular cue information is extracted through reasoning performed on the proposed ontology and combined with the RGB image as input to the deep neural network for depth estimation. The experimental results show that the proposed method improves upon state-of-the-art monocular depth estimation deep models and yields promising results for cross-evaluation, especially for unseen driving scenarios.
KNOWLEDGE-BASED SYSTEMS
(2023)
Article
Computer Science, Artificial Intelligence
Armin Masoumian et al.
Summary: This study proposes a new self-supervised monocular depth estimation model that uses a graph convolutional network (GCN) to handle irregular image regions and enhances the quantitative and qualitative understanding of depth maps. The method achieves a high prediction accuracy of 89% on the KITTI dataset and reduces the number of trainable parameters by 40% compared to existing solutions.
Article
Computer Science, Artificial Intelligence
Rui Li et al.
Summary: In this paper, a method is proposed to enhance self-supervised depth estimation by incorporating semantic information and using both implicit and explicit semantic guidance. The proposed Semantic-aware Spatial Feature Modulation scheme relates depth distributions to semantic category information, implicitly modulating semantic and depth features. A semantic-guided ranking loss is also proposed to explicitly constrain the estimated depth borders using segmentation labels. Extensive experimental results demonstrate that the proposed method outperforms state-of-the-art methods in terms of producing high-quality depth maps with semantically consistent depth distributions and accurate depth edges.
PATTERN RECOGNITION
(2023)
Article
Engineering, Electrical & Electronic
Xihao Liu et al.
Summary: This article proposes a novel encoder-decoder network for real-time monocular depth estimation on edge devices. The network merges semantic information at a global field via an efficient transformer-based module to provide more details of the object for depth assignment. The network achieves an outstanding balance between accuracy and speed.
IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT
(2023)
Review
Computer Science, Artificial Intelligence
Yujian Mo et al.
Summary: This paper reviews the state-of-the-art technologies of semantic segmentation based on deep learning and investigates related works on weakly-supervised, domain adaptation, multi-modal data fusion, and real-time semantic segmentation.
Article
Computer Science, Artificial Intelligence
Rene Ranftl et al.
Summary: The success of monocular depth estimation relies on large and diverse training sets. This study proposes tools and methods to mix different datasets and improve the performance of monocular depth estimation.
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
(2022)
Article
Computer Science, Artificial Intelligence
Yang Wang et al.
Summary: This paper introduces MetricMask, a novel single-category instance segmentation method that can be easily embedded into most off-the-shelf detection and segmentation methods. By fusing object detection, semantic segmentation, and metric learning, the method can perform segmentation for all instances at once. Experimental results demonstrate that the method is highly competitive on two standard datasets.
Article
Engineering, Electrical & Electronic
Xuyang Meng et al.
Summary: In this paper, we propose a novel model called context-based ordinal regression network (CORNet) for reconstructing monocular depth maps. By introducing a feature transformation module, a boundary enhancement module, and a feature optimization module, CORNet can capture fine depth features and enhance border depth. Experimental results on challenging datasets show that CORNet achieves effective monocular depth estimation and superior performance in capturing geometric features.
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY
(2022)
Review
Computer Science, Artificial Intelligence
Saddam Abdulwahab et al.
Summary: A novel depth map estimation technique based on an autoencoder network is proposed in this paper to address the challenge of generating depth maps from single RGB images. Experimental results demonstrate that the proposed model outperforms state-of-the-art approaches on two datasets, showing exceptional performance in preserving object boundaries and small 3D structures.
NEURAL COMPUTING & APPLICATIONS
(2022)
Article
Computer Science, Artificial Intelligence
Bocong Gao et al.
Summary: In this study, we propose an effective video object segmentation method based on multi-level target models and feature integration (MTMFI-VOS), which can address the challenges posed by small sizes, deformations, and occlusions of target objects. The proposed method achieves competitive accuracy on VOS benchmarks.
Review
Chemistry, Analytical
Armin Masoumian et al.
Summary: This paper provides a state-of-the-art review of the current developments in monocular depth estimation (MDE) based on deep learning techniques. It highlights the key points from various aspects and discusses limitations and future research directions in the field.
Article
Engineering, Civil
Xingshuai Dong et al.
Summary: This paper presents a comprehensive survey of monocular depth estimation (MDE), covering methods, performance evaluation metrics, datasets, and applications. It also summarizes open-source implementations of representative methods and discusses future research directions. The survey aims to assist readers in navigating this research field.
IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS
(2022)
Proceedings Paper
Computer Science, Artificial Intelligence
Zhengming Zhou et al.
Summary: The proposed Self-Distilled Feature Aggregation (SDFA) module for self-supervised monocular depth estimation effectively aggregates low-scale and high-scale features while maintaining their contextual consistency. By employing three branches to learn feature offset maps, the SDFA-based network outperforms state-of-the-art methods in most cases, as demonstrated on the KITTI dataset.
COMPUTER VISION - ECCV 2022, PT I
(2022)
Proceedings Paper
Computer Science, Artificial Intelligence
Mu He et al.
Summary: This paper presents a resolution adaptive self-supervised monocular depth estimation method that achieves good performance at different resolutions by learning the scale invariance of scene depth.
COMPUTER VISION - ECCV 2022, PT XXVII
(2022)
Proceedings Paper
Computer Science, Artificial Intelligence
Weihao Yuan et al.
Summary: Estimating accurate depth from a single image is challenging. This study proposes a CRF optimization approach that leverages fully-connected CRFs and a multi-head attention mechanism to optimize the depth map. Experimental results show significant improvements over previous methods on multiple datasets, and the proposed method also outperforms existing panorama methods.
2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022)
(2022)
Article
Computer Science, Artificial Intelligence
Chaoqiang Zhao et al.
Summary: This paper investigates the problem of unsupervised monocular depth estimation in highly complex scenarios and addresses this challenging problem by adopting an image transfer-based domain adaptation framework. Extensive experiments show the effectiveness of the proposed unsupervised framework in estimating the dense depth map from highly complex images.
IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE
(2022)
Article
Engineering, Electrical & Electronic
Minsoo Song et al.
Summary: A new method for monocular depth estimation is proposed in this paper, which effectively utilizes the Laplacian pyramid in the decoder architecture to improve depth estimation accuracy. Additionally, adjusting weight standardization in convolution blocks can improve gradient flow and make optimization smoother.
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY
(2021)
Article
Computer Science, Artificial Intelligence
Lei He et al.
Summary: This paper introduces the concept of semantic objectness and proposes the Semantic Object Segmentation and Depth Estimation Network (SOSD-Net) based on the objectness assumption, which is the first network to exploit a geometry constraint for simultaneous monocular depth estimation and semantic segmentation. The network is trained effectively using the iterative idea from the expectation-maximization algorithm, and extensive experimental results on the Cityscapes and NYU v2 datasets demonstrate the superior performance of the proposed approach.
Proceedings Paper
Computer Science, Artificial Intelligence
Kun Wang et al.
Summary: The paper proposes a novel framework for monocular depth estimation in challenging nighttime scenarios, addressing issues like low visibility and varying illumination. Key improvements include Priors-Based Regularization, a Mapping-Consistent Image Enhancement module, and a Statistics-Based Mask strategy. Experimental results show the effectiveness of each component, with the framework achieving remarkable improvements and state-of-the-art results on two nighttime datasets.
2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021)
(2021)
Article
Chemistry, Analytical
Dong-Hoon Kwak et al.
Article
Engineering, Electrical & Electronic
Yuanzhouhan Cao et al.
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY
(2018)
Proceedings Paper
Computer Science, Artificial Intelligence
Huangying Zhan et al.
2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR)
(2018)
Article
Computer Science, Artificial Intelligence
Fayao Liu et al.
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
(2016)
Proceedings Paper
Computer Science, Artificial Intelligence
David Eigen et al.
2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV)
(2015)
Article
Robotics
A. Geiger et al.
INTERNATIONAL JOURNAL OF ROBOTICS RESEARCH
(2013)