Article

E2VIDX: improved bridge between conventional vision and bionic vision

Journal

FRONTIERS IN NEUROROBOTICS
Volume 17

Publisher

FRONTIERS MEDIA SA
DOI: 10.3389/fnbot.2023.1277160

Keywords

image reconstruction; deep learning; dynamic vision sensor; event camera; image classification; object detection; instance segmentation


Event cameras offer low delay, high dynamic range, and no motion blur, but their unique data representation hinders practical use. This study proposes E2VIDX, a network built on event-camera image reconstruction that achieves better feature fusion while reducing the network model size, together with a new loss function. Experimental results show significant improvements over existing methods: higher Structural Similarity (SSIM) and lower Learned Perceptual Image Patch Similarity (LPIPS) and Mean Squared Error (MSE). The method is also evaluated on image classification, object detection, and instance segmentation, demonstrating that it lets existing vision algorithms be applied to event data.
Common RGBD, CMOS, and CCD-based cameras produce motion blur and incorrect exposure under high-speed motion and improper lighting conditions. Event cameras, developed according to the bionic principle, have the advantages of low delay, high dynamic range, and no motion blur. However, their unique data representation poses significant obstacles in practical applications. Image reconstruction algorithms for event cameras address this problem by converting a stream of events into conventional frames so that existing vision algorithms can be applied. Thanks to the rapid development of neural networks, this field has made significant breakthroughs in the past few years. Building on the most popular Events-to-Video (E2VID) method, this study designs a new network called E2VIDX. The proposed network includes group convolution and sub-pixel convolution, which not only achieve better feature fusion but also reduce the network model size by 25%. Furthermore, we propose a new loss function divided into two parts: the first compares high-level features of the reconstructed image, and the second compares its low-level features. The experimental results clearly outperform the state-of-the-art method: compared with the original method, Structural Similarity (SSIM) increases by 1.3%, Learned Perceptual Image Patch Similarity (LPIPS) decreases by 1.7%, Mean Squared Error (MSE) decreases by 2.5%, and the network runs faster on both GPU and CPU. Additionally, we evaluate E2VIDX on image classification, object detection, and instance segmentation. The experiments show that reconstructions produced by our method allow event cameras to directly apply existing vision algorithms in most scenarios.
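The abstract does not give layer-level details, but the combination of group convolution and sub-pixel convolution it describes can be sketched in PyTorch. The block below is a minimal, hypothetical upsampling stage: the channel sizes, group count, and the class name GroupSubPixelUp are illustrative assumptions, not taken from the E2VIDX implementation.

# Hypothetical sketch of an upsampling block combining group convolution
# with sub-pixel convolution (PixelShuffle), in the spirit described by
# the abstract. Channel sizes and group counts are illustrative, not
# taken from the E2VIDX paper.
import torch
import torch.nn as nn

class GroupSubPixelUp(nn.Module):
    def __init__(self, in_ch: int = 64, out_ch: int = 32,
                 scale: int = 2, groups: int = 4):
        super().__init__()
        # Group convolution: each of the `groups` filter groups only sees
        # in_ch // groups input channels, cutting parameter count roughly
        # by a factor of `groups` relative to a dense convolution.
        self.conv = nn.Conv2d(in_ch, out_ch * scale * scale,
                              kernel_size=3, padding=1, groups=groups)
        # Sub-pixel convolution: rearrange (out_ch * scale^2, H, W) into
        # (out_ch, H * scale, W * scale) with no extra parameters.
        self.shuffle = nn.PixelShuffle(scale)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.act(self.shuffle(self.conv(x)))

# Usage: upsample a 64-channel feature map by 2x.
feat = torch.randn(1, 64, 32, 32)
up = GroupSubPixelUp()
print(up(feat).shape)  # torch.Size([1, 32, 64, 64])

Replacing dense convolutions with grouped ones, and interpolation-based upsampling with PixelShuffle, is a standard way to shrink a decoder; the 25% size reduction reported in the abstract would depend on where in the network such blocks are used.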
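The two-part loss can likewise be sketched only under assumptions: a minimal version that compares high-level features through a frozen VGG16 slice (a common perceptual-loss stand-in; the paper's actual feature extractor may differ) and low-level features through pixel-wise MSE. The class name TwoPartLoss and the term weights are hypothetical.

# Hypothetical sketch of a two-part reconstruction loss: a perceptual
# term on high-level features plus a pixel-level term, as the abstract
# describes. The feature extractor and weighting are assumptions, not
# the exact E2VIDX formulation.
import torch
import torch.nn as nn
import torchvision.models as models

class TwoPartLoss(nn.Module):
    def __init__(self, feat_weight: float = 1.0, pix_weight: float = 1.0):
        super().__init__()
        # Frozen VGG16 slice as a stand-in high-level feature extractor.
        vgg = models.vgg16(weights=models.VGG16_Weights.DEFAULT).features[:16].eval()
        for p in vgg.parameters():
            p.requires_grad_(False)
        self.vgg = vgg
        self.mse = nn.MSELoss()
        self.feat_weight = feat_weight
        self.pix_weight = pix_weight

    def forward(self, recon: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        # Inputs are assumed to be 3-channel images; grayscale
        # reconstructions would need channel replication first.
        # High-level part: distance between deep feature maps.
        feat_loss = self.mse(self.vgg(recon), self.vgg(target))
        # Low-level part: plain pixel-wise error.
        pix_loss = self.mse(recon, target)
        return self.feat_weight * feat_loss + self.pix_weight * pix_loss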
