Article

Vision Transformer in Industrial Visual Inspection

Journal

APPLIED SCIENCES-BASEL
Volume 12, Issue 23

Publisher

MDPI
DOI: 10.3390/app122311981

Keywords

deep learning; computer vision; vision transformer; attention mechanism; automated industrial visual inspection; defect detection

Funding

  1. German Federal Ministry for Digital and Transport in the program "Future Rail Freight Transport" [53T20011UW]

Abstract

Artificial intelligence has been considered as an approach to visual inspection in industrial applications for decades. Recent successes, driven by advances in deep learning, present a potential paradigm shift and could facilitate automated visual inspection even under complex environmental conditions. For the last ten years, convolutional neural networks (CNNs) have been the de facto standard in deep-learning-based computer vision (CV). Recently, attention-based vision transformer architectures emerged and surpassed the performance of CNNs on benchmark datasets for regular CV tasks such as image classification, object detection, and segmentation. Nevertheless, despite these outstanding results, the application of vision transformers to real-world visual inspection remains sparse. We suspect that this is likely due to the assumption that they require enormous amounts of data to be effective. In this study, we evaluate this assumption. To this end, we perform a systematic comparison of seven widely used state-of-the-art CNN- and transformer-based architectures, trained on three different use cases in the domain of visual damage assessment for railway freight car maintenance. We show that vision transformer models achieve at least equivalent performance to CNNs in industrial applications with sparse data available, and significantly surpass them on increasingly complex tasks.
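The abstract contrasts CNNs with attention-based vision transformers. As background, the following is a minimal, generic sketch of the scaled dot-product self-attention operation at the core of vision transformer architectures. It is not code from this paper: the function name and the toy patch embeddings are illustrative assumptions, and real ViT implementations add learned projections, multiple heads, and positional embeddings on top of this operation.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Core attention operation used by vision transformers.

    Q, K, V have shape (tokens, d). Each output row is a weighted
    average over the rows of V, with weights derived from query-key
    similarity. In a ViT, the tokens are embedded image patches.
    """
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                  # (tokens, tokens) similarities
    scores -= scores.max(axis=-1, keepdims=True)   # for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True) # row-wise softmax
    return weights @ V, weights

# Toy example: 4 "image patches" embedded in 8 dimensions,
# attending to themselves (self-attention).
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
out, attn = scaled_dot_product_attention(x, x, x)
```

Unlike a convolution, whose receptive field is local and fixed, every patch here can attend to every other patch in a single layer, which is one reason transformers handle the complex, globally distributed cues of damage assessment well.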

