Article

Intelligent Driver Drowsiness Detection for Traffic Safety Based on Multi CNN Deep Model and Facial Subsampling

Journal

IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS
Volume 23, Issue 10, Pages 19743-19752

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TITS.2021.3134222

Keywords

Driver drowsiness detection; convolutional neural network; ensemble network; multitask cascaded convolutional networks (MTCNN)

Fatigue, drowsiness, and distraction are among the main causes of road accidents worldwide. Existing solutions either extract physiological signals from the driver or use computer vision techniques, but their performance is limited. This study therefore proposes an ensemble deep learning architecture that determines the state of the driver by combining features from the eyes and mouth. The model achieves high accuracy when trained and evaluated on the NTHU-DDD video dataset.
Numerous road accidents worldwide are caused by fatigue, drowsiness, and distraction while driving. Some works on automated drowsiness detection propose extracting physiological signals from the driver, including ECG, EEG, heart rate variability, and blood pressure, which makes those solutions non-ideal. More recent works propose computer vision-based solutions but show limited performance, because they either use hand-crafted features with conventional techniques such as Naive Bayes and SVM, or use excessively bulky deep learning models that still perform poorly. Hence, in this work, we propose an ensemble deep learning architecture that operates on the combined features of eye and mouth subsamples, along with a decision structure, to determine the fitness of the driver. The proposed ensemble model consists of only two InceptionV3 modules, which helps contain the parameter space of the network. These two modules exclusively perform feature extraction on the eye and mouth subsamples, respectively, which are extracted from the face images using MTCNN. Their outputs are passed to the ensemble boundary using the weighted average method, whose weights are tuned using the ensemble algorithm. The output of this system determines whether the driver is drowsy or non-drowsy. The benchmark NTHU-DDD video dataset is used for training and evaluation of the proposed model. The model achieved training and validation accuracies of 99.65% and 98.5%, respectively, and an accuracy of 97.1% on the evaluation dataset, which is significantly higher than the accuracies achieved by models proposed in recent works on this dataset.
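
The weighted average step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the "ensemble algorithm" used to tune the weights, so a simple grid search over validation accuracy stands in for it here. The function names (`weighted_average`, `tune_weight`, `predict`), the 0.5 decision threshold, and the assumption that each branch outputs a per-frame drowsiness probability are all hypothetical.

```python
import numpy as np


def weighted_average(p_eyes, p_mouth, w_eyes):
    """Blend drowsiness probabilities from the eye and mouth branches.

    w_eyes is the weight of the eye branch; the mouth branch gets 1 - w_eyes.
    """
    p_eyes = np.asarray(p_eyes, dtype=float)
    p_mouth = np.asarray(p_mouth, dtype=float)
    return w_eyes * p_eyes + (1.0 - w_eyes) * p_mouth


def predict(p_eyes, p_mouth, w_eyes, threshold=0.5):
    """Return 1 (drowsy) or 0 (non-drowsy) for each sample."""
    return (weighted_average(p_eyes, p_mouth, w_eyes) >= threshold).astype(int)


def tune_weight(p_eyes, p_mouth, labels, steps=101):
    """Grid-search the eye-branch weight that maximises validation accuracy.

    Assumption: this grid search is only a stand-in for the tuning
    procedure, which the abstract does not describe.
    """
    labels = np.asarray(labels)
    best_w, best_acc = 0.0, -1.0
    for w in np.linspace(0.0, 1.0, steps):
        acc = float((predict(p_eyes, p_mouth, w) == labels).mean())
        if acc > best_acc:  # keep the first weight reaching the best accuracy
            best_w, best_acc = float(w), acc
    return best_w, best_acc


if __name__ == "__main__":
    # Toy validation data (made up for illustration, not from NTHU-DDD).
    eyes = [0.9, 0.2, 0.6, 0.1]
    mouth = [0.3, 0.1, 0.8, 0.7]
    y = [1, 0, 1, 0]
    w, acc = tune_weight(eyes, mouth, y)
    print(f"eye weight={w:.2f}, validation accuracy={acc:.2f}")
    print(predict(eyes, mouth, w))
```

In this toy data the eye branch is the more reliable one, so the search settles on a weight that favours it; with real branch outputs the tuned weight would depend on how well each InceptionV3 module performs on the validation split.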
