Article

TB-MFCC multifuse feature for emergency vehicle sound classification using multistacked CNN - Attention BiLSTM

Journal

BIOMEDICAL SIGNAL PROCESSING AND CONTROL
Volume 88

Publisher

ELSEVIER SCI LTD
DOI: 10.1016/j.bspc.2023.105688

Keywords

Augmentation; CNN; Feature extraction; FPR; MFCC; RMS; ZCR


This paper focuses on developing a suitable model and algorithms for data augmentation, feature extraction, and classification to accurately identify and classify emergency vehicles by their sound. Combining signal augmentation and a new feature-extraction method with convolutional neural network and long short-term memory models improves the accuracy of vehicle sound identification and classification.
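The signal-augmentation step described here (noise injection, stretching, shifting, and pitching, each applied separately to every audio signal) can be sketched with plain NumPy. All parameters below (noise level 0.005, 0.9 stretch rate, 0.1 s shift, 1.05 pitch factor) are illustrative assumptions, not the authors' settings, and the pitch transform is a crude resampling stand-in for a proper pitch shifter such as `librosa.effects.pitch_shift`.

```python
import numpy as np

def augment(signal, sr, rng=None):
    """Return four augmented copies of a 1-D audio signal, one per transform.

    Parameter values are illustrative, not the paper's settings.
    """
    rng = rng or np.random.default_rng(0)
    # 1. Noise injection: add low-amplitude Gaussian noise.
    noisy = signal + 0.005 * rng.standard_normal(len(signal))
    # 2. Time stretching: resample to 90 % of the original length.
    stretched = np.interp(
        np.linspace(0, len(signal) - 1, int(len(signal) * 0.9)),
        np.arange(len(signal)), signal)
    # 3. Time shifting: roll the waveform forward by 0.1 s.
    shifted = np.roll(signal, int(0.1 * sr))
    # 4. Pitching: naive resample at a 1.05 factor (changes pitch and
    #    timbre; a real pitch shifter would preserve duration exactly).
    pitched = np.interp(
        np.arange(len(signal)),
        np.arange(len(signal)) * 1.05, signal)
    return noisy, stretched, shifted, pitched
```

Applying the four transforms separately turns each original recording into four extra training instances, which is how the augmentation enlarges the dataset and curbs overfitting.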
Emergency vehicles such as ambulances, fire engines, and police cars play a vital role in society by responding quickly to emergencies, helping to prevent loss of life, and maintaining order. Identifying and classifying vehicle sounds is therefore important in cities, so that emergency vehicles can be recognized easily and traffic cleared effectively; Convolutional Neural Networks play an important role in accurately predicting vehicles during an emergency. The main aim of this paper is to develop a suitable model and algorithms for data augmentation, feature extraction, and classification. The proposed TB-MFCC multifuse feature comprises data augmentation and feature extraction. First, in the proposed signal augmentation, each audio signal separately undergoes noise injection, stretching, shifting, and pitching, which increases the number of instances in the dataset; this augmentation reduces overfitting in the network. Second, Triangular Bluestein Mel Frequency Cepstral Coefficients (TB-MFCC) are proposed and fused with Zero Crossing Rate (ZCR), Mel-Frequency Cepstral Coefficients (MFCC), Root Mean Square (RMS), Chroma, and Tempogram features to extract exact features, which increases accuracy and reduces the model's Mean Squared Error (MSE) during classification. Finally, the proposed Multi-stacked Convolutional Neural Network (MCNN) with Attention-based Bidirectional Long Short-Term Memory (A-BiLSTM) captures the nonlinear relationships among the features. The proposed Pooled Multifuse Feature Augmentation (PMFA) with MCNN and A-BiLSTM increases accuracy (98.66 %), reduces the False Positive Rate (FPR) by 1.01 %, and brings the loss to 0 %. The model thus predicts the sound without overfitting, underfitting, or vanishing-gradient problems.
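The feature-fusion idea (computing several frame-wise descriptors and concatenating them into one feature matrix) can be illustrated with a minimal NumPy sketch. Only ZCR and RMS are computed here to show the pattern; the paper additionally fuses TB-MFCC, MFCC, Chroma, and Tempogram (typically extracted with a library such as librosa). Frame length 2048 and hop 512 are assumed defaults, not the authors' values.

```python
import numpy as np

def frame(signal, frame_len=2048, hop=512):
    """Slice a 1-D signal into overlapping frames, shape (n_frames, frame_len)."""
    n = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n)[:, None]
    return signal[idx]

def zcr(frames):
    """Zero Crossing Rate: fraction of sign changes per frame."""
    return np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)

def rms(frames):
    """Root Mean Square energy per frame."""
    return np.sqrt(np.mean(frames ** 2, axis=1))

def fuse_features(signal):
    """Concatenate per-frame descriptors column-wise: (n_frames, n_features)."""
    f = frame(signal)
    return np.stack([zcr(f), rms(f)], axis=1)
```

In the paper's pipeline the fused matrix would hold many more columns (one block per descriptor family), but the concatenation step is the same: each frame contributes one row of stacked features to the classifier input.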

