Article

Analysis of the Hilbert Spectrum for Text-Dependent Speaker Verification

Journal

SPEECH COMMUNICATION
Volume 96, Pages 207-224

Publisher

ELSEVIER
DOI: 10.1016/j.specom.2017.12.001

Keywords

MEMD; IMFs; HS; MFCCs; TDSV

This work explores the utility of the Hilbert Spectrum (HS) of the speech signal, constructed from its AM-FM components, or Intrinsic Mode Functions (IMFs), in characterizing speakers for the task of Text-Dependent Speaker Verification (TDSV). The IMFs of the speech signal are obtained using a non-linear and non-stationary data analysis technique called Modified Empirical Mode Decomposition (MEMD). The HS, which represents the instantaneous frequencies and instantaneous energies of the IMFs, is processed in short time-segments to generate features, which are then evaluated on the TDSV task. Two databases, RSR2015 and IITG, are used to validate the experimental findings. The performance of the TDSV system is evaluated for the individual features and for their combinations with the 39-dimensional Mel Frequency Cepstral Coefficients (MFCCs). To assess the practical utility of the features, they are tested not only on clean speech, but also on speech corrupted by low-frequency (babble) noise and by environmental noise. The experiments reveal that the features obtained from the HS, in combination with the MFCCs, enhance the performance of the TDSV system. Further, the extracted features are effective at very low dimensions. Moreover, under noisy conditions, the features extracted from the HS are found to be consistently more effective than the cepstral/energy features obtained from the raw IMFs.
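As a rough illustration of the idea described above, the sketch below computes the instantaneous frequency and instantaneous energy of a single AM-FM component and summarizes them over short time-segments. It is not the paper's method: MEMD is not implemented here, a synthetic 440 Hz tone stands in for an IMF, and the function names, the frame length, the hop size, and the two per-frame statistics (mean energy and energy-weighted mean frequency) are all assumptions chosen for illustration.

```python
import numpy as np
from scipy.signal import hilbert


def instantaneous_freq_energy(imf, fs):
    """Instantaneous frequency (Hz) and energy of one AM-FM component."""
    analytic = hilbert(imf)                     # analytic signal of the component
    amplitude = np.abs(analytic)                # instantaneous amplitude envelope
    phase = np.unwrap(np.angle(analytic))       # continuous instantaneous phase
    freq = np.diff(phase) * fs / (2.0 * np.pi)  # phase difference -> frequency in Hz
    energy = amplitude[1:] ** 2                 # drop one sample to align with freq
    return freq, energy


def frame_features(freq, energy, frame_len, hop):
    """Per-frame mean energy and energy-weighted mean frequency.

    These two statistics are an assumed, illustrative feature choice.
    """
    feats = []
    for start in range(0, len(freq) - frame_len + 1, hop):
        f = freq[start:start + frame_len]
        e = energy[start:start + frame_len]
        feats.append((e.mean(), np.sum(f * e) / np.sum(e)))
    return np.array(feats)


# Synthetic stand-in for an IMF: a pure 440 Hz tone sampled at 8 kHz (assumed values).
fs = 8000
t = np.arange(fs) / fs
component = np.cos(2 * np.pi * 440 * t)

freq, energy = instantaneous_freq_energy(component, fs)
# 20 ms frames with a 10 ms hop at 8 kHz (assumed segment sizes).
features = frame_features(freq, energy, frame_len=160, hop=80)
```

In a real pipeline, each IMF produced by the decomposition would be passed through `instantaneous_freq_energy`, and the per-frame statistics from all IMFs would be stacked into one feature vector per frame. Which statistics to keep, and how many IMFs to use, are design choices that this sketch does not settle.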
