Proceedings Paper

AUDIO ALBERT: A LITE BERT FOR SELF-SUPERVISED LEARNING OF AUDIO REPRESENTATION

Journal

2021 IEEE Spoken Language Technology Workshop (SLT)
Publisher

IEEE
DOI: 10.1109/SLT48900.2021.9383575

Keywords

Self-supervised learning; Weight sharing; Network compression; Transformer; Speech representation learning

The study introduces Audio ALBERT, a simplified self-supervised speech representation model with 91% fewer parameters that still performs well on downstream tasks. Its internal layers encode more phoneme and speaker information than the last layer.
Self-supervised speech models are powerful speech representation extractors for downstream applications. Recently, ever-larger models have been used in acoustic model training to achieve better performance. We propose Audio ALBERT, a lite version of a self-supervised speech representation model. We apply the lightweight representation extractor to two downstream tasks, speaker classification and phoneme classification, and show that Audio ALBERT achieves performance comparable to massive pre-trained networks while having 91% fewer parameters. Moreover, we design probing models to measure how much speaker and phoneme information the latent representations encode. We find that the representations in the internal layers of Audio ALBERT contain more information about both phonemes and speakers than the last layer, which is the one generally used for downstream tasks. Our findings suggest a new avenue for using self-supervised networks with better performance and efficiency.
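The 91% parameter reduction comes from ALBERT-style cross-layer weight sharing: one transformer layer's parameters are reused at every depth instead of each layer owning its own. The sketch below (not the authors' code) illustrates the arithmetic; the 12-layer, 768/3072-dimensional configuration and the bias-free per-layer count are illustrative assumptions.

```python
# Sketch of ALBERT-style cross-layer weight sharing vs. BERT-style
# independent per-layer parameters. Sizes are illustrative assumptions,
# not the exact Audio ALBERT configuration.

def layer_param_count(hidden: int, ffn: int) -> int:
    """Rough parameter count of one transformer encoder layer:
    self-attention (four hidden x hidden projections) plus the
    feed-forward block (two hidden x ffn matrices); biases ignored."""
    attention = 4 * hidden * hidden   # Q, K, V, and output projections
    feed_forward = 2 * hidden * ffn   # up- and down-projection
    return attention + feed_forward

def total_params(num_layers: int, hidden: int, ffn: int, shared: bool) -> int:
    per_layer = layer_param_count(hidden, ffn)
    # With weight sharing, every layer reuses the same parameter set,
    # so depth no longer multiplies the parameter count.
    return per_layer if shared else num_layers * per_layer

bert_like = total_params(12, 768, 3072, shared=False)    # independent layers
albert_like = total_params(12, 768, 3072, shared=True)   # shared layers
reduction = 1 - albert_like / bert_like
print(f"independent: {bert_like:,}  shared: {albert_like:,}  saved: {reduction:.0%}")
```

With 12 layers, sharing keeps roughly 1/12 of the encoder parameters, i.e. a reduction of about 11/12, which is in the same ballpark as the 91% figure reported in the abstract (the exact number also depends on embeddings and other unshared components).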
