4.8 Article

Dawn of the Transformer Era in Speech Emotion Recognition: Closing the Valence Gap

Related references

Note: Only a subset of the references is listed.
Article Computer Science, Artificial Intelligence

A Survey on Vision Transformer

Kai Han et al.

Summary: The transformer, a deep neural network built on a self-attention mechanism, was initially applied in natural language processing and is now gaining traction in computer vision. Transformer-based models match or exceed convolutional and recurrent neural networks on various visual benchmarks. This paper reviews vision transformer models, categorizes them by task, and analyzes their advantages and disadvantages. The categories discussed include backbone networks, high/mid-level vision, low-level vision, and video processing. Efficient methods for deploying transformers in real device-based applications are also explored, along with open challenges and directions for further research.

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE (2023)

Article Computer Science, Software Engineering

Machine Learning Testing: Survey, Landscapes and Horizons

Jie M. Zhang et al.

Summary: This paper provides a comprehensive survey of techniques for testing machine learning systems and analyzes trends and challenges in ML testing, offering promising research directions for the future.

IEEE TRANSACTIONS ON SOFTWARE ENGINEERING (2022)

Article Acoustics

Survey on bimodal speech emotion recognition from acoustic and linguistic information fusion

Bagus Tris Atmaja et al.

Summary: This paper presents a survey on bimodal emotion recognition which combines acoustic and linguistic information. It reviews five components of bimodal SER and presents major findings from commonly used datasets. The survey also proposes future research directions in this field.

SPEECH COMMUNICATION (2022)

Proceedings Paper Acoustics

Probing Speech Emotion Recognition Transformers for Linguistic Knowledge

Andreas Triantafyllopoulos et al.

Summary: This study investigates the utilization of linguistic information in speech emotion recognition using pre-trained neural networks. The findings suggest that transformer models can effectively leverage linguistic information to improve valence predictions.

INTERSPEECH 2022 (2022)

Article Computer Science, Artificial Intelligence

Unsupervised Personalization of an Emotion Recognition System: The Unique Properties of the Externalization of Valence in Speech

Kusha Sridhar et al.

Summary: Predicting valence from speech is an important and challenging problem. This study proposes an unsupervised approach to adapt the speech emotion recognition system to the target speakers in the test set. By searching for speakers with similar acoustic patterns and creating an adaptation set, the models can be personalized. The results show significant improvements in valence prediction using these unsupervised approaches.

IEEE TRANSACTIONS ON AFFECTIVE COMPUTING (2022)

Article Computer Science, Artificial Intelligence

Ethics and Good Practice in Computational Paralinguistics

Anton Batliner et al.

Summary: With the rapid development of artificial intelligence, ethical considerations have gained increasing attention. However, there is still insufficient focus on ethical issues in the field of computational paralinguistics. This article provides an overview of ethics and privacy, describes the field of computational paralinguistics and its applications, and proposes guidelines for good practice, establishing a foundation for ethical standards in the field.

IEEE TRANSACTIONS ON AFFECTIVE COMPUTING (2022)

Article Engineering, Electrical & Electronic

The Hitchhiker's Guide to Bias and Fairness in Facial Affective Signal Processing: Overview and techniques

Jiaee Cheong et al.

Summary: The increasing prevalence of facial analysis technology has raised concerns about bias in the tools. Despite efforts to address bias, understanding, investigating, and mitigating bias in facial affect analysis remain underexplored areas. This work provides an overview of bias definitions, measures of fairness, algorithms, and techniques in facial affective signal processing, while discussing opportunities for further research.

IEEE SIGNAL PROCESSING MAGAZINE (2021)

Article Computer Science, Artificial Intelligence

SEWA DB: A Rich Database for Audio-Visual Emotion and Sentiment Research in the Wild

Jean Kossaifi et al.

Summary: Natural human-computer interaction and audio-visual human behaviour sensing systems are more important than ever, as digital devices become increasingly integral to our lives. The SEWA database provides a valuable resource with over 2000 minutes of audio-visual data from 398 individuals representing six cultures, aiding research in affective computing and automatic human sensing.

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE (2021)

Proceedings Paper Audiology & Speech-Language Pathology

SUPERB: Speech processing Universal PERformance Benchmark

Shu-wen Yang et al.

Summary: Self-supervised learning is important in advancing research in NLP and CV, but there is a lack of a similar setup for speech processing. The SUPERB benchmark is introduced to evaluate the performance of a shared model across various speech tasks, with a focus on utilizing representations learned from SSL. Results show promising generalizability and accessibility of SSL representations across SUPERB tasks.

INTERSPEECH 2021 (2021)

Proceedings Paper Audiology & Speech-Language Pathology

Emotion Recognition from Speech Using Wav2vec 2.0 Embeddings

Leonardo Pepino et al.

Summary: The study proposes a transfer learning method for speech emotion recognition, utilizing features extracted from pre-trained models to recognize emotions in speech. Evaluations are conducted on two standard emotion databases, comparing different feature extraction techniques and model architectures.

INTERSPEECH 2021 (2021)

Proceedings Paper Audiology & Speech-Language Pathology

Robust wav2vec 2.0: Analyzing Domain Shift in Self-Supervised Pre-Training

Wei-Ning Hsu et al.

Summary: This paper examines self-supervised learning of speech representations and finds that including target-domain data during pre-training significantly improves performance. Pre-training on a diverse set of domains also improves generalization to domains unseen during training.

INTERSPEECH 2021 (2021)

Proceedings Paper Computer Science, Artificial Intelligence

Multimodal Emotion Recognition with High-Level Speech and Text Features

Mariana Rodrigues Makiuchi et al.

Summary: This study proposes a novel cross-representation speech model and a CNN-based text emotion recognition model, addressing overfitting and learning from superficial cues in emotion recognition tasks. By combining the speech-based and text-based predictions through score fusion, the method surpasses prior work on speech-only, text-only, and multimodal emotion recognition on the IEMOCAP dataset.

2021 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU) (2021)

Article Acoustics

HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units

Wei-Ning Hsu et al.

Summary: HuBERT is a self-supervised approach to speech representation learning. It uses an offline clustering step to provide target labels for a BERT-like prediction loss, and applies that loss only over masked regions, forcing the model to learn a combined acoustic and language model. Experimental results on various fine-tuning subsets of the Librispeech and Libri-light benchmarks demonstrate the strong performance of the HuBERT model.

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING (2021)
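The key HuBERT training detail mentioned above, computing the prediction loss only over masked frames, can be sketched as follows. This is a minimal illustration, not the authors' implementation: real HuBERT operates on transformer outputs and k-means cluster assignments, while here frames are plain logit lists and the function name is hypothetical.

```python
import math

def masked_prediction_loss(logits, targets, mask):
    """Mean cross-entropy over masked frames only (HuBERT-style sketch).
    logits: per-frame lists of class scores; targets: per-frame pseudo-label
    indices (e.g. from offline clustering); mask: per-frame booleans."""
    total, count = 0.0, 0
    for frame_logits, target, is_masked in zip(logits, targets, mask):
        if not is_masked:
            continue  # loss is applied only where the input was masked
        # numerically stable softmax cross-entropy for one frame
        m = max(frame_logits)
        log_z = m + math.log(sum(math.exp(x - m) for x in frame_logits))
        total += log_z - frame_logits[target]
        count += 1
    return total / max(count, 1)
```

With uniform logits over K classes, each masked frame contributes log(K), and unmasked frames contribute nothing regardless of their scores.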

Proceedings Paper Acoustics

The Role of Task and Acoustic Similarity in Audio Transfer Learning: Insights from the Speech Emotion Recognition Case

Andreas Triantafyllopoulos et al.

Summary: With the rise of deep learning, deep knowledge transfer has become one of the most effective techniques for achieving state-of-the-art performance using deep neural networks. The choice of pre-training task and acoustic condition differences between datasets influence the effectiveness of transfer learning in speech emotion recognition. Layers closer to the input show more adaptation during transfer learning, explaining the need to fine-tune all layers in previous works.

2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021) (2021)

Proceedings Paper Acoustics

Contrastive Unsupervised Learning for Speech Emotion Recognition

Mao Li et al.

Summary: This study investigates how unsupervised representation learning on unlabeled datasets can benefit speech emotion recognition. The experiment results show that using the contrastive predictive coding method can significantly improve emotion recognition performance.

2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021) (2021)
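The contrastive predictive coding objective referenced in this summary can be sketched as a single InfoNCE step: the model is rewarded for scoring the true future representation above sampled distractors. This is an illustrative sketch with hypothetical names, not the paper's implementation; vectors are plain lists and the similarity is a dot product.

```python
import math

def info_nce(pred, positive, negatives):
    """InfoNCE loss for one prediction step (CPC-style sketch):
    -log softmax probability assigned to the true future frame."""
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    # positive score first, then scores against sampled negatives
    scores = [dot(pred, positive)] + [dot(pred, n) for n in negatives]
    # numerically stable log-partition over all candidates
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_z - scores[0]
```

When the prediction is uninformative (all scores equal), the loss is log of the number of candidates; a prediction aligned with the positive drives it below that chance level.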

Proceedings Paper Acoustics

CopyPaste: An Augmentation Method for Speech Emotion Recognition

Raghavendra Pappagari et al.

Summary: Data augmentation is a widely used strategy for training robust machine learning models. CopyPaste, the augmentation procedure proposed in this study, yields significant improvements in speech emotion recognition: concatenating utterances with different emotions enhances model performance, particularly under noisy test conditions.

2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021) (2021)
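The concatenation idea behind CopyPaste can be sketched as below. This is a minimal illustration of the variant where an emotional utterance is joined with a neutral one and the combined clip keeps the non-neutral label; function and argument names are hypothetical, and the real method works on audio files rather than raw sample lists.

```python
def copy_paste(utt_a, label_a, utt_b, label_b, neutral="neutral"):
    """CopyPaste-style augmentation (sketch): concatenate two utterances'
    waveforms. Following the intuition that an emotional segment anywhere
    in an utterance colors the whole utterance, the result keeps the
    non-neutral label when one side is neutral."""
    combined = list(utt_a) + list(utt_b)
    if label_a == neutral and label_b != neutral:
        label = label_b  # emotion dominates the neutral segment
    else:
        label = label_a
    return combined, label
```

Applied over a training set, each pairing produces a new labeled utterance, which is how the augmentation enlarges the (typically small) emotion corpora.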

Proceedings Paper Acoustics

Fusion Approaches for Emotion Recognition from Speech Using Acoustic and Text-Based Features

Leonardo Pepino et al.

2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (2020)

Proceedings Paper Acoustics

Mockingjay: Unsupervised Speech Representation Learning with Deep Bidirectional Transformer Encoders

Andy T. Liu et al.

2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (2020)

Article Acoustics

PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition

Qiuqiang Kong et al.

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING (2020)

Article Computer Science, Artificial Intelligence

Building Naturalistic Emotionally Balanced Speech Corpus by Retrieving Emotional Speech from Existing Podcast Recordings

Reza Lotfian et al.

IEEE TRANSACTIONS ON AFFECTIVE COMPUTING (2019)

Article Physics, Multidisciplinary

A General Framework for Fair Regression

Jack Fitzsimons et al.

ENTROPY (2019)

Review Computer Science, Hardware & Architecture

Speech Emotion Recognition: Two Decades in a Nutshell, Benchmarks, and Ongoing Trends

Bjoern W. Schuller

COMMUNICATIONS OF THE ACM (2018)

Proceedings Paper Computer Science, Artificial Intelligence

Detecting Vocal Irony

Felix Burkhardt et al.

LANGUAGE TECHNOLOGIES FOR THE CHALLENGES OF THE DIGITAL AGE, GSCL 2017 (2018)

Proceedings Paper Computer Science, Artificial Intelligence

AVEC 2018 Workshop and Challenge: Bipolar Disorder and Cross-Cultural Affect Recognition

Fabien Ringeval et al.

PROCEEDINGS OF THE 2018 AUDIO/VISUAL EMOTION CHALLENGE AND WORKSHOP (AVEC'18) (2018)

Editorial Material Computer Science, Artificial Intelligence

Multimodal Sentiment Intensity Analysis in Videos: Facial Gestures and Verbal Messages

Amir Zadeh et al.

IEEE INTELLIGENT SYSTEMS (2016)

Proceedings Paper Computer Science, Information Systems

ESC: Dataset for Environmental Sound Classification

Karol J. Piczak

MM'15: PROCEEDINGS OF THE 2015 ACM MULTIMEDIA CONFERENCE (2015)

Article Computer Science, Artificial Intelligence

Are They Different? Affect, Feeling, Emotion, Sentiment, and Opinion Detection in Text

Myriam Munezero et al.

IEEE TRANSACTIONS ON AFFECTIVE COMPUTING (2014)

Review Computer Science, Artificial Intelligence

Affect Detection: An Interdisciplinary Review of Models, Methods, and Their Applications

Rafael A. Calvo et al.

IEEE TRANSACTIONS ON AFFECTIVE COMPUTING (2010)

Review Computer Science, Artificial Intelligence

A Survey of Affect Recognition Methods: Audio, Visual, and Spontaneous Expressions

Zhihong Zeng et al.

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE (2009)

Article Computer Science, Interdisciplinary Applications

IEMOCAP: interactive emotional dyadic motion capture database

Carlos Busso et al.

LANGUAGE RESOURCES AND EVALUATION (2008)