Article

Multimodal Assessment of Schizophrenia Symptom Severity From Linguistic, Acoustic and Visual Cues

Publisher

IEEE (Institute of Electrical and Electronics Engineers)
DOI: 10.1109/TNSRE.2023.3307597

Keywords

Schizophrenia; parameter-efficient fine-tuning; multimodal fusion

Abstract

This study proposes a multimodal assessment model that predicts the severity of schizophrenia symptoms from linguistic, acoustic, and visual behavior. The model combines deep-learning techniques with a multimodal fusion framework to achieve superior assessment performance.

Accurately assessing the condition of a schizophrenia patient normally requires lengthy and frequent interviews with professionally trained clinicians. To reduce the time and labor burden on mental health professionals, this paper proposes a multimodal assessment model that predicts the severity level of each symptom defined in the Scale for the Assessment of Thought, Language, and Communication (TLC) and the Positive and Negative Syndrome Scale (PANSS) from the patient's linguistic, acoustic, and visual behavior. The proposed deep-learning model consists of a multimodal fusion framework and four unimodal transformer-based backbone networks. A second-stage pre-training step is introduced so that each off-the-shelf pre-trained model adapts more effectively to the patterns of schizophrenia data and learns to extract the desired features from the perspective of its own modality. Next, the pre-trained parameters are frozen, and lightweight trainable unimodal modules are inserted and fine-tuned, keeping the number of trainable parameters low while preserving performance. Finally, the four adapted unimodal modules are fused into a single multimodal assessment model through the proposed multimodal fusion framework. For validation, we train and evaluate the proposed model on data from schizophrenia patients recruited from National Taiwan University Hospital; the model achieves an MAE of 0.534 and an MSE of 0.685, outperforming related works in the literature. The experimental results and ablation studies, together with comparisons against other multimodal assessment works, demonstrate not only the superior performance of our approach but also its effectiveness in extracting and integrating information from multiple modalities.
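The abstract does not include code, but the "freeze the backbone, insert lightweight trainable modules" step it describes is the standard adapter style of parameter-efficient fine-tuning. Below is a minimal PyTorch sketch of that idea; the class names, bottleneck size, and layer-wrapping scheme are hypothetical illustrations, not the authors' implementation.

```python
import torch
import torch.nn as nn


class Adapter(nn.Module):
    """Bottleneck adapter: down-project, non-linearity, up-project, residual."""

    def __init__(self, hidden_dim: int, bottleneck_dim: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)
        self.up = nn.Linear(bottleneck_dim, hidden_dim)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual connection keeps the frozen backbone's features intact.
        return x + self.up(self.act(self.down(x)))


class AdaptedLayer(nn.Module):
    """Wraps one frozen pre-trained layer with a small trainable adapter."""

    def __init__(self, frozen_layer: nn.Module, hidden_dim: int):
        super().__init__()
        self.layer = frozen_layer
        for p in self.layer.parameters():
            p.requires_grad = False  # freeze the pre-trained weights
        self.adapter = Adapter(hidden_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.adapter(self.layer(x))


# Example: adapt a single frozen encoder layer (hypothetical sizes).
backbone_layer = nn.TransformerEncoderLayer(d_model=768, nhead=12, batch_first=True)
adapted = AdaptedLayer(backbone_layer, hidden_dim=768)
x = torch.randn(2, 50, 768)   # (batch, sequence, hidden)
print(adapted(x).shape)       # torch.Size([2, 50, 768])
```

Only the adapter parameters receive gradients, which is what keeps the trainable-parameter count low while the second-stage pre-trained backbone is reused as-is.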
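The abstract likewise does not specify the fusion architecture, so the following concatenation-plus-MLP late-fusion sketch is only an assumption for illustration: four adapted unimodal encoders produce features that are combined into per-symptom severity predictions, evaluated with the MAE/MSE metrics the paper reports. All names and dimensions are hypothetical.

```python
import torch
import torch.nn as nn


class MultimodalSeverityRegressor(nn.Module):
    """Late fusion of unimodal encoders into a severity regression head."""

    def __init__(self, unimodal_encoders: dict[str, nn.Module],
                 feat_dim: int, num_symptoms: int):
        super().__init__()
        self.encoders = nn.ModuleDict(unimodal_encoders)
        self.head = nn.Sequential(
            nn.Linear(feat_dim * len(unimodal_encoders), 256),
            nn.ReLU(),
            nn.Linear(256, num_symptoms),  # one score per TLC/PANSS item
        )

    def forward(self, inputs: dict[str, torch.Tensor]) -> torch.Tensor:
        # Encode each modality with its adapted backbone, then concatenate.
        feats = [self.encoders[m](inputs[m]) for m in self.encoders]
        return self.head(torch.cat(feats, dim=-1))


def mae_mse(pred: torch.Tensor, target: torch.Tensor):
    """The two metrics reported in the paper (MAE and MSE)."""
    mae = (pred - target).abs().mean().item()
    mse = ((pred - target) ** 2).mean().item()
    return mae, mse


# Toy usage with stand-in encoders (the real ones would be the four
# adapted transformer backbones):
encoders = {m: nn.Linear(32, 128) for m in ["text", "audio", "video", "extra"]}
model = MultimodalSeverityRegressor(encoders, feat_dim=128, num_symptoms=10)
batch = {m: torch.randn(4, 32) for m in encoders}
scores = model(batch)  # shape (4, 10): severity predictions per symptom
```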

