Article

Stereoscopic video quality measurement with fine-tuning 3D ResNets

Journal

MULTIMEDIA TOOLS AND APPLICATIONS
Volume 81, Issue 29, Pages 42849-42869

Publisher

SPRINGER
DOI: 10.1007/s11042-022-13485-9

Keywords

3D convolutional neural networks; Fine-tuning; Objective quality assessment; Pre-training; Stereoscopic video; Transfer learning

Funding

  1. Scientific and Technological Research Council of Turkey (TUBITAK) [118C301]

Recently, 3D Convolutional Neural Networks (3D CNNs) have outperformed 2D CNNs in video processing applications. In Stereoscopic Video Quality Assessment (SVQA), 3D CNNs are used to extract spatio-temporal features from stereoscopic videos. This paper fine-tunes 3D Residual Networks (3D ResNets) pre-trained on the Kinetics dataset to measure the quality of stereoscopic videos, yielding a no-reference SVQA method. Experimental results on publicly available SVQA datasets demonstrate the effectiveness of the proposed transfer learning-based method.
Recently, Convolutional Neural Networks with 3D kernels (3D CNNs) have shown a clear advantage over 2D CNNs in video processing applications. In the field of Stereoscopic Video Quality Assessment (SVQA), 3D CNNs are used to extract spatio-temporal features from stereoscopic video. In addition, the emergence of large video datasets such as Kinetics has made it possible to use pre-trained 3D CNNs in other video-related fields. In this paper, we fine-tune 3D Residual Networks (3D ResNets) pre-trained on the Kinetics dataset to measure the quality of stereoscopic videos and propose a no-reference SVQA method. Our aim is twofold. First, we ask whether 3D CNNs can serve as quality-aware feature extractors for stereoscopic videos. Second, we explore which ResNet architecture is more appropriate for SVQA. Experimental results on two publicly available SVQA datasets, LFOVIAS3DPh2 and NAMA3DS1-COSPAD1, show the effectiveness of the proposed transfer learning-based method, which achieves an RMSE of 0.332 on the LFOVIAS3DPh2 dataset. The results also show that deeper 3D ResNet models extract more efficient quality-aware features.
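
The abstract reports accuracy as a root-mean-square error (RMSE) between predicted and subjective quality scores. As a minimal sketch of how such a figure is typically computed, assuming the standard RMSE definition, the Python snippet below may help. The function name `rmse` and the score values are hypothetical and chosen for illustration only; they are not taken from the paper or its datasets.

```python
import math


def rmse(predicted, subjective):
    """Root-mean-square error between predicted and subjective quality scores."""
    if len(predicted) != len(subjective):
        raise ValueError("score lists must have the same length")
    if not predicted:
        raise ValueError("score lists must not be empty")
    # Average the squared per-video errors, then take the square root.
    squared_errors = [(p - s) ** 2 for p, s in zip(predicted, subjective)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical scores for four videos (illustration only, not from the paper).
predicted_scores = [3.1, 2.4, 4.0, 1.8]
subjective_scores = [3.0, 2.9, 3.6, 2.0]
print(round(rmse(predicted_scores, subjective_scores), 3))
```

A lower RMSE means the predicted scores lie closer to the subjective ratings; the value depends on the score scale of the dataset, so figures from different datasets are not directly comparable.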
