Journal
MULTIMEDIA TOOLS AND APPLICATIONS
Volume 81, Issue 29, Pages 42849-42869
Publisher
SPRINGER
DOI: 10.1007/s11042-022-13485-9
Keywords
3D convolutional neural networks; Fine-tuning; Objective quality assessment; Pre-training; Stereoscopic video; Transfer learning
Funding
- Scientific and Technological Research Council of Turkey (TUBITAK) [118C301]
Abstract
Recently, Convolutional Neural Networks with 3D kernels (3D CNNs) have shown great superiority over 2D CNNs in video processing applications. In the field of Stereoscopic Video Quality Assessment (SVQA), 3D CNNs are utilized to extract spatio-temporal features from stereoscopic videos. Moreover, the emergence of large-scale video datasets such as Kinetics has made it possible to apply pre-trained 3D CNNs to other video-related fields. In this paper, we fine-tune 3D Residual Networks (3D ResNets) pre-trained on the Kinetics dataset to measure the quality of stereoscopic videos and propose a no-reference SVQA method. Our aim is twofold: first, we answer the question of whether 3D CNNs can serve as quality-aware feature extractors for stereoscopic videos; second, we explore which ResNet architecture is most appropriate for SVQA. Experimental results on two publicly available SVQA datasets, LFOVIAS3DPh2 and NAMA3DS1-COSPAD1, demonstrate the effectiveness of the proposed transfer learning-based method, which achieves an RMSE of 0.332 on the LFOVIAS3DPh2 dataset. The results also show that deeper 3D ResNet models extract more effective quality-aware features.