Deep Video Quality Assessor: From
Spatio-temporal Visual Sensitivity to A
Convolutional Neural Aggregation Network
Abstract. Incorporating spatio-temporal human visual perception into
video quality assessment (VQA) remains a formidable challenge. Previous
statistical or computational models of spatio-temporal perception have
limited applicability to general VQA algorithms. In this paper,
we propose a novel full-reference (FR) VQA framework named Deep
Video Quality Assessor (DeepVQA) to quantify spatio-temporal visual perception via a convolutional neural network (CNN) and a convolutional neural aggregation network (CNAN). Our framework
learns the spatio-temporal sensitivity behavior directly
from subjective scores. In addition, to handle the
temporal variation of distortions, we propose a novel temporal pooling
method using an attention model. In the experiment, we show that DeepVQA
achieves state-of-the-art prediction accuracy of more than
0.9 correlation, about 5% higher than that of conventional methods
on the LIVE and CSIQ video databases.
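As a rough illustration of the attention-based temporal pooling idea mentioned above, the following minimal sketch weights per-frame quality scores by softmax attention instead of averaging them uniformly. All names are hypothetical (the projection vector `w` stands in for a learned attention model); this is a sketch of the general technique, not the authors' implementation.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over the temporal axis
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attention_temporal_pooling(frame_scores, frame_features, w):
    """Pool per-frame quality scores into a single video-level score.

    frame_scores   : (T,)   per-frame predicted quality (hypothetical)
    frame_features : (T, D) per-frame feature vectors (hypothetical)
    w              : (D,)   attention projection, assumed to be learned
    """
    attn_logits = frame_features @ w            # relevance of each frame
    attn_weights = softmax(attn_logits)         # normalize weights over time
    return float(attn_weights @ frame_scores)   # attention-weighted average

# toy usage with random data
T, D = 8, 16
rng = np.random.default_rng(0)
score = attention_temporal_pooling(rng.random(T), rng.random((T, D)), rng.random(D))
print(score)
```

Compared with simple mean pooling, such an attention scheme lets frames with severe or salient distortions contribute more to the final quality score.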