Research published in the International Journal of Computational Vision and Robotics points to several approaches that could use deep learning to improve the final quality of frame rate up-conversion with Super SloMo. The methods described convert a video with a lower number of frames per second into one with a higher number of frames per second.
Minseop Kim and Haechul Choi of Hanbat National University in Daejeon, Republic of Korea, explain how a tailored training data set can significantly boost the signal-to-noise ratio achieved by Super SloMo, a deep learning-based frame rate up-conversion (FRUC) method proposed by graphics hardware company NVIDIA. The team's approach builds on Super SloMo and can preclude the flickering that occurs when a video's frame rate does not match that of the display, by using artificial intelligence to create intermediate frames between the existing ones. This allows a more natural up-conversion, whereas earlier approaches can successfully reduce flicker but look unnatural. The new approach also avoids the artefacts that appear when a bad motion vector is used to add frames.
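The idea of creating frames between frames can be illustrated with a minimal sketch. The function names and the naive cross-fade below are illustrative assumptions, not the paper's method: Super SloMo actually warps frames along estimated optical flow, which is precisely what avoids the ghosting a plain blend produces, but the blend shows where the synthesized frames sit in the output sequence.

```python
import numpy as np

def interpolate_frame(frame0, frame1, t=0.5):
    """Synthesize a frame at intermediate time t in [0, 1].

    This is a naive cross-fade, standing in for the optical-flow-based
    warping that Super SloMo performs; it only illustrates the concept
    of generating in-between frames.
    """
    f0 = frame0.astype(np.float64)
    f1 = frame1.astype(np.float64)
    return ((1.0 - t) * f0 + t * f1).astype(frame0.dtype)

def upconvert(frames, factor=2):
    """Raise the frame rate by `factor`, inserting (factor - 1)
    interpolated frames between each consecutive pair."""
    out = []
    for a, b in zip(frames, frames[1:]):
        out.append(a)
        for k in range(1, factor):
            out.append(interpolate_frame(a, b, k / factor))
    out.append(frames[-1])
    return out
```

Doubling a 30 fps clip this way yields roughly 60 fps; the quality difference between methods lies entirely in how `interpolate_frame` is realised.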
The team trained the system with thousands of videos showing moving objects of various sizes. The large-object dataset contained more than 50,000 images of basketball, soccer, volleyball, marathons, and vehicles, while the small-object dataset contained more than 50,000 images of golf, badminton, table tennis, and tennis. A similar-sized mixed dataset of both large and small objects was also used.
“The results of training by object size shows that the performance was improved in terms of peak signal-to-noise ratio (PSNR) and the mean of the structural similarity index (MSSIM) in most cases when the training set and the validation set had similar properties,” the team reports. Specifically, “the experimental results show that the two proposed methods improved the peak signal-to-noise ratio and the mean of the structural similarity index by 0.11 dB and 0.033% with the specialised training set and by 0.37 dB and 0.077% via adjusting the reconstruction and warping loss parameters, respectively.”
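For readers unfamiliar with these metrics, the following sketch shows how PSNR is computed and gives a simplified, single-window version of the structural similarity index (the standard SSIM, and the MSSIM reported in the paper, average this quantity over local sliding windows; the global form here is an assumption made for brevity):

```python
import numpy as np

def psnr(ref, test, max_val=255.0):
    """Peak signal-to-noise ratio in dB between two images."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

def ssim_global(x, y, max_val=255.0):
    """Structural similarity computed over the whole image.

    Simplified: standard SSIM/MSSIM averages this statistic over
    local windows rather than using global moments.
    """
    c1 = (0.01 * max_val) ** 2
    c2 = (0.03 * max_val) ** 2
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))
```

A gain of 0.37 dB PSNR, as reported, corresponds to a measurable reduction in mean squared error between the interpolated frame and the ground-truth frame it replaces.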
Kim, M. and Choi, H. (2021) ‘A high-quality frame rate up-conversion technique for Super SloMo’, Int. J. Computational Vision and Robotics, Vol. 11, No. 5, pp.512–525.