Edge-Oriented Compressed Video Super-Resolution
Abstract
1. Introduction
- We propose an edge-oriented compressed video super-resolution network (EOCVSR) to address the CVSR problem. By incorporating a structure that specifically processes edge information and introducing edge-related loss functions, EOCVSR reconstructs richer details and outputs higher-quality frames.
- We propose a motion-guided alignment module (MGAM) to achieve precise bidirectional motion compensation. Temporal information is exploited more effectively by employing explicit motion information to guide the generation of offsets for implicit temporal alignment (a minimal alignment sketch is given after this list).
- We propose an edge-oriented recurrent block (EORB) to reconstruct edge information. Combining the merits of explicit and implicit edge extraction enables high-quality reconstruction of high-frequency components, and a recurrent structure is adopted to realize effective feature refinement (an edge-filter and edge-loss sketch also follows this list).
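To make the second contribution concrete, the following PyTorch sketch shows one way an explicit motion field (optical flow or decoded motion vectors) can guide the offsets of a deformable convolution for temporal alignment. This is a minimal sketch under stated assumptions, not the authors' MGAM: the class name `MotionGuidedAlign`, the channel widths, and the residual-on-flow offset initialization are illustrative choices; only `torchvision.ops.deform_conv2d` and the general idea of flow-guided offsets are standard.

```python
import torch
import torch.nn as nn
from torchvision.ops import deform_conv2d


class MotionGuidedAlign(nn.Module):
    """Sketch of motion-guided deformable alignment (hypothetical layer sizes).

    The offsets of a 3x3 deformable convolution are predicted from the
    neighboring-frame feature, the target-frame feature, and an explicit
    motion field, so implicit alignment is guided by explicit motion rather
    than learned from scratch.
    """

    def __init__(self, channels: int = 64, kernel_size: int = 3):
        super().__init__()
        self.kernel_size = kernel_size
        # Predict 2 * K * K offsets (one x/y pair per sampling location,
        # single offset group) from the two features plus the 2-channel flow.
        self.offset_conv = nn.Sequential(
            nn.Conv2d(2 * channels + 2, channels, 3, padding=1),
            nn.LeakyReLU(0.1, inplace=True),
            nn.Conv2d(channels, 2 * kernel_size * kernel_size, 3, padding=1),
        )
        # Weights of the deformable convolution that produces the aligned feature.
        self.weight = nn.Parameter(
            torch.randn(channels, channels, kernel_size, kernel_size) * 0.01
        )

    def forward(self, feat_nbr, feat_ref, flow):
        # feat_nbr, feat_ref: (B, C, H, W); flow: (B, 2, H, W), neighbor -> reference.
        offset = self.offset_conv(torch.cat([feat_nbr, feat_ref, flow], dim=1))
        # Treat the explicit motion as a coarse base offset for every sampling
        # location and let the network predict only a residual correction
        # (flow is stored as (x, y); offsets are assumed to be (y, x) pairs).
        offset = offset + flow.flip(1).repeat(1, self.kernel_size ** 2, 1, 1)
        aligned = deform_conv2d(
            feat_nbr, offset, self.weight, padding=self.kernel_size // 2
        )
        return aligned
```

Predicting only a residual on top of the explicit motion keeps the sampling positions close to the true motion, which is the usual motivation for flow-guided deformable alignment.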
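For the edge-oriented parts (first and third contributions), the sketch below applies fixed Sobel and Laplacian filters as depthwise convolutions (explicit edge extraction) and combines the resulting edge maps into an edge-aware training objective alongside the pixel-wise MSE. The function names, the depthwise formulation, and the weighting factor `lam` are assumptions for illustration; the exact edge-related loss of EOCVSR is not reproduced here.

```python
import torch
import torch.nn.functional as F

# Standard 3x3 edge-detection kernels (Sobel x/y and Laplacian).
SOBEL_X = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])
SOBEL_Y = torch.tensor([[-1., -2., -1.], [0., 0., 0.], [1., 2., 1.]])
LAPLACIAN = torch.tensor([[0., 1., 0.], [1., -4., 1.], [0., 1., 0.]])


def extract_edges(x: torch.Tensor) -> torch.Tensor:
    """Apply the fixed filters channel-wise (depthwise) and stack the responses."""
    b, c, h, w = x.shape
    kernels = torch.stack([SOBEL_X, SOBEL_Y, LAPLACIAN]).to(x)   # (3, 3, 3)
    weight = kernels.unsqueeze(1).repeat(c, 1, 1, 1)             # (3c, 1, 3, 3)
    return F.conv2d(x, weight, padding=1, groups=c)              # (B, 3c, H, W)


def edge_aware_loss(sr: torch.Tensor, hr: torch.Tensor, lam: float = 0.1) -> torch.Tensor:
    """Pixel-wise MSE plus a (hypothetically weighted) MSE on the edge maps."""
    pixel_loss = F.mse_loss(sr, hr)
    edge_loss = F.mse_loss(extract_edges(sr), extract_edges(hr))
    return pixel_loss + lam * edge_loss
```

In the paper's EORB, such explicit responses are further combined with learned (implicit) edge features; only the explicit branch and one possible loss formulation are sketched here.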
2. Related Works
2.1. Video Quality Enhancement (VQE)
2.2. Video Super-Resolution (VSR)
2.3. Compressed Video Super-Resolution (CVSR)
3. The Proposed EOCVSR Approach
3.1. Overall Framework
3.2. Feature Extraction Module
3.3. Motion-Guided Alignment Module
3.4. Edge-Oriented Recurrent Block
3.5. Feature Reconstruction Module
3.6. Loss Function
4. Results
4.1. Experimental Setup
4.2. Performance of Proposed EOCVSR
4.2.1. Quality Enhancement
4.2.2. Rate–Distortion Performance
4.2.3. Subjective Performance
4.3. Ablation Study
4.3.1. Analysis of the EORB
4.3.2. Analysis of the Number of Recursions K
4.3.3. Analysis of the Number of EORBs M
4.3.4. Model Adaption
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Dai, Y.; Liu, D.; Wu, F. A convolutional neural network approach for post-processing in HEVC intra coding. In Proceedings of the MultiMedia Modeling: 23rd International Conference, MMM 2017, Reykjavik, Iceland, 4–6 January 2017; Proceedings, Part I 23. Springer: Berlin/Heidelberg, Germany, 2017; pp. 28–39. [Google Scholar]
- Wang, T.; Chen, M.; Chao, H. A novel deep learning-based method of improving coding efficiency from the decoder-end for HEVC. In Proceedings of the 2017 Data Compression Conference (DCC), Snowbird, UT, USA, 4–7 April 2017; pp. 410–419. [Google Scholar]
- Pan, Z.; Yi, X.; Zhang, Y.; Jeon, B.; Kwong, S. Efficient in-loop filtering based on enhanced deep convolutional neural networks for HEVC. IEEE Trans. Image Process. 2020, 29, 5352–5366. [Google Scholar] [CrossRef] [PubMed]
- Yang, R.; Xu, M.; Wang, Z.; Li, T. Multi-frame quality enhancement for compressed video. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 6664–6673. [Google Scholar]
- Guan, Z.; Xing, Q.; Xu, M.; Yang, R.; Liu, T.; Wang, Z. MFQE 2.0: A new approach for multi-frame quality enhancement on compressed video. IEEE Trans. Pattern Anal. Mach. Intell. 2019, 43, 949–963. [Google Scholar] [CrossRef] [PubMed]
- Deng, J.; Wang, L.; Pu, S.; Zhuo, C. Spatio-temporal deformable convolution for compressed video quality enhancement. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 10696–10703. [Google Scholar]
- Kappeler, A.; Yoo, S.; Dai, Q.; Katsaggelos, A.K. Video super-resolution with convolutional neural networks. IEEE Trans. Comput. Imaging 2016, 2, 109–122. [Google Scholar] [CrossRef]
- Wang, X.; Chan, K.C.; Yu, K.; Dong, C.; Change Loy, C. EDVR: Video restoration with enhanced deformable convolutional networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Long Beach, CA, USA, 16–20 June 2019. [Google Scholar]
- Chan, K.C.; Zhou, S.; Xu, X.; Loy, C.C. BasicVSR++: Improving video super-resolution with enhanced propagation and alignment. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 5972–5981. [Google Scholar]
- Tian, Y.; Zhang, Y.; Fu, Y.; Xu, C. TDAN: Temporally-deformable alignment network for video super-resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 3360–3369. [Google Scholar]
- Jo, Y.; Oh, S.W.; Kang, J.; Kim, S.J. Deep video super-resolution network using dynamic upsampling filters without explicit motion compensation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 3224–3232. [Google Scholar]
- Ho, M.M.; He, G.; Wang, Z.; Zhou, J. Down-sampling based video coding with degradation-aware restoration-reconstruction deep neural network. In Proceedings of the MultiMedia Modeling: 26th International Conference, MMM 2020, Daejeon, Republic of Korea, 5–8 January 2020; Proceedings, Part I 26. Springer: Berlin/Heidelberg, Germany, 2020; pp. 99–110. [Google Scholar]
- He, G.; Wu, S.; Pei, S.; Xu, L.; Wu, C.; Xu, K.; Li, Y. FM-VSR: Feature Multiplexing Video Super-Resolution for Compressed Video. IEEE Access 2021, 9, 88060–88068. [Google Scholar] [CrossRef]
- Ho, M.M.; Zhou, J.; He, G. RR-DnCNN v2.0: Enhanced restoration-reconstruction deep neural network for down-sampling-based video coding. IEEE Trans. Image Process. 2021, 30, 1702–1715. [Google Scholar] [CrossRef]
- Dai, J.; Qi, H.; Xiong, Y.; Li, Y.; Zhang, G.; Hu, H.; Wei, Y. Deformable convolutional networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 764–773. [Google Scholar]
- Sullivan, G.J.; Ohm, J.R.; Han, W.J.; Wiegand, T. Overview of the high efficiency video coding (HEVC) standard. IEEE Trans. Circuits Syst. Video Technol. 2012, 22, 1649–1668. [Google Scholar] [CrossRef]
- Dong, C.; Loy, C.C.; Tang, X. Accelerating the super-resolution convolutional neural network. In Proceedings of the Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Proceedings, Part II 14. Springer: Berlin/Heidelberg, Germany, 2016; pp. 391–407. [Google Scholar]
- Lim, B.; Son, S.; Kim, H.; Nah, S.; Mu Lee, K. Enhanced deep residual networks for single image super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Honolulu, HI, USA, 21–26 July 2017; pp. 136–144. [Google Scholar]
- Ranjan, A.; Black, M.J. Optical flow estimation using a spatial pyramid network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4161–4170. [Google Scholar]
- Newmarch, J. FFmpeg/Libav. Linux Sound Program. 2017, 19, 227–234. [Google Scholar]
- Bossen, F. Common test conditions and software reference configurations. JCTVC-L1100 2013, 12, 1. [Google Scholar]
- Mercat, A.; Viitanen, M.; Vanne, J. UVG dataset: 50/120fps 4K sequences for video codec analysis and development. In Proceedings of the 11th ACM Multimedia Systems Conference, Istanbul, Turkey, 8–11 June 2020; pp. 297–302. [Google Scholar]
- Wang, H.; Gan, W.; Hu, S.; Lin, J.Y.; Jin, L.; Song, L.; Wang, P.; Katsavounidis, I.; Aaron, A.; Kuo, C.C.J. MCL-JCV: A JND-based H.264/AVC video quality assessment dataset. In Proceedings of the 2016 IEEE International Conference on Image Processing (ICIP), Phoenix, AZ, USA, 25–28 September 2016; pp. 1509–1513. [Google Scholar]
- Grois, D.; Marpe, D.; Mulayoff, A.; Itzhaky, B.; Hadar, O. Performance comparison of H.265/MPEG-HEVC, VP9, and H.264/MPEG-AVC encoders. In Proceedings of the 2013 Picture Coding Symposium (PCS), San Jose, CA, USA, 8–11 December 2013; pp. 394–397. [Google Scholar]
- Imambi, S.; Prakash, K.B.; Kanagachidambaresan, G. PyTorch. In Programming with TensorFlow: Solution for Edge Computing Applications; Springer: Berlin/Heidelberg, Germany, 2021; pp. 87–104. [Google Scholar]
- Zamir, S.W.; Arora, A.; Khan, S.; Hayat, M.; Khan, F.S.; Yang, M.H.; Shao, L. Learning Enriched Features for Real Image Restoration and Enhancement. arXiv 2020, arXiv:2003.06792v2. [Google Scholar]
| Symbols | Explanation |
|---|---|
| | Input video frame t |
| | Video restoration output |
| | Video reconstruction output |
| | Feature of frame t |
| | Motion vector |
| O | Offset of the deformable convolution kernel |
| | Aligned feature |
| | Output feature of the MGAM for frame t |
| | Convolutional layer with a kernel of the indicated size |
| | Softmax normalization activation function |
| | Vertical edge detection operator of the Sobel filter |
| | Horizontal edge detection operator of the Sobel filter |
| | Laplacian edge detection operator |
| ⨀ | Element-wise product |
| ⨁ | Element-wise addition |
| | Low-resolution ground truth at frame t |
| | High-resolution ground truth at frame t |
| | Mean squared error loss function |
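For reference, the commonly used 3 × 3 forms of the edge operators listed above are given below (the paper may use scaled or normalized variants); the horizontal-gradient kernel G_x responds to vertical edges and G_y to horizontal edges.

```latex
G_x = \begin{bmatrix} -1 & 0 & +1 \\ -2 & 0 & +2 \\ -1 & 0 & +1 \end{bmatrix}, \qquad
G_y = \begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ +1 & +2 & +1 \end{bmatrix}, \qquad
\nabla^2 = \begin{bmatrix} 0 & 1 & 0 \\ 1 & -4 & 1 \\ 0 & 1 & 0 \end{bmatrix}
```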
Approach | Multi-Frame Utilization | Edge Preservation |
---|---|---|
RR-DnCNN v2 | × | × |
FM-VSR | ✓ | × |
EOCVSR (proposed) | ✓ | ✓ |
| | RR-DnCNN v2 | FM-VSR | EDVR | BasicVSR++ | EOCVSR |
|---|---|---|---|---|---|
| Parameter Number | 1.8M | 7.1M | 2.7M | 7.1M | 3.5M |
| GFLOPs | 20.9 | 96.1 | 66.6 | 104.3 | 88.4 |
| QP | Class | Sequences | RR-DnCNN v2 | FM-VSR | EDVR | BasicVSR++ | EOCVSR |
|---|---|---|---|---|---|---|---|
| 32 | A | PeopleOnStreet | 29.608 | 29.813 | 28.859 | 30.029 | 30.129 |
| 32 | A | Traffic | 32.817 | 32.866 | 32.976 | 33.069 | 33.102 |
| 32 | B | BasketballDrive | 31.726 | 31.902 | 32.422 | 32.471 | 32.531 |
| 32 | B | Cactus | 31.012 | 31.297 | 31.566 | 31.712 | 31.735 |
| 32 | B | Kimono | 34.456 | 34.735 | 34.799 | 34.844 | 34.876 |
| 32 | B | ParkScene | 31.229 | 31.384 | 31.608 | 31.655 | 31.671 |
| 32 | C | BasketballDrill | 30.177 | 30.379 | 30.460 | 30.636 | 30.706 |
| 32 | C | RaceHorses | 24.544 | 26.111 | 27.524 | 27.599 | 27.609 |
| 32 | E | FourPeople | 33.757 | 33.865 | 33.929 | 34.103 | 34.260 |
| 32 | E | Johnny | 35.705 | 35.926 | 36.101 | 36.137 | 36.162 |
| 32 | E | KristenAndSara | 34.282 | 34.510 | 34.660 | 34.919 | 34.973 |
| 32 | | Average | 32.017 | 32.072 | 32.348 | 32.470 | 32.523 |
| 37 | | Average | 29.962 | 30.093 | 30.137 | 30.210 | 30.262 |
| 42 | | Average | 27.559 | 27.610 | 27.775 | 27.851 | 27.838 |
| 47 | | Average | 25.308 | 25.366 | 25.459 | 25.464 | 25.524 |
| Overall | | | 28.712 | 28.785 | 28.930 | 28.999 | 29.037 |
| Dataset | QP | RR-DnCNN v2 | FM-VSR | EDVR | BasicVSR++ | EOCVSR |
|---|---|---|---|---|---|---|
| UVG | 32 | 35.352 | 35.358 | 35.450 | 35.526 | 35.558 |
| UVG | 37 | 33.142 | 33.167 | 33.151 | 33.237 | 33.250 |
| UVG | 42 | 30.683 | 30.701 | 30.755 | 30.822 | 30.822 |
| UVG | 47 | 28.267 | 28.243 | 28.326 | 28.331 | 28.375 |
| UVG | Overall | 31.861 | 31.867 | 31.920 | 31.979 | 32.001 |
| MCL-JCV | 32 | 34.860 | 34.935 | 35.053 | 35.140 | 35.172 |
| MCL-JCV | 37 | 32.748 | 32.749 | 32.774 | 32.890 | 32.930 |
| MCL-JCV | 42 | 30.426 | 30.480 | 30.500 | 30.575 | 30.568 |
| MCL-JCV | 47 | 28.222 | 28.304 | 28.313 | 28.328 | 28.386 |
| MCL-JCV | Overall | 31.564 | 31.617 | 31.660 | 31.733 | 31.764 |
| Dataset | Class | Sequences | RR-DnCNN v2 | FM-VSR | EDVR | BasicVSR++ | EOCVSR |
|---|---|---|---|---|---|---|---|
| HEVC | A | PeopleOnStreet | −11.621 | −13.780 | −15.541 | −18.113 | −20.621 |
| HEVC | A | Traffic | −3.916 | −5.019 | −6.153 | −7.833 | −8.584 |
| HEVC | B | BasketballDrive | 0.687 | −7.133 | −11.365 | −14.361 | −14.140 |
| HEVC | B | Cactus | −0.224 | −9.965 | −12.462 | −14.842 | −15.386 |
| HEVC | B | Kimono | −13.743 | −17.421 | −20.907 | −22.182 | −22.501 |
| HEVC | B | ParkScene | 5.323 | −4.408 | −7.121 | −8.035 | −8.452 |
| HEVC | C | BasketballDrill | 0.307 | −3.147 | −4.883 | −6.685 | −8.300 |
| HEVC | C | RaceHorses | −6.599 | −7.366 | −8.185 | −10.610 | −11.420 |
| HEVC | E | FourPeople | −11.528 | −12.154 | −12.788 | −13.591 | −14.836 |
| HEVC | E | Johnny | −19.018 | −19.580 | −20.076 | −21.614 | −22.046 |
| HEVC | E | KristenAndSara | −7.181 | −7.877 | −8.414 | −10.864 | −11.721 |
| HEVC | | Average | −6.137 | −9.804 | −11.627 | −13.521 | −14.364 |
| UVG | | Average | −27.086 | −27.313 | −27.804 | −29.502 | −29.849 |
| MCL-JCV | | Average | −24.066 | −24.526 | −24.913 | −27.599 | −28.217 |
| | EORB w/o Edge-Perceiving Filters | EORB w/ Edge-Perceiving Filters |
|---|---|---|
| PSNR (dB) | 28.317 | 28.386 |
| Noise Level | Scale | MIRNet | EOCVSR |
|---|---|---|---|
| | | 34.65 | 35.10 |
| | | 33.86 | 34.45 |
| | | 31.06 | 32.46 |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).