
## Original Article


International Journal of Fuzzy Logic and Intelligent Systems 2021; 21(3): 222-232

Published online September 25, 2021

https://doi.org/10.5391/IJFIS.2021.21.3.222

© The Korean Institute of Intelligent Systems

## A Restoration Method of Single Image Super Resolution Using Improved Residual Learning with Squeeze and Excitation Blocks

Wang-Su Jeon¹ and Sang-Yong Rhee²

¹Department of IT Convergence Engineering, Kyungnam University, Changwon, Korea
²Department of Computer Engineering, Kyungnam University, Changwon, Korea

Correspondence to: Sang-Yong Rhee (syrhee@kyungnam.ac.kr)

Received: November 5, 2020; Revised: September 13, 2021; Accepted: September 16, 2021

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted noncommercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

### Abstract

Techniques for single-image super-resolution have advanced through deep learning. In this paper, we propose a super-resolution method that uses improved residual learning and squeeze-and-excitation (SE) blocks. The improved residual learning adds one more skip connection to the existing residual block, which increases the similarity between pixels, and applying the SE block of SENet improves performance with only a slight increase in computation. The method was evaluated by building the three proposed modules into the super-resolution generative adversarial network (SRGAN), and the effect of the residual and SE blocks on super-resolution performance was confirmed. Although the results vary slightly from image to image, the SE and residual blocks were found to help improve performance, and they work best when used together.

Keywords: CNN, Super-resolution, SRGAN, Residual learning, Squeeze and excitation

### 1. Introduction

As living standards improve, demand for high-quality products in all fields, including video, is increasing. High-quality video requires high resolution and good color. Compression is essential for transmitting high-resolution image data efficiently, and a method of restoring the information lost during compression is required. In addition, medical images, CCTV footage, and thermal images, among others, have limited resolution because of hardware limitations, which makes it difficult to provide the image quality that users want. Super-resolution restoration is therefore needed to provide good-quality images. Single-image super-resolution (SISR) is a technique for restoring a down-sampled image to its original high-resolution form.

Super-resolution has made significant progress through recent deep learning technologies [1-4]. However, most studies have focused only on improving performance over the whole image [5,6], and few have considered image texture [7]. In particular, when an image is divided into high- and low-frequency regions and compressed, it is difficult to restore because the loss in the high-frequency region is large. In addition, higher-frequency components carry more important information, so restoring high-frequency regions well can be more important than improving overall performance.

We use the super-resolution generative adversarial network (SRGAN) [7] as the base model. Because SENet, which uses squeeze-and-excitation (SE) blocks, outperforms ResNet, which uses residual blocks, we replace the residual modules of SRGAN with an improved residual module, SE blocks, or a combination of the two to improve the restoration of texture regions [4]. The remainder of this paper is organized as follows. Section 2 describes existing studies on super-resolution restoration methods. Section 3 presents the three proposed models. Section 4 describes the experimental results and analyzes the proposed methods, and Section 5 presents concluding remarks and directions for future research.

### 2. Related Work

One way to obtain a high-resolution image is to physically increase the number of pixels in the image sensor. However, this enlarges the sensor, which is uneconomical and makes the device harder to handle. Another way is to reduce the size of each pixel so that more pixels fit in the same area. However, the smaller each pixel becomes, the less light it receives, so the quality of the final image deteriorates [8].

Therefore, software-based super-resolution restoration has been developed, and active research in this area is ongoing [9]. Super-resolution restoration can be broadly divided into multi-frame-based and single-image-based approaches. Multi-frame-based methods [10, 11] restore a super-resolution image by combining the information acquired from each frame; however, when the frames provide insufficient information, the image may not be restorable, or the quality of the result may degrade. Early SISR studies used fast interpolation filters such as linear, bicubic, and Lanczos filters [12], but the resulting images were somewhat blurred. To overcome this shortcoming, edge-preserving methods were introduced [9].
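To illustrate why interpolation-based methods are fast but blurry, here is a minimal 1-D Lanczos interpolation sketch in Python with NumPy. The kernel support `a = 3`, the half-pixel sample alignment, and edge replication at the borders are our assumptions, not details taken from [12]. Each output sample is only a weighted average of nearby input samples, so no new high-frequency detail can be created.

```python
import numpy as np

def lanczos_kernel(x, a=3):
    """Lanczos window: sinc(x) * sinc(x / a) for |x| < a, else 0."""
    x = np.asarray(x, dtype=np.float64)
    # np.sinc is the normalized sinc, sin(pi * x) / (pi * x), with sinc(0) = 1
    return np.where(np.abs(x) < a, np.sinc(x) * np.sinc(x / a), 0.0)

def lanczos_upsample_1d(signal, scale, a=3):
    """Upsample a 1-D signal by an integer factor using Lanczos interpolation."""
    signal = np.asarray(signal, dtype=np.float64)
    n = signal.size
    out = np.empty(n * scale)
    for i in range(n * scale):
        # Centre of output sample i, expressed in input-sample coordinates
        x = (i + 0.5) / scale - 0.5
        base = int(np.floor(x))
        taps = np.arange(base - a + 1, base + a + 1)  # 2a nearest input samples
        weights = lanczos_kernel(x - taps, a)
        # Replicate border samples for taps that fall outside the signal
        samples = signal[np.clip(taps, 0, n - 1)]
        out[i] = np.dot(weights, samples) / weights.sum()  # normalized weighted average
    return out
```

For a 2-D image, the same routine is typically applied along rows and then along columns, because the Lanczos filter is usually used in separable form.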

With the recent introduction of deep learning, super-resolution images have been restored successfully by learning the relationship between low- and high-resolution versions of the same image. In particular, as shown in Figure 1, CNN-based models [15-17] such as the super-resolution convolutional neural network (SRCNN) [9], the very deep super-resolution network (VDSR) [10], the road structure refined CNN (RSRCNN) [13], and the deeply-recursive convolutional network (DRCN) [14] performed better than earlier methods. Figure 2 shows the difference between an image upscaled with a conventional interpolation method and a high-resolution image obtained from these models [9].

However, CNN-based algorithms trained with mean squared error (MSE) loss functions still struggle to represent the textures of super-resolved images. To address this issue, SRGAN has been applied to super-resolution restoration. SRGAN is a super-resolution image reconstruction method based on a generative adversarial network (GAN) [18].
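A toy calculation shows why an MSE-based loss favors smooth outputs. In this illustrative sketch (our own example, not from the paper), a checkerboard texture is compared with the same texture shifted by one pixel and with a flat gray image. The flat image has the lower MSE even though it contains no texture at all.

```python
import numpy as np

target = (np.indices((8, 8)).sum(axis=0) % 2).astype(float)  # 0/1 checkerboard texture
shifted = np.roll(target, 1, axis=1)                          # same texture, shifted by one pixel
flat = np.full_like(target, 0.5)                              # textureless "average" prediction

def mse(a, b):
    """Mean squared error between two images."""
    return float(np.mean((a - b) ** 2))

print(mse(target, shifted))  # 1.0: every pixel is wrong by 1
print(mse(target, flat))     # 0.25: every pixel is wrong by 0.5
```

A network trained only with MSE is therefore pushed toward predicting the average of plausible textures, which appears as blur.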

### 2.1 Super Resolution GAN

Like other GAN models, SRGAN consists of a generative model (generator) and a discriminative model (discriminator). In a standard GAN, the generator produces fake images resembling real ones from random noise; in SRGAN, the generator instead takes a low-resolution image as input. At the same time, the discriminator learns to judge whether a given image is real or generated. Early in training, the discriminator easily detects generated images; as training progresses, the generator learns to produce images that the discriminator cannot distinguish from real ones.

In this way, two neural networks with opposing objectives can be combined into a system that generates images similar to real ones. Existing single-image super-resolution techniques have difficulty restoring texture, which carries detailed high-frequency information.

SRGAN addresses this issue. The structure of the SRGAN used here is shown in Figure 3. To improve accuracy, it uses a deep network with batch normalization and skip connections and is trained with a perceptual loss, defined as the weighted sum of a content loss and an adversarial loss. A VGG-based loss is used as the content loss to improve the expression of image texture. Its core idea is to measure the difference between the generated image and the original image not on the output pixels, but on the feature maps obtained by passing both images through a VGG network.
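The perceptual loss can be sketched as follows, assuming the VGG feature maps have already been extracted (the pretrained VGG network itself is not included). The adversarial weight of 1e-3 is the value used in the original SRGAN paper [7]; this paper only says the two losses are combined "at a given ratio", so treat it as an assumption. Normalization of the content loss also varies between implementations; here it is a plain mean over all feature elements.

```python
import numpy as np

def content_loss(feat_sr, feat_hr):
    """VGG-style content loss: mean squared error between feature maps."""
    feat_sr = np.asarray(feat_sr, dtype=np.float64)
    feat_hr = np.asarray(feat_hr, dtype=np.float64)
    return float(np.mean((feat_sr - feat_hr) ** 2))

def adversarial_loss(d_sr, eps=1e-12):
    """Generator adversarial term: mean of -log D(G(I_LR)), with D outputs in (0, 1]."""
    d_sr = np.clip(np.asarray(d_sr, dtype=np.float64), eps, 1.0)
    return float(np.mean(-np.log(d_sr)))

def perceptual_loss(feat_sr, feat_hr, d_sr, adv_weight=1e-3):
    """Weighted sum of content and adversarial losses (adv_weight is an assumed ratio)."""
    return content_loss(feat_sr, feat_hr) + adv_weight * adversarial_loss(d_sr)
```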

The goal of the proposed system is to learn texture information. Because most texture regions are edge-based high-frequency signals, the high-frequency information is approximated through residual learning. This section briefly describes residual networks and SENet, and then describes the structure of the proposed method, which increases performance through residual learning.

### 3.1 Residual Network

ResNet [19] was developed to address the problem that the deeper a neural network is, the harder it is to train. By learning residuals, deep networks can be trained more easily than GoogLeNet [20, 21] and VGGNet [22]. Residual networks are also easy to optimize, and their depth yields high accuracy [23-25].

Trained and evaluated on ImageNet, ResNet won the ILSVRC 2015 classification challenge with a top-5 error of only 3.57%, and it also performed well on CIFAR-10 and on the COCO detection and segmentation tasks. Residual learning can be viewed as an ensemble approach that combines the identity mapping between layers with weights learned by shallower models. This is similar to how a long short-term memory (LSTM) network adjusts how much of its old and new memory to retain. Residual learning can also be seen as approximating the information between pixels, which improves restoration performance [26, 27].
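A minimal NumPy sketch of the residual idea, y = x + F(x). To keep it short, the 3 x 3 convolutions of a real residual block are replaced by a hypothetical 1 x 1 channel-mixing operation (`conv1x1`), and batch normalization is omitted; these are simplifications, not the network used in the paper.

```python
import numpy as np

def conv1x1(x, w):
    """Stand-in for a convolution: mixes channels at each pixel. x: (C, H, W), w: (C_out, C)."""
    return np.einsum("oc,chw->ohw", w, x)

def residual_block(x, w1, w2):
    """y = x + F(x), where F = conv -> ReLU -> conv (batch normalization omitted)."""
    f = conv1x1(np.maximum(conv1x1(x, w1), 0.0), w2)
    return x + f  # identity skip: the block only has to learn the residual F(x)
```

If the weights of the second convolution are all zero, the block is exactly the identity. Adding blocks therefore does not make the starting point worse, which is the usual intuition for why residual networks are easier to optimize.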

### 3.2 Squeeze and Excitation Block

In this study, the SE block used in SENet [28] was adopted to improve performance. As shown in Figure 4, the SE block can be attached to existing models; adding it to VGGNet, GoogLeNet, ResNet, and others improves their performance. Because the number of parameters increases only slightly, the additional computation is small.
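The squeeze and excitation operations in Figure 4 can be sketched in NumPy as follows. Following SENet [28], the squeeze step is global average pooling and the excitation step is a two-layer bottleneck (FC, ReLU, FC, sigmoid). The weights are random placeholders, and the channel count C = 64 and reduction ratio r = 16 are assumed example values, not settings reported in this paper.

```python
import numpy as np

def se_block(x, w1, b1, w2, b2):
    """Squeeze-and-excitation on a feature map x of shape (C, H, W).

    w1: (C // r, C), b1: (C // r,)  bottleneck fully connected layer
    w2: (C, C // r), b2: (C,)       expansion fully connected layer
    """
    z = x.mean(axis=(1, 2))                   # F_sq: global average pooling over H x W -> (C,)
    s = np.maximum(w1 @ z + b1, 0.0)          # F_ex step 1: FC + ReLU -> (C // r,)
    s = 1.0 / (1.0 + np.exp(-(w2 @ s + b2)))  # F_ex step 2: FC + sigmoid -> (C,), each in (0, 1)
    return x * s[:, None, None]               # rescale every channel by its weight

# Example with assumed sizes: C = 64 channels, reduction ratio r = 16
rng = np.random.default_rng(0)
C, r = 64, 16
x = rng.standard_normal((C, 8, 8))
y = se_block(x,
             0.1 * rng.standard_normal((C // r, C)), np.zeros(C // r),
             0.1 * rng.standard_normal((C, C // r)), np.zeros(C))
```

The output has the same shape as the input; each channel is simply multiplied by a learned weight between 0 and 1.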

In general, improving performance requires a corresponding increase in computation. The SE block is efficient because it improves accuracy without significantly increasing the amount of computation.
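The claim that the SE block adds little computation can be checked with a short parameter count. This sketch assumes 64-channel feature maps (as in the residual blocks of the original SRGAN) and SENet's default reduction ratio r = 16; the paper does not report these values for its own modules.

```python
def se_params(channels, reduction=16):
    """Parameters of an SE block: two fully connected layers with biases."""
    hidden = channels // reduction
    return (channels * hidden + hidden) + (hidden * channels + channels)

def conv_params(c_in, c_out, k=3):
    """Parameters of a k x k convolution with bias."""
    return k * k * c_in * c_out + c_out

se_count = se_params(64)             # 580
conv_count = conv_params(64, 64)     # 36928
# Overhead relative to a residual block with two 3 x 3 convolutions
overhead = se_count / (2 * conv_count)
```

Under these assumptions, the SE block adds 580 parameters, about 0.8% of the 73,856 parameters in a residual block with two 3 x 3 convolutions. Its fully connected layers also run once per feature map rather than once per pixel, so the extra arithmetic is similarly small.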

### 3.3 Proposed Module

The structure of the modules proposed in this study is shown in Figure 5. We applied the residual and SE blocks to SRGAN separately and together to verify how each affects super-resolution. Figure 5(a) shows the conventional residual block of SRGAN (the baseline), and Figure 5(b) shows the same block with an SE block added. As mentioned earlier, this aims to maximize texture information and minimize smoothing. Figure 5(c) adds one more skip (residual) term to the block in Figure 5(a) so that features at different high frequencies can be used in learning. Figure 5(d) adds the SE block to the block in Figure 5(c).
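The four variants in Figure 5 can be expressed as one function with two switches. This is a hypothetical sketch: the paper does not specify where the additional skip connection of Figure 5(c) attaches, so here it is assumed to add the intermediate activation `h` to the residual branch. The SE block is placed on the residual branch before the identity addition, as in SENet's SE-ResNet design, and 1 x 1 channel mixing stands in for the 3 x 3 convolutions, with batch normalization omitted.

```python
import numpy as np

def conv1x1(x, w):
    # Stand-in for a 3 x 3 convolution: channel mixing at each pixel, x is (C, H, W)
    return np.einsum("oc,chw->ohw", w, x)

def relu(x):
    return np.maximum(x, 0.0)

def se(x, w1, w2):
    # Squeeze (global average pool), excite (FC-ReLU-FC-sigmoid), then rescale channels
    z = x.mean(axis=(1, 2))
    s = 1.0 / (1.0 + np.exp(-(w2 @ relu(w1 @ z))))
    return x * s[:, None, None]

def module(x, p, extra_skip=False, use_se=False):
    """Figure 5 variants: (a) defaults, (b) use_se, (c) extra_skip, (d) both."""
    h = relu(conv1x1(x, p["w_a"]))  # first convolution + activation
    f = conv1x1(h, p["w_b"])        # second convolution: residual branch F(x)
    if extra_skip:
        f = f + h                   # HYPOTHETICAL placement of the additional skip
    if use_se:
        f = se(f, p["se_w1"], p["se_w2"])
    return x + f                    # standard identity skip of the residual block
```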


### 4.1 Experiment Details

The experimental environment was as follows. The hardware consisted of an Intel Xeon Gold 5120 CPU, 64 GB of RAM, and an RTX TITAN X GPU with 24 GB of memory. The deep learning framework was TensorFlow 1.14.0 on Ubuntu 18.04.4, and the training dataset was DIV2K 2017 [29]. DIV2K consists of high-resolution images with a pixel resolution of 2044 × 1404 and a corresponding low-resolution image for each. The training set consists of 800 images, and the test set consists of 100 images.

The model was trained for 200 epochs with the Adam optimizer (learning rate 0.0001, momentum 0.9) and a batch size of 16. During training, the generator upsamples 96 × 96 input images to 384 × 384 pixels. The discriminator receives both the generator output and the corresponding high-resolution image and performs binary classification to decide whether its input is a real high-resolution image or a generated super-resolution image; the generator and discriminator are trained together as a single connected network. When the loss is calculated during training, the sigmoid function at the end of the discriminator is removed. The final loss is obtained by adding the adversarial loss computed from the discriminator to the content loss computed by a separate VGG network at a given ratio. The generator objective and the SRGAN adversarial objective are given in Equations (1) and (2) [7].
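Removing the final sigmoid when computing the loss usually means that the binary cross-entropy is computed directly from the pre-sigmoid logits, which avoids overflow and log(0) for large outputs. The sketch below assumes that reading, since the paper does not show its loss code. TensorFlow's `tf.nn.sigmoid_cross_entropy_with_logits` computes the same stable formula; here it is written out in NumPy so that it runs on its own.

```python
import numpy as np

def bce_with_logits(logits, labels):
    """Stable binary cross-entropy on logits z with labels y, averaged.

    max(z, 0) - z * y + log(1 + exp(-|z|)) equals
    -[y * log(sigmoid(z)) + (1 - y) * log(1 - sigmoid(z))] without overflow.
    """
    z = np.asarray(logits, dtype=np.float64)
    y = np.asarray(labels, dtype=np.float64)
    return float(np.mean(np.maximum(z, 0.0) - z * y + np.log1p(np.exp(-np.abs(z)))))

def discriminator_loss(real_logits, fake_logits):
    """Real HR images are labeled 1 and generated SR images are labeled 0."""
    real = np.asarray(real_logits, dtype=np.float64)
    fake = np.asarray(fake_logits, dtype=np.float64)
    return bce_with_logits(real, np.ones_like(real)) + bce_with_logits(fake, np.zeros_like(fake))

def generator_adversarial_loss(fake_logits):
    """Non-saturating generator term, -log D(G(I_LR)), computed from logits."""
    fake = np.asarray(fake_logits, dtype=np.float64)
    return bce_with_logits(fake, np.ones_like(fake))
```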

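$$\tilde{\theta}_G = \arg\min_{\theta_G} \frac{1}{N} \sum_{n=1}^{N} l^{SR}\left(G_{\theta_G}(I_n^{LR}),\, I_n^{HR}\right) \tag{1}$$

$$\min_{\theta_G} \max_{\theta_D} \; \mathbb{E}_{I^{HR} \sim p_{\mathrm{train}}(I^{HR})}\left[\log D_{\theta_D}(I^{HR})\right] + \mathbb{E}_{I^{LR} \sim p_G(I^{LR})}\left[\log\left(1 - D_{\theta_D}\left(G_{\theta_G}(I^{LR})\right)\right)\right] \tag{2}$$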

In the above equations, $G$ is the generator, which receives a low-resolution image $I^{LR}$ as input and generates a super-resolution image, and $\theta_G$ denotes its parameters. $l^{SR}$ is the loss function that compares the generated image with the high-resolution image $I^{HR}$. $D$ is the discriminator, which judges whether its input is a real high-resolution image or a generated image, and $\theta_D$ denotes its parameters.

To measure image quality quantitatively, we used the peak signal-to-noise ratio (PSNR), which measures the loss of image quality; the structural similarity index (SSIM); visual information fidelity (VIF), which evaluates how accurately visual information is preserved; and relative polar edge coherence (RECO), which evaluates similarity using polar coordinates, as defined in Equations (3)-(6) [30].

In Equation (3), $\mathrm{MAX}_I$ is the maximum pixel value of the image; for an 8-bit grayscale image, it is 255. The lower the MSE (and thus the higher the PSNR), the better the image quality.
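Equation (3) translates directly into code. A minimal NumPy sketch, assuming 8-bit images (`max_i = 255`) and returning infinity for identical images, where the MSE is zero:

```python
import numpy as np

def psnr(reference, restored, max_i=255.0):
    """PSNR in dB: 20 * log10(MAX_I) - 10 * log10(MSE), as in Equation (3)."""
    ref = np.asarray(reference, dtype=np.float64)
    out = np.asarray(restored, dtype=np.float64)
    mse = np.mean((ref - out) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(20 * np.log10(max_i) - 10 * np.log10(mse))
```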

The SSIM index is obtained by multiplying the luminance, contrast, and structure comparisons between the original and reconstructed images. It rests on the hypothesis that the human visual system is specialized for extracting structural information from an image, so distortion of structural information significantly affects perceived quality. In Equation (4), $\mu_x$ and $\mu_y$ are the mean intensities (brightness) of images $x$ and $y$, $\sigma_x$ and $\sigma_y$ are their standard deviations, and $\sigma_{xy}$ is their covariance. The constants $C_1$, $C_2$, and $C_3$ stabilize the division when a denominator is close to zero. The term $\frac{2\mu_x\mu_y + C_1}{\mu_x^2 + \mu_y^2 + C_1}$ compares luminance, $\frac{2\sigma_x\sigma_y + C_2}{\sigma_x^2 + \sigma_y^2 + C_2}$ compares contrast, and $\frac{\sigma_{xy} + C_3}{\sigma_x\sigma_y + C_3}$ compares the structures of the two images.
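A single-window sketch of Equation (4) in NumPy. It computes the statistics over the whole image, whereas standard SSIM computes them in local (typically 11 x 11 Gaussian) windows and averages the resulting map. The constants K1 = 0.01, K2 = 0.03, and C3 = C2 / 2 are the common default choices and are assumptions here, not values stated in the paper.

```python
import numpy as np

def ssim_global(x, y, data_range=255.0, k1=0.01, k2=0.03):
    """Equation (4) evaluated once over the whole image (no sliding window)."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    c1 = (k1 * data_range) ** 2
    c2 = (k2 * data_range) ** 2
    c3 = c2 / 2.0                         # common choice for the structure constant
    mu_x, mu_y = x.mean(), y.mean()
    sd_x, sd_y = x.std(), y.std()         # standard deviations
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    luminance = (2 * mu_x * mu_y + c1) / (mu_x ** 2 + mu_y ** 2 + c1)
    contrast = (2 * sd_x * sd_y + c2) / (sd_x ** 2 + sd_y ** 2 + c2)
    structure = (cov_xy + c3) / (sd_x * sd_y + c3)
    return float(luminance * contrast * structure)
```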

VIF computes the mutual information, that is, the information shared, between the original image C and E, the version of the original image perceived by a human. It also computes the mutual information between C and F, the perceived version of the distorted image D. The quality of the distorted image is then predicted from the ratio of the two, as shown in Equation (5).

The ECO metric is an absolute measure, so it could in principle be used for no-reference quality estimation, provided that a unit of quality is defined. However, ECO depends on image content. To compensate for this, the RECO index is defined in Equation (6) as the ratio between the ECO of the image $I(x_1, x_2)$ and the ECO of a reference image $\bar{I}(x_1, x_2)$ of the same scene, denoted $\widetilde{\mathrm{ECO}}$.

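$$\mathrm{PSNR} = 20\log_{10}(\mathrm{MAX}_I) - 10\log_{10}(\mathrm{MSE}) \tag{3}$$

$$\mathrm{SSIM}(x,y) = \frac{(2\mu_x\mu_y + C_1)(2\sigma_x\sigma_y + C_2)(\sigma_{xy} + C_3)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)(\sigma_x\sigma_y + C_3)} \tag{4}$$

$$\mathrm{VIF} = \frac{\sum_{j \in \text{subbands}} I\left(C^{N,j}; F^{N,j} \mid s^{N,j}\right)}{\sum_{j \in \text{subbands}} I\left(C^{N,j}; E^{N,j} \mid s^{N,j}\right)} \tag{5}$$

$$\mathrm{RECO}(\sigma) = \frac{\mathrm{ECO}(\sigma) + C}{\widetilde{\mathrm{ECO}}(\sigma) + C} \tag{6}$$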

### 4.2 Experiment Result

Table 1 shows the performance of each model. The results confirm that the residual and SE blocks help improve performance, and using them together gives the best VIF and SSIM. As shown in Table 2, the running time increased by about 1.2- to 2.1-fold compared with the original SRGAN, but performance improved as blocks were added.
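The average gains and the running-time cost can be recomputed directly from the tables. This short sketch transcribes Table 1 and the time row of Table 2 and computes each proposed model's difference from the SRGAN baseline and its time relative to SRGAN; no values beyond those in the tables are assumed.

```python
# Values transcribed from Table 1 (DIV2K 2017) and the time row of Table 2.
# Model order in every list: SRGAN, SRGAN+SE, SRGAN+Residual, SRGAN+Residual+SE.
TABLE1 = {
    "VIF": [0.37172, 0.38036, 0.384859, 0.388321],
    "SSIM": [0.770272, 0.78185, 0.776731, 0.783999],
    "PSNR": [25.20976, 26.68983, 26.05034, 26.33148],  # dB
    "RECO": [0.825308, 0.782626, 0.804153, 0.775451],
}
TIMES_S = [1.8564, 2.1482, 2.5296, 3.9106]

def deltas_vs_baseline(values):
    """Difference of each proposed model from the SRGAN baseline (first entry)."""
    return [v - values[0] for v in values[1:]]

# Mean difference of the three proposed models from SRGAN, per metric
mean_delta = {m: sum(deltas_vs_baseline(v)) / 3 for m, v in TABLE1.items()}
# Running time of each proposed model relative to SRGAN
time_ratio = [t / TIMES_S[0] for t in TIMES_S[1:]]

for metric, d in mean_delta.items():
    print(f"{metric}: {d:+.4f}")
print("time ratio:", [round(r, 2) for r in time_ratio])
```

On average, the three proposed models differ from SRGAN by roughly +0.013 in VIF, +0.011 in SSIM, +1.15 dB in PSNR, and -0.038 in RECO, while their running times are about 1.16, 1.36, and 2.11 times that of SRGAN.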

Next, six images were tested with the proposed models, as shown in Table 2. Image #0 in Table 2 is the penguin image in Figure 6. For this image, the proposed method obtains the best VIF, and the penguin's beak is restored well. Image #45 is shown in Figure 7; the result of the proposed method is close to the real image and shows the weakest grid pattern.

Image #47 is shown in Figure 8. With the improved residual block alone, a grid pattern appears, but it does not appear with the proposed method, which also scores highly. For #62 (Figure 9), the proposed method restores the hair pattern best and also scores highly.

For #69 (Figure 10), the improved residual block performed better than the proposed method because the hair pattern was uniform. For #82 (Figure 11), the proposed method achieves the best VIF, SSIM, and PSNR and the best texture restoration. These results suggest that combining the residual block with the SE block removes the grid pattern and improves restoration by exploiting the similarity between pixels.

Super-resolution image restoration is a major issue in an era in which large amounts of image data are stored and exchanged. In this study, SRGAN, which has contributed to this area, and the three proposed models were tested on the DIV2K dataset for single-image super-resolution. Performance was evaluated using PSNR, SSIM, VIF, and RECO. Over the entire dataset, the results depend on the model, but on average VIF and SSIM improved by about 0.01 and PSNR by about 1 dB. The six individual images showed differences in image quality consistent with the numerical results. However, RECO was worse than for the base model. These results indicate that the SE block improves performance. The SRGAN+Residual+SE model achieves the best VIF and SSIM, but it takes longer than the other models; therefore, the SRGAN+SE or SRGAN+Residual models can be used instead. However, a smoothing phenomenon reappeared when the layers were deepened.

In future research, we will consider combining SRGAN+SE or SRGAN+Residual with a Markov random field to address the smoothing problem and improve restoration performance, and we will also apply the approach to restoring the resolution of depth images.

This work was supported by a National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIT) (No. 2020R1F1A107596811).

### Conflict of Interest

Fig. 1.

Types of CNN-based super-resolution architectures: (a) SR-CNN [9], (b) RSRCNN [13], (c) VDSR [10], and (d) DRCN [14].

Fig. 2.

Comparison of SR and bicubic images.

Fig. 3.

SRGAN overview [7].

Fig. 4.

SE block structure. Fsq: A channel descriptor is created by aggregating the feature maps across the spatial dimension (H×W). This descriptor embeds the global distribution (information) of the channel-wise feature response, and thus information that can be obtained from the global receptive field can be used in all layers of the network. Fex: Simple self-gating is applied to generate the per-channel modulation using weight embedding as the input. C1: Custom module output. C2: SE block inputs and outputs. xin, xout, W, H, and C represent the input, output, width, height, and channel, respectively.

Fig. 5.

Proposed architecture: (a) baseline, (b) SRGAN+SE, (c) SRGAN+Residual, and (d) SRGAN+Residual+SE.

Fig. 6.

Image #0 quality measurement in DIV2K dataset: (a) high resolution, (b) SRGAN, (c) SRGAN+SE, (d) SRGAN+Residual, and (e) SRGAN+Residual+SE.

Fig. 7.

Image #45 quality measurement in DIV2K dataset: (a) high resolution, (b) SRGAN, (c) SRGAN+SE, (d) SRGAN+Residual, and (e) SRGAN+Residual+SE.

Fig. 8.

Image #47 quality measurement in DIV2K dataset: (a) high resolution, (b) SRGAN, (c) SRGAN+SE, (d) SRGAN+Residual, and (e) SRGAN+Residual+SE.

Fig. 9.

Image #62 quality measurement in DIV2K dataset: (a) high resolution, (b) SRGAN, (c) SRGAN+SE, (d) SRGAN+Residual, and (e) SRGAN+Residual+SE.

Fig. 10.

Image #69 quality measurement in DIV2K dataset: (a) high resolution, (b) SRGAN, (c) SRGAN+SE, (d) SRGAN+Residual, and (e) SRGAN+Residual+SE.

Fig. 11.

Image #82 quality measurement in DIV2K dataset: (a) high resolution, (b) SRGAN, (c) SRGAN+SE, (d) SRGAN+Residual, and (e) SRGAN+Residual+SE.

Table. 1.

Table 1. Image quality measurement for each model on the DIV2K 2017 dataset (PSNR in dB).

| Metric | SRGAN | SRGAN+SE | SRGAN+Residual | SRGAN+Residual+SE |
|---|---|---|---|---|
| VIF | 0.37172 | 0.38036 | 0.384859 | 0.388321 |
| SSIM | 0.770272 | 0.78185 | 0.776731 | 0.783999 |
| PSNR (dB) | 25.20976 | 26.68983 | 26.05034 | 26.33148 |
| RECO | 0.825308 | 0.782626 | 0.804153 | 0.775451 |

Table. 2.

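Table 2. Running time and image quality measurement for individual images from the DIV2K 2017 dataset (PSNR in dB).

| Image | Metric | SRGAN | SRGAN+SE | SRGAN+Residual | SRGAN+Residual+SE |
|---|---|---|---|---|---|
| - | Time (s) | 1.8564 | 2.1482 | 2.5296 | 3.9106 |
| #0 | VIF | 0.428165 | 0.454184 | 0.455208 | 0.457381 |
| #0 | SSIM | 0.747610 | 0.779886 | 0.785117 | 0.782060 |
| #0 | PSNR | 26.396371 | 27.518967 | 27.357760 | 27.436177 |
| #0 | RECO | 0.854201 | 0.726198 | 0.778829 | 0.707260 |
| #45 | VIF | 0.255455 | 0.265757 | 0.269351 | 0.270637 |
| #45 | SSIM | 0.687484 | 0.704439 | 0.697704 | 0.709990 |
| #45 | PSNR | 21.861472 | 22.308505 | 22.376281 | 22.294032 |
| #45 | RECO | 0.773565 | 0.814906 | 0.786166 | 0.780359 |
| #47 | VIF | 0.374079 | 0.389811 | 0.393789 | 0.401365 |
| #47 | SSIM | 0.748230 | 0.763034 | 0.727275 | 0.763076 |
| #47 | PSNR | 25.411457 | 25.999129 | 25.853301 | 26.207819 |
| #47 | RECO | 0.863157 | 0.859168 | 0.874912 | 0.836694 |
| #62 | VIF | 0.246699 | 0.237577 | 0.249678 | 0.252927 |
| #62 | SSIM | 0.772432 | 0.767937 | 0.773743 | 0.781695 |
| #62 | PSNR | 26.342770 | 29.912518 | 27.486575 | 27.825911 |
| #62 | RECO | 0.827425 | 0.743627 | 0.788196 | 0.836395 |
| #69 | VIF | 0.321373 | 0.325726 | 0.328915 | 0.320290 |
| #69 | SSIM | 0.739972 | 0.745178 | 0.746768 | 0.734274 |
| #69 | PSNR | 22.139708 | 22.586860 | 22.490368 | 22.105132 |
| #69 | RECO | 0.658706 | 0.639448 | 0.646193 | 0.639967 |
| #82 | VIF | 0.604546 | 0.609105 | 0.612210 | 0.627323 |
| #82 | SSIM | 0.925906 | 0.930625 | 0.929779 | 0.932901 |
| #82 | PSNR | 29.106803 | 31.813002 | 30.737769 | 32.119837 |
| #82 | RECO | 0.974795 | 0.912406 | 0.950620 | 0.852030 |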

1. Anwar, S, Khan, S, and Barnes, N (2020). A deep journey into super-resolution: a survey. ACM Computing Surveys (CSUR). 53, 1-34. https://doi.org/10.1145/3390462
2. Bae, W, Yoo, J, and Ye, JC (2017). Beyond deep residual learning for image restoration: persistent homology-guided manifold simplification. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Honolulu, HI, pp. 1141-1149.
3. Yang, W, Zhang, X, Tian, Y, Wang, W, Xue, JH, and Liao, Q (2019). Deep learning for single image super-resolution: a brief review. IEEE Transactions on Multimedia. 21, 3106-3121. https://doi.org/10.1109/TMM.2019.2919431
4. Gu, J, Sun, X, Zhang, Y, Fu, K, and Wang, L (2019). Deep residual squeeze and excitation network for remote sensing image super-resolution. Remote Sensing. 11, article no. 1817.
5. Zhang, Z, Wang, Z, Lin, Z, and Qi, H (2019). Image super-resolution by neural texture transfer. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, pp. 7982-7991.
6. Wang, X, Yu, K, Dong, C, and Loy, CC (2018). Recovering realistic texture in image super-resolution by deep spatial feature transform. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, pp. 606-615.
7. Ledig, C, Theis, L, Huszar, F, Caballero, J, Cunningham, A, and Acosta, A (2017). Photo-realistic single image super-resolution using a generative adversarial network. Available: https://arxiv.org/abs/1609.04802v5
8. Yue, L, Shen, H, Li, J, Yuan, Q, Zhang, H, and Zhang, L (2016). Image super-resolution: the techniques, applications, and future. Signal Processing. 128, 389-408. https://doi.org/10.1016/j.sigpro.2016.05.002
9. Dong, C, Loy, CC, He, K, and Tang, X (2014). Learning a deep convolutional network for image super-resolution. Computer Vision - ECCV 2014. Cham, Switzerland: Springer, pp. 184-199. https://doi.org/10.1007/978-3-319-10593-2_13
10. Kim, J, Lee, JK, and Lee, KM (2016). Accurate image super-resolution using very deep convolutional networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, pp. 1646-1654.
11. Lim, B, Son, S, Kim, H, Nah, S, and Lee, KM (2017). Enhanced deep residual networks for single image super-resolution. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Honolulu, HI, pp. 1132-1140.
12. Irani, M, and Peleg, S (1991). Improving resolution by image registration. CVGIP: Graphical Models and Image Processing. 53, 231-239. https://doi.org/10.1016/1049-9652(91)90045-L
13. Wei, Y, Wang, Z, and Xu, M (2017). Road structure refined CNN for road extraction in aerial image. IEEE Geoscience and Remote Sensing Letters. 14, 709-713. https://doi.org/10.1109/LGRS.2017.2672734
14. Kim, J, Lee, JK, and Lee, KM (2016). Deeply-recursive convolutional network for image super-resolution. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, pp. 1637-1645.
15. Hajarolasvadi, N, and Demirel, H (2019). 3D CNN-based speech emotion recognition using k-means clustering and spectrograms. Entropy. 21, article no. 479.
16. Lee, WY, Ko, KE, Geem, ZW, and Sim, KB (2017). Method that determining the hyperparameter of CNN using HS algorithm. Journal of Korean Institute of Intelligent Systems. 27, 22-28. https://doi.org/10.5391/JKIIS.2017.27.1.022
17. Kim, S, and Cho, Y (2020). An artificial intelligence Othello game agent using CNN based MCTS and reinforcement learning. Journal of Korean Institute of Intelligent Systems. 30, 40-46. https://doi.org/10.5391/JKIIS.2020.30.1.40
18. Goodfellow, I, Pouget-Abadie, J, Mirza, M, Xu, B, Warde-Farley, D, Ozair, S, Courville, AC, and Bengio, Y (2014). Generative adversarial nets. Advances in Neural Information Processing Systems. 27, 2672-2680.
19. He, K, Zhang, X, Ren, S, and Sun, J (2016). Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, pp. 770-778.
20. Szegedy, C, Liu, W, Jia, Y, Sermanet, P, Reed, SE, Anguelov, D, Erhan, D, Vanhoucke, V, and Rabinovich, A (2015). Going deeper with convolutions. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, pp. 1-9.
21. Szegedy, C, Vanhoucke, V, Ioffe, S, Shlens, J, and Wojna, Z (2015). Rethinking the inception architecture for computer vision. Available: https://arxiv.org/abs/1512.00567
22. Simonyan, K, and Zisserman, A (2015). Very deep convolutional networks for large-scale image recognition. Available: https://arxiv.org/abs/1409.1556
23. Szegedy, C, Ioffe, S, Vanhoucke, V, and Alemi, AA (2016). Inception-v4, Inception-ResNet and the impact of residual connections on learning. Available: https://arxiv.org/abs/1602.07261
24. He, K, Zhang, X, Ren, S, and Sun, J (2016). Identity mappings in deep residual networks. Computer Vision - ECCV 2016. Cham, Switzerland: Springer, pp. 630-645. https://doi.org/10.1007/978-3-319-46493-0_38
25. Jiao, J, Tu, WC, He, S, and Lau, RW (2017). FormResNet: formatted residual learning for image restoration. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Honolulu, HI, pp. 1034-1042.
26. Hou, J, Si, Y, and Yu, X (2020). A novel and effective image super-resolution reconstruction technique via fast global and local residual learning model. Applied Sciences. 10, article no. 1856.
27. Hartmann, W, Galliani, S, Havlena, M, Van Gool, L, and Schindler, K (2017). Learned multi-patch similarity. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, pp. 1595-1603.
28. Hu, J, Shen, L, and Sun, G (2018). Squeeze-and-excitation networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, pp. 7132-7141.
29. Agustsson, E, and Timofte, R (2017). NTIRE 2017 challenge on single image super-resolution: dataset and study. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Honolulu, HI, pp. 1122-1131.
30. Zhai, G, and Min, X (2020). Perceptual image quality assessment: a survey. Science China Information Sciences. 63, article no. 211301.

Wang-Su Jeon received his B.S. and M.S. degrees in computer engineering and IT convergence engineering from Kyungnam University, Masan, Korea, in 2016 and 2018, respectively, and is currently pursuing a Ph.D. in IT convergence engineering at Kyungnam University. His present interests include computer vision, pattern recognition, and machine learning.

E-mail: jws2218@naver.com

Sang-Yong Rhee received his B.S. and M.S. degrees in industrial engineering from Korea University, Seoul, Korea, in 1982 and 1984, respectively, and his Ph.D. in industrial engineering from Pohang University, Pohang, Korea. He is currently a professor of computer engineering at Kyungnam University, Masan, Korea. His research interests include computer vision, augmented reality, neuro-fuzzy systems, and human-robot interfaces.

E-mail: syrhee@kyungnam.ac.kr

### Article

#### Original Article

International Journal of Fuzzy Logic and Intelligent Systems 2021; 21(3): 222-232

Published online September 25, 2021 https://doi.org/10.5391/IJFIS.2021.21.3.222

## A Restoration Method of Single Image Super Resolution Using Improved Residual Learning with Squeeze and Excitation Blocks

Wang-Su Jeon1 and Sang-Yong Rhee2

1Department of IT Convergence Engineering, Kyungnam University, Changwon, Korea
2Department of Computer Engineering, Kyungnam University, Changwon, Korea

Correspondence to:Sang-Yong Rhee (syrhee@kyungnam.ac.kr)

Received: November 5, 2020; Revised: September 13, 2021; Accepted: September 16, 2021

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted noncommercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

### Abstract

Techniques for single-image super-resolution have been developed through deep learning. In this paper, we propose a method using advanced residual learning and squeeze and excitation (SE) blocks for such resolution. Improving the residual learning increases the similarity between pixels by adding one skip to the existing residual, and it is possible to improve the performance while slightly increasing the number of calculations by applying the SE block of SENet. The performance evaluation was tested as part of the super-resolution generative adversarial network (SRGAN) and using three proposed modules, and the effect of the residual and the SE blocks on the super-resolution and the change in performance was confirmed. Although the results vary slightly from image, SE and residual blocks have been found to help improve the performance and are best used with blocks.

Keywords: CNN, Super-resolution, SRGAN, Residual learning, Squeeze and excitation

### 1. Introduction

As living standards improve, demand for high-quality products in all fields, including video images, is increasing. High-quality video means a high resolution and good color. Compression technology is essential for the efficient transmission of high-resolution image data, and a method of restoring information lost during the compression process is required. In addition, medical images, CCTV, and thermal imaging, among other imaging types, do not have a high resolution because of the hardware limitations, and it is difficult to provide an image quality desired by users. An ultra-high-resolution restoration is therefore required to provide a good quality image. Single image super-resolution (SISR) is a technique for restoring a down-sampled image to an original super-resolution image.

Ultra-high super-resolution has made significant progress through recent deep learning technologies [14]. However, most studies have focused only on improving the performance of the complete image [5,6], and few studies have considered the image texture [7]. In particular, when an image is simply divided into high- and low-frequency regions and compressed, it is difficult to restore the image because the loss in the high-frequency region is large. In addition, the higher the frequency, the more important the information is, and it can be stated that restoring a high-frequency region well is more important than improving the overall performance.

We use a super-resolution generative adversarial network (SRGAN) [7] as the base model. Because SENet, which uses squeeze and excitation (SE) blocks, has been reported to outperform ResNet, which uses residual blocks, we replace the residual modules of SRGAN with an improved residual module, SE blocks, or a combination of the two to improve the restoration of texture areas [4]. The remainder of this paper is organized as follows. Section 2 describes existing studies on high-resolution restoration methods. Section 3 presents the three proposed models. Section 4 describes the experimental results and an analysis of the proposed methods, and Section 5 presents concluding remarks and areas of future research.

### 2. Related Work

One way to obtain a high-resolution image is to physically increase the number of pixels in the image sensor. However, this increases the size of the sensor, which is uneconomical and makes the sensor difficult to handle. Another way is to reduce the size of each pixel so that more pixels fit in a given area. However, as pixels shrink, each pixel receives less light, and the quality of the final image deteriorates [8].

Therefore, software-based restoration of high-resolution images has been developed, and active research in this area is still ongoing [9]. High-resolution image restoration can be broadly divided into multi-frame-based and single-image-based approaches. Multi-frame-based methods [10, 11] can restore a super-resolution image by combining the information acquired from each frame; however, some low-resolution images cannot be restored, or the quality of the result may degrade owing to insufficient information. Early SISR studies used fast interpolation filters such as linear, bicubic, and Lanczos filters [12], but the resulting images were slightly blurred. To overcome this shortcoming, edge-preserving methods were introduced [9].
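As a point of reference for the interpolation baselines mentioned above, the following is a minimal sketch of one-dimensional cubic-convolution (bicubic-style) interpolation using the Keys kernel. The parameter value a = -0.5, the pixel-centre mapping, and the border clamping are our assumptions; image libraries differ in these details. A 2-D bicubic resize applies the same 1-D step along rows and then along columns.

```python
import math


def keys_kernel(t: float, a: float = -0.5) -> float:
    """Keys cubic-convolution kernel W(t); a = -0.5 is a common choice."""
    t = abs(t)
    if t <= 1.0:
        return (a + 2.0) * t**3 - (a + 3.0) * t**2 + 1.0
    if t < 2.0:
        return a * t**3 - 5.0 * a * t**2 + 8.0 * a * t - 4.0 * a
    return 0.0


def cubic_interp_1d(samples: list[float], x: float, a: float = -0.5) -> float:
    """Interpolate samples (given at integer positions 0..n-1) at position x."""
    n = len(samples)
    base = math.floor(x)
    value = 0.0
    for k in range(base - 1, base + 3):      # the four nearest samples
        idx = min(max(k, 0), n - 1)          # clamp indices at the borders
        value += samples[idx] * keys_kernel(x - k, a)
    return value


def upsample_1d(samples: list[float], scale: int) -> list[float]:
    """Upsample a 1-D signal by an integer factor with cubic convolution."""
    n = len(samples)
    # Map each output pixel centre back to input coordinates.
    return [cubic_interp_1d(samples, (i + 0.5) / scale - 0.5) for i in range(n * scale)]
```

Because the kernel weights always sum to one, a constant signal stays constant, and a linear ramp is reproduced exactly; the blur noted in the text comes from the kernel's limited support, which cannot recreate high-frequency detail.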

With the recent introduction of deep learning, high-resolution images have been restored successfully by learning the relationship between the high- and low-resolution versions of the same image. In particular, as shown in Figure 1, CNN-based [15-17] models such as the super-resolution convolutional neural network (SRCNN) [9], very deep super-resolution (VDSR) [10], the road structure refined CNN (RSRCNN) [13], and the deeply-recursive convolutional network (DRCN) [14] performed better than existing methods. Figure 2 compares a low-resolution image upscaled by an existing interpolation method with a high-resolution image obtained from these models [9].

However, CNN-based algorithms that use loss functions based on the mean squared error (MSE) still represent the textures of high-resolution images poorly. To address this issue, SRGAN, a super-resolution image reconstruction method based on a generative adversarial network [18], has been applied to high-resolution image restoration.

### 2.1 Super Resolution GAN

Similar to other GAN models, SRGAN consists of a generative model and a discriminative model. Whereas the generator of a standard GAN produces fake images from random noise, the SRGAN generator takes a low-resolution image as input and produces a super-resolved image that resembles the real high-resolution image; at the same time, the discriminative model learns to decide whether a given image is real or generated. Early in training, the discriminative model easily detects the images produced by the generator; as training progresses, however, the generator learns to produce images that the discriminator cannot distinguish from real ones.

In this way, two neural networks with different objectives can be combined into a system that generates images similar to real ones. Existing single-image super-resolution techniques have difficulty restoring texture, which carries the detailed high-frequency information of an image.

However, SRGAN provides a solution to this issue. The structure of the SRGAN used is shown in Figure 3. To increase accuracy, the network uses a deep model with batch normalization and skip connections and is trained using a perceptual loss. The perceptual loss is a weighted sum of a content loss and an adversarial loss. A VGG loss is used as the content loss to improve the expression of image texture. The core idea of the VGG loss is to compare the generated image and the original image not in pixel space but in feature space, that is, after both have passed through a VGG network and been converted into feature maps.
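The perceptual loss described above can be sketched as follows. This is a minimal illustration, not the SRGAN implementation: `toy_features` is a hypothetical stand-in for a pretrained VGG feature map (a fixed random channel projection followed by ReLU), and the adversarial weight of 10^-3 follows the value reported for SRGAN [7].

```python
import numpy as np

RNG = np.random.default_rng(0)
# Hypothetical stand-in for a pretrained VGG feature extractor phi(.):
# a fixed random 1x1 channel projection followed by ReLU. A real
# implementation would use activations of a pretrained VGG19 layer.
PHI_WEIGHTS = RNG.standard_normal((3, 16))


def toy_features(img: np.ndarray) -> np.ndarray:
    """Map an (H, W, 3) image to an (H, W, 16) feature map."""
    return np.maximum(img @ PHI_WEIGHTS, 0.0)


def content_loss(sr: np.ndarray, hr: np.ndarray) -> float:
    """MSE between feature maps rather than between pixels (the VGG-loss idea)."""
    return float(np.mean((toy_features(sr) - toy_features(hr)) ** 2))


def adversarial_loss(d_of_sr: np.ndarray, eps: float = 1e-12) -> float:
    """Generator adversarial term -log D(G(I_LR)), averaged over the batch."""
    return float(np.mean(-np.log(np.clip(d_of_sr, eps, 1.0))))


def perceptual_loss(sr: np.ndarray, hr: np.ndarray, d_of_sr: np.ndarray,
                    adv_weight: float = 1e-3) -> float:
    """Weighted sum of the content loss and the adversarial loss."""
    return content_loss(sr, hr) + adv_weight * adversarial_loss(d_of_sr)
```

The loss is zero only when the feature maps match and the discriminator is fully fooled (D = 1); comparing features instead of pixels is what lets the generator trade exact pixel agreement for more realistic texture.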

### 3. Proposed System

The goal of the proposed system is to learn texture information. Because most texture regions are high-frequency signals concentrated around edges, the high-frequency information is approximated through residual learning. This section briefly describes residual networks and SENet and then presents the structure of the proposed method, which improves performance through residual learning.

### 3.1 Residual Network

ResNet [19] was developed to address the problem that deeper neural networks become more difficult to train. By learning residual functions with respect to the layer inputs, ResNet allows deep networks to be trained more easily than GoogLeNet [20, 21] and VGGNet [22]. In addition, the network is easy to optimize, and high accuracy is obtained with deep models [23-25].

ResNet won the ILSVRC 2015 classification challenge on ImageNet with a top-5 error of only 3.57%, and it also performed well on CIFAR-10 classification and on COCO detection and segmentation. Residual learning can be considered an ensemble approach that combines the identity mapping between layers with the weights learned by shallower models. This is loosely similar to how a long short-term memory (LSTM) network adjusts the ratio at which old and new memories are kept. Residual learning can also be seen as approximating the information shared between pixels, which improves restoration performance [26, 27].
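The residual formulation y = x + F(x) can be illustrated with a short sketch. The 1×1 channel-mixing layers below stand in for the 3×3 convolutions of a real residual block, and batch normalization is omitted; both are simplifying assumptions. Setting the branch weights to zero reduces the block to an identity mapping, which is one intuition for why residual networks are easier to optimize.

```python
import numpy as np


def conv1x1(x: np.ndarray, w: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Stand-in for a convolution: a 1x1 conv is a per-pixel channel mix.

    x: (H, W, C_in), w: (C_in, C_out), b: (C_out,)
    """
    return x @ w + b


def relu(x: np.ndarray) -> np.ndarray:
    return np.maximum(x, 0.0)


def residual_block(x, w1, b1, w2, b2):
    """y = x + F(x), with F = conv -> ReLU -> conv (batch norm omitted)."""
    f = conv1x1(relu(conv1x1(x, w1, b1)), w2, b2)   # residual branch F(x)
    return x + f                                     # identity skip connection
```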

### 3.2 Squeeze and Excitation Block

In this study, the SE block used in SENet [28] is applied to improve performance. As shown in Figure 4, the SE block can be attached to existing models, and adding it to VGGNet, GoogLeNet, ResNet, and other networks is expected to increase their performance. Although the improvement can be substantial, the number of parameters does not increase significantly, so the additional computation is small.
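The squeeze, excitation, and scale steps of Figure 4 can be sketched on a single channels-last feature map as below. The weight shapes follow the usual SENet design with a reduction ratio r; the function names and the use of NumPy are ours, and a trained network would learn `w1`, `b1`, `w2`, and `b2`.

```python
import numpy as np


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def se_block(x, w1, b1, w2, b2):
    """Squeeze-and-excitation on an (H, W, C) feature map.

    w1: (C, C // r), b1: (C // r,), w2: (C // r, C), b2: (C,)
    """
    z = x.mean(axis=(0, 1))                  # squeeze: global average pool -> (C,)
    h = np.maximum(z @ w1 + b1, 0.0)         # excitation FC1 + ReLU -> (C // r,)
    s = sigmoid(h @ w2 + b2)                 # excitation FC2 + sigmoid -> (C,)
    return x * s                             # scale: reweight each channel by s
```

Each channel is multiplied by a gate in (0, 1) computed from global statistics of the whole feature map, so the block can emphasize informative channels without changing the spatial layout.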

In general, improving performance requires a correspondingly larger amount of computation. The SE block used in SENet is efficient because it improves accuracy without significantly increasing the amount of computation.
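To make the claim about cost concrete, the following sketch counts the parameters of one 3×3 convolution with 64 input and 64 output channels (the width used in SRGAN residual blocks [7]) and of an SE block on 64 channels, assuming the reduction ratio r = 16 commonly used in SENet [28].

```python
def conv_params(c_in: int, c_out: int, k: int = 3, bias: bool = True) -> int:
    """Weights (+ biases) of a k x k convolution layer."""
    return c_in * c_out * k * k + (c_out if bias else 0)


def se_params(c: int, r: int = 16) -> int:
    """Weights + biases of the two fully connected layers in an SE block."""
    hidden = c // r
    return (c * hidden + hidden) + (hidden * c + c)
```

Under these assumptions, the SE block adds 580 parameters against 36,928 for a single convolution, roughly 1.6%. Its fully connected layers also run once per feature map rather than at every pixel, so the added computation is smaller still.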

### 3.3 Proposed Module

The structure of the modules proposed in this study is shown in Figure 5. We applied the residual and SE blocks to SRGAN separately and together, and examined how each configuration affects super-resolution. Figure 5(a) shows the conventional ResNet block of the baseline SRGAN, and Figure 5(b) shows this block with an SE block added. As mentioned earlier, this aims to maximize texture information and minimize smoothing. Figure 5(c) adds an extra residual term to the block in Figure 5(a) so that features at different high frequencies are taken into account during learning. Figure 5(d) adds the SE block to the block in Figure 5(c).
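The four block variants of Figure 5 can be summarized as function compositions. This is a hypothetical sketch: the text does not specify exactly where the extra skip of the improved residual block (c) is placed, so the inner skip around the first layer below is an assumption, and `body`, `body1`, `body2`, and `se` are placeholders for the convolutional branch and the SE block.

```python
from typing import Callable

import numpy as np

Array = np.ndarray
Op = Callable[[Array], Array]


def block_a(x: Array, body: Op) -> Array:
    """(a) Baseline residual block: x + F(x)."""
    return x + body(x)


def block_b(x: Array, body: Op, se: Op) -> Array:
    """(b) SE recalibration applied to the residual branch before the addition."""
    return x + se(body(x))


def block_c(x: Array, body1: Op, body2: Op) -> Array:
    """(c) ASSUMED improved residual: one extra skip inside the branch."""
    h = body1(x) + x          # hypothetical additional skip connection
    return x + body2(h)


def block_d(x: Array, body1: Op, body2: Op, se: Op) -> Array:
    """(d) Block (c) with SE applied to its residual branch."""
    h = body1(x) + x
    return x + se(body2(h))
```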

### 4.1 Experiment Details

The experimental environment was as follows. The hardware consists of an Intel Xeon Gold 5120 CPU, 64 GB of RAM, and an RTX TITAN X GPU with 24 GB of memory. The deep learning framework is TensorFlow 1.14.0 on Ubuntu 18.04.4, and the training dataset is DIV2K 2017 [29]. The DIV2K data consist of high-resolution images with a pixel resolution of 2044×1404 and a corresponding low-resolution image for each. The training set consists of 800 images, and the test set consists of 100 images.

To train the model, we run 200 epochs using the Adam optimizer with a learning rate of 0.0001 and β1 (momentum) of 0.9, with a batch size of 16. During training, the generator restores the reduced 96 × 96 images to a pixel resolution of 384 × 384 (a ×4 upscaling factor). The discriminator receives the output of the generator and the high-resolution image and performs binary classification to decide whether each input is a real high-resolution image or a generated super-resolution image; the generator and discriminator are trained as a single connected network. In the discriminator, the original high-resolution image and the resulting super-resolution image are compared and evaluated, and learning continues. During training, the sigmoid function at the end of the discriminator is removed when the loss function is calculated. The final loss is obtained by adding the adversarial loss calculated by the discriminator to the content loss calculated by an independent VGG network at a given ratio. The generator objective and the SRGAN adversarial (min-max) objective are given in Equations (1) and (2) [7].
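The ×4 upscaling from 96 × 96 to 384 × 384 is performed in the SRGAN generator [7] by two sub-pixel convolution stages, each doubling the resolution. The rearrangement at the heart of each stage (often called pixel shuffle or depth-to-space) can be sketched as follows, assuming a channels-last (H, W, C) layout; the preceding convolution that expands the channel count is not shown.

```python
import numpy as np


def pixel_shuffle(x: np.ndarray, r: int) -> np.ndarray:
    """Rearrange (H, W, C*r*r) -> (H*r, W*r, C) (depth-to-space, channels-last)."""
    h, w, c_rr = x.shape
    if c_rr % (r * r):
        raise ValueError("channel count must be divisible by r * r")
    c = c_rr // (r * r)
    x = x.reshape(h, w, r, r, c)        # split depth into (row offset, col offset, channel)
    x = x.transpose(0, 2, 1, 3, 4)      # -> (H, r, W, r, C)
    return x.reshape(h * r, w * r, c)   # interleave the offsets into the spatial axes
```

For example, a 96 × 96 × 256 feature map becomes 192 × 192 × 64; one more convolution and shuffle yields 384 × 384.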

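$$\hat{\theta}_G = \arg\min_{\theta_G} \frac{1}{N} \sum_{n=1}^{N} l^{SR}\left(G_{\theta_G}\left(I_n^{LR}\right), I_n^{HR}\right) \tag{1}$$

$$\min_{\theta_G} \max_{\theta_D}\; \mathbb{E}_{I^{HR} \sim p_{\text{train}}(I^{HR})}\left[\log D_{\theta_D}\left(I^{HR}\right)\right] + \mathbb{E}_{I^{LR} \sim p_G(I^{LR})}\left[\log\left(1 - D_{\theta_D}\left(G_{\theta_G}\left(I^{LR}\right)\right)\right)\right] \tag{2}$$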

In Equations (1) and (2), $G$ is the generator, which receives a low-resolution image $I^{LR}$ as input and generates a super-resolution image, and $\theta_G$ denotes its parameters. $l^{SR}$ is the loss function that compares the generated image with the corresponding high-resolution image $I^{HR}$, and $N$ is the number of training image pairs. $D$ is the discriminator, which receives either a high-resolution image or a generated image and estimates the probability that its input is a real high-resolution image, and $\theta_D$ denotes its parameters.
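Removing the sigmoid at the end of the discriminator is commonly done so that the losses in Equation (2) can be computed directly from the logits in a numerically stable way. The sketch below shows one way to do this, assuming D = sigmoid(logit); it uses the identities -log sigmoid(l) = softplus(-l) and -log(1 - sigmoid(l)) = softplus(l), together with the non-saturating generator term -log D(G(I^LR)) used by SRGAN [7]. The function names are ours.

```python
import numpy as np


def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    z = np.asarray(z, dtype=float)
    return np.maximum(z, 0.0) + np.log1p(np.exp(-np.abs(z)))


def discriminator_loss(real_logits, fake_logits) -> float:
    """-(log D(I_HR) + log(1 - D(G(I_LR)))) from logits, averaged over the batch."""
    return float(np.mean(softplus(-real_logits)) + np.mean(softplus(fake_logits)))


def generator_adv_loss(fake_logits) -> float:
    """Non-saturating generator term -log D(G(I_LR)) from logits."""
    return float(np.mean(softplus(-fake_logits)))
```

Working with logits avoids evaluating log(0) when the sigmoid saturates; for example, a logit of -1000 gives a generator loss of exactly 1000 instead of overflowing.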

To measure image quality quantitatively, we used the peak signal-to-noise ratio (PSNR), which measures the loss of image quality at the pixel level; the structural similarity (SSIM); the visual information fidelity (VIF), which evaluates the accuracy of the visual information; and the relative polar edge coherence (RECO), which evaluates similarity using polar coordinates, as defined in Equations (3)-(6) [30].

In Equation (3), $\mathrm{MAX}_I$ is the maximum possible pixel value of the image; for an 8-bit grayscale image, it is 255. The lower the MSE, the higher the PSNR and the better the image quality.
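Equation (3) translates directly into code. A minimal sketch, assuming NumPy arrays of equal shape and 8-bit data (MAX_I = 255) by default:

```python
import numpy as np


def psnr(reference: np.ndarray, test: np.ndarray, max_i: float = 255.0) -> float:
    """PSNR = 20 log10(MAX_I) - 10 log10(MSE), in dB."""
    diff = reference.astype(np.float64) - test.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")   # identical images: no noise, unbounded PSNR
    return float(20 * np.log10(max_i) - 10 * np.log10(mse))
```

An MSE of 1 on 8-bit data gives about 48.13 dB, and every tenfold increase in MSE lowers the PSNR by 10 dB.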

The SSIM index is obtained by multiplying the brightness (luminance), contrast, and structure comparisons between the actual and reconstructed images. Its key hypothesis is that the human visual system is specialized in extracting structural information from an image, so distortion of that structural information significantly affects perceived quality. In Equation (4), $\mu_i$ is the average brightness and $\sigma_i$ is the standard deviation of image $i$, and $\sigma_{xy}$ is the covariance of images $x$ and $y$. In addition, $C_i$ is a stabilization constant that keeps each ratio stable when its denominator is close to zero. The term $\frac{2\mu_x\mu_y + C_1}{\mu_x^2 + \mu_y^2 + C_1}$ compares brightness, $\frac{2\sigma_x\sigma_y + C_2}{\sigma_x^2 + \sigma_y^2 + C_2}$ compares contrast, and $\frac{\sigma_{xy} + C_3}{\sigma_x\sigma_y + C_3}$ compares the structures of the two images.
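A simplified sketch of Equation (4) follows. It computes the statistics once over the whole image, whereas the standard SSIM evaluates Equation (4) in local windows (typically an 11 × 11 Gaussian window) and averages the results. The constants C1 = (0.01L)^2, C2 = (0.03L)^2, and C3 = C2/2, with L the dynamic range, are the customary choices and are our assumption here.

```python
import numpy as np


def global_ssim(x: np.ndarray, y: np.ndarray, data_range: float = 255.0) -> float:
    """Single-window SSIM following Equation (4) term by term."""
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    c3 = c2 / 2
    mu_x, mu_y = x.mean(), y.mean()
    sigma_x, sigma_y = x.std(), y.std()                  # standard deviations
    sigma_xy = ((x - mu_x) * (y - mu_y)).mean()          # covariance
    luminance = (2 * mu_x * mu_y + c1) / (mu_x**2 + mu_y**2 + c1)
    contrast = (2 * sigma_x * sigma_y + c2) / (sigma_x**2 + sigma_y**2 + c2)
    structure = (sigma_xy + c3) / (sigma_x * sigma_y + c3)
    return float(luminance * contrast * structure)
```

Identical images score 1, and an inverted image scores much lower because its negative covariance drives the structure term below zero.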

VIF calculates the mutual information between $C$, the original image, and $E$, the image as perceived by a human, that is, the information the two share. It also calculates the mutual information between $C$ and $F$, where $F$ is the distorted image $D$ as perceived by a human. The quality of the distorted image is then predicted from the ratio of the two, as shown in Equation (5).

The ECO metric is an absolute measure, so in principle it could be used for no-reference quality estimation, provided that a unit of quality is defined. However, ECO depends on the image content. To compensate for this, the RECO index is defined through Equation (6) as the ratio between the ECO of the image $I(x_1,x_2)$ and that of a reference image $\bar{I}(x_1,x_2)$ of the same scene, where $C$ is a constant.

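$$\mathrm{PSNR} = 20\log_{10}(\mathrm{MAX}_I) - 10\log_{10}(\mathrm{MSE}) \tag{3}$$

$$\mathrm{SSIM}(x,y) = \frac{(2\mu_x\mu_y + C_1)(2\sigma_x\sigma_y + C_2)(\sigma_{xy} + C_3)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)(\sigma_x\sigma_y + C_3)} \tag{4}$$

$$\mathrm{VIF} = \frac{\sum_{j \in \text{subbands}} I\left(\vec{C}^{N,j}; \vec{F}^{N,j} \mid S^{N,j}\right)}{\sum_{j \in \text{subbands}} I\left(\vec{C}^{N,j}; \vec{E}^{N,j} \mid S^{N,j}\right)} \tag{5}$$

$$\mathrm{RECO}(\sigma) = \frac{\mathrm{ECO}(\sigma) + C}{\widetilde{\mathrm{ECO}}(\sigma) + C} \tag{6}$$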

### 4.2 Experiment Result

Table 1 shows the performance of each model; the results confirm that the residual and SE blocks help improve performance, and that using both blocks together can improve it further. As shown in Table 2, the running time increased to about 1.2 to 2.1 times that of the original SRGAN; in exchange, quality improved, and the performance changed depending on which blocks were added.
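The comparisons above can be checked against Table 1 and the timing row of Table 2 with a few lines of arithmetic. The values below are copied from those tables; the helper names are ours.

```python
TABLE1 = {
    "SRGAN":             {"VIF": 0.37172,  "SSIM": 0.770272, "PSNR": 25.20976, "RECO": 0.825308},
    "SRGAN+SE":          {"VIF": 0.38036,  "SSIM": 0.78185,  "PSNR": 26.68983, "RECO": 0.782626},
    "SRGAN+Residual":    {"VIF": 0.384859, "SSIM": 0.776731, "PSNR": 26.05034, "RECO": 0.804153},
    "SRGAN+Residual+SE": {"VIF": 0.388321, "SSIM": 0.783999, "PSNR": 26.33148, "RECO": 0.775451},
}
TIME_S = {"SRGAN": 1.8564, "SRGAN+SE": 2.1482,
          "SRGAN+Residual": 2.5296, "SRGAN+Residual+SE": 3.9106}


def gains_over_baseline(table, baseline="SRGAN"):
    """Absolute difference of every metric from the baseline model."""
    base = table[baseline]
    return {m: {k: round(v - base[k], 4) for k, v in row.items()}
            for m, row in table.items() if m != baseline}


def mean_gain(gains, metric):
    """Average gain of one metric over the proposed models."""
    return sum(g[metric] for g in gains.values()) / len(gains)


def time_ratios(times, baseline="SRGAN"):
    """Running time of each model relative to the baseline."""
    return {m: round(t / times[baseline], 2) for m, t in times.items() if m != baseline}
```

With these values, the running time is about 1.16, 1.36, and 2.11 times that of SRGAN, PSNR improves by between about 0.84 and 1.48 dB, and RECO decreases for all three proposed models.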

Next, six images were tested with the proposed models, as shown in Table 2. In Table 2, Image #0 is the result of measuring the image quality of the penguin image in Figure 6. For this image, the VIF obtained by the proposed method is the best, and the beak of the penguin in Figure 6 is well restored. In addition, #45 corresponds to the image in Figure 7, where the result of the proposed method is similar to the real image and shows the weakest grid pattern.

Image #47 is shown in Figure 8. With the improved residual alone, a grid pattern appears, whereas it does not with the proposed method, which also obtains high evaluation scores. For #62, shown in Figure 9, the proposed method achieves the best restoration of the hair pattern, and its scores are also high.

For #69, shown in Figure 10, the improved residual model performed better than the proposed method, which we attribute to the regular pattern of the hair. For #82, shown in Figure 11, the proposed method achieves the best quality scores and the best texture restoration. These results confirm that using a residual block together with an SE block removes the grid pattern and yields higher restoration quality based on the similarity of the pixels.

### 5. Conclusion

Super-resolution image restoration is a major issue in an era of extensive storage and exchange of image data. In this study, SRGAN, which has contributed to this area, and three proposed models were tested on the DIV2K dataset for high-resolution restoration from a single image. Performance was evaluated using PSNR, SSIM, VIF, and RECO. Over the entire dataset, the results depend on the model, but on average the proposed models improved VIF and SSIM by about 0.01 and PSNR by about 1 dB. For the six individual images, the differences in image quality agreed with the numerical results. However, all proposed models achieved worse RECO values than the base model. These results indicate that the SE block improves performance. The SRGAN+Residual+SE model achieves the highest average VIF and SSIM, but it takes longer than the other approaches; therefore, the SRGAN+SE or SRGAN+Residual models can be used instead. However, a smoothing phenomenon reappeared when the network was made deeper.

In future research, we will consider combining SRGAN+SE or SRGAN+Residual with a Markov random field to address the smoothing problem, improve restoration performance, and restore the resolution of depth images.

### Fig 1.

Figure 1.

Types of CNN-based super-resolution architectures: (a) SRCNN [9], (b) RSRCNN [13], (c) VDSR [10], and (d) DRCN [14].

The International Journal of Fuzzy Logic and Intelligent Systems 2021; 21: 222-232https://doi.org/10.5391/IJFIS.2021.21.3.222

### Fig 2.

Figure 2.

Comparison of SR and bicubic images.


### Fig 3.

Figure 3.

SRGAN overview [7].


### Fig 4.

Figure 4.

SE block structure. Fsq: A channel descriptor is created by aggregating the feature maps across the spatial dimension (H×W). This descriptor embeds the global distribution (information) of the channel-wise feature response, and thus information that can be obtained from the global receptive field can be used in all layers of the network. Fex: Simple self-gating is applied to generate the per-channel modulation using weight embedding as the input. C1: Custom module output. C2: SE block inputs and outputs. xin, xout, W, H, and C represent the input, output, width, height, and channel, respectively.


### Fig 5.

Figure 5.

Proposed architecture: (a) baseline, (b) SRGAN+SE, (c) SRGAN+Residual, and (d) SRGAN+Residual+SE.


### Fig 6.

Figure 6.

Image #0 quality measurement in DIV2K dataset: (a) high resolution, (b) SRGAN, (c) SRGAN+SE, (d) SRGAN+Residual, and (e) SRGAN+Residual+SE.


### Fig 7.

Figure 7.

Image #45 quality measurement in DIV2K dataset: (a) high resolution, (b) SRGAN, (c) SRGAN+SE, (d) SRGAN+Residual, and (e) SRGAN+Residual+SE.


### Fig 8.

Figure 8.

Image #47 quality measurement in DIV2K dataset: (a) high resolution, (b) SRGAN, (c) SRGAN+SE, (d) SRGAN+Residual, and (e) SRGAN+Residual+SE.


### Fig 9.

Figure 9.

Image #62 quality measurement in DIV2K dataset: (a) high resolution, (b) SRGAN, (c) SRGAN+SE, (d) SRGAN+Residual, and (e) SRGAN+Residual+SE.


### Fig 10.

Figure 10.

Image #69 quality measurement in DIV2K dataset: (a) high resolution, (b) SRGAN, (c) SRGAN+SE, (d) SRGAN+Residual, and (e) SRGAN+Residual+SE.


### Fig 11.

Figure 11.

Image #82 quality measurement in DIV2K dataset: (a) high resolution, (b) SRGAN, (c) SRGAN+SE, (d) SRGAN+Residual, and (e) SRGAN+Residual+SE.


Table 1. Image quality measurement of each model on the DIV2K 2017 dataset (PSNR in dB).

| Metric | SRGAN | SRGAN+SE | SRGAN+Residual | SRGAN+Residual+SE |
|---|---|---|---|---|
| VIF | 0.37172 | 0.38036 | 0.384859 | 0.388321 |
| SSIM | 0.770272 | 0.78185 | 0.776731 | 0.783999 |
| PSNR | 25.20976 | 26.68983 | 26.05034 | 26.33148 |
| RECO | 0.825308 | 0.782626 | 0.804153 | 0.775451 |

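Table 2. Running time and per-image quality measurements on the DIV2K 2017 dataset (PSNR in dB).

| | SRGAN | SRGAN+SE | SRGAN+Residual | SRGAN+Residual+SE |
|---|---|---|---|---|
| Time (s) | 1.8564 | 2.1482 | 2.5296 | 3.9106 |
| Image #0 | | | | |
| VIF | 0.428165 | 0.454184 | 0.455208 | 0.457381 |
| SSIM | 0.747610 | 0.779886 | 0.785117 | 0.782060 |
| PSNR | 26.396371 | 27.518967 | 27.357760 | 27.436177 |
| RECO | 0.854201 | 0.726198 | 0.778829 | 0.707260 |
| Image #45 | | | | |
| VIF | 0.255455 | 0.265757 | 0.269351 | 0.270637 |
| SSIM | 0.687484 | 0.704439 | 0.697704 | 0.709990 |
| PSNR | 21.861472 | 22.308505 | 22.376281 | 22.294032 |
| RECO | 0.773565 | 0.814906 | 0.786166 | 0.780359 |
| Image #47 | | | | |
| VIF | 0.374079 | 0.389811 | 0.393789 | 0.401365 |
| SSIM | 0.748230 | 0.763034 | 0.727275 | 0.763076 |
| PSNR | 25.411457 | 25.999129 | 25.853301 | 26.207819 |
| RECO | 0.863157 | 0.859168 | 0.874912 | 0.836694 |
| Image #62 | | | | |
| VIF | 0.246699 | 0.237577 | 0.249678 | 0.252927 |
| SSIM | 0.772432 | 0.767937 | 0.773743 | 0.781695 |
| PSNR | 26.342770 | 29.912518 | 27.486575 | 27.825911 |
| RECO | 0.827425 | 0.743627 | 0.788196 | 0.836395 |
| Image #69 | | | | |
| VIF | 0.321373 | 0.325726 | 0.328915 | 0.320290 |
| SSIM | 0.739972 | 0.745178 | 0.746768 | 0.734274 |
| PSNR | 22.139708 | 22.586860 | 22.490368 | 22.105132 |
| RECO | 0.658706 | 0.639448 | 0.646193 | 0.639967 |
| Image #82 | | | | |
| VIF | 0.604546 | 0.609105 | 0.612210 | 0.627323 |
| SSIM | 0.925906 | 0.930625 | 0.929779 | 0.932901 |
| PSNR | 29.106803 | 31.813002 | 30.737769 | 32.119837 |
| RECO | 0.974795 | 0.912406 | 0.950620 | 0.852030 |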

### References

1. Anwar, S, Khan, S, and Barnes, N (2020). A deep journey into super-resolution: a survey. ACM Computing Surveys (CSUR). 53, 1-34. https://doi.org/10.1145/3390462
2. Bae, W, Yoo, J, and Ye, JC (2017). Beyond deep residual learning for image restoration: persistent homology-guided manifold simplification. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Honolulu, HI, pp. 1141-1149.
3. Yang, W, Zhang, X, Tian, Y, Wang, W, Xue, JH, and Liao, Q (2019). Deep learning for single image super-resolution: a brief review. IEEE Transactions on Multimedia. 21, 3106-3121. https://doi.org/10.1109/TMM.2019.2919431
4. Gu, J, Sun, X, Zhang, Y, Fu, K, and Wang, L (2019). Deep residual squeeze and excitation network for remote sensing image super-resolution. Remote Sensing. 11, article no. 1817.
5. Zhang, Z, Wang, Z, Lin, Z, and Qi, H (2019). Image super-resolution by neural texture transfer. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, pp. 7982-7991.
6. Wang, X, Yu, K, Dong, C, and Loy, CC (2018). Recovering realistic texture in image super-resolution by deep spatial feature transform. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, pp. 606-615.
7. Ledig, C, Theis, L, Huszar, F, Caballero, J, Cunningham, A, and Acosta, A (2017). Photo-realistic single image super-resolution using a generative adversarial network. Available: https://arxiv.org/abs/1609.04802v5
8. Yue, L, Shen, H, Li, J, Yuan, Q, Zhang, H, and Zhang, L (2016). Image super-resolution: the techniques, applications, and future. Signal Processing. 128, 389-408. https://doi.org/10.1016/j.sigpro.2016.05.002
9. Dong, C, Loy, CC, He, K, and Tang, X (2014). Learning a deep convolutional network for image super-resolution. Computer Vision - ECCV 2014. Cham, Switzerland: Springer, pp. 184-199. https://doi.org/10.1007/978-3-319-10593-2_13
10. Kim, J, Lee, JK, and Lee, KM (2016). Accurate image super-resolution using very deep convolutional networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, pp. 1646-1654.
11. Lim, B, Son, S, Kim, H, Nah, S, and Lee, KM (2017). Enhanced deep residual networks for single image super-resolution. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Honolulu, HI, pp. 1132-1140.
12. Irani, M, and Peleg, S (1991). Improving resolution by image registration. CVGIP: Graphical Models and Image Processing. 53, 231-239. https://doi.org/10.1016/1049-9652(91)90045-L
13. Wei, Y, Wang, Z, and Xu, M (2017). Road structure refined CNN for road extraction in aerial image. IEEE Geoscience and Remote Sensing Letters. 14, 709-713. https://doi.org/10.1109/LGRS.2017.2672734
14. Kim, J, Lee, JK, and Lee, KM (2016). Deeply-recursive convolutional network for image super-resolution. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, pp. 1637-1645.
15. Hajarolasvadi, N, and Demirel, H (2019). 3D CNN-based speech emotion recognition using k-means clustering and spectrograms. Entropy. 21, article no. 479.
16. Lee, WY, Ko, KE, Geem, ZW, and Sim, KB (2017). Method that determining the hyperparameter of CNN using HS algorithm. Journal of Korean Institute of Intelligent Systems. 27, 22-28. https://doi.org/10.5391/JKIIS.2017.27.1.022
17. Kim, S, and Cho, Y (2020). An artificial intelligence Othello game agent using CNN based MCTS and reinforcement learning. Journal of Korean Institute of Intelligent Systems. 30, 40-46. https://doi.org/10.5391/JKIIS.2020.30.1.40
18. Goodfellow, I, Pouget-Abadie, J, Mirza, M, Xu, B, Warde-Farley, D, Ozair, S, Courville, AC, and Bengio, Y (2014). Generative adversarial nets. Advances in Neural Information Processing Systems. 27, 2672-2680.
19. He, K, Zhang, X, Ren, S, and Sun, J (2016). Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, pp. 770-778.
20. Szegedy, C, Liu, W, Jia, Y, Sermanet, P, Reed, SE, Anguelov, D, Erhan, D, Vanhoucke, V, and Rabinovich, A (2015). Going deeper with convolutions. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, pp. 1-9.
21. Szegedy, C, Vanhoucke, V, Ioffe, S, Shlens, J, and Wojna, Z (2015). Rethinking the inception architecture for computer vision. Available: https://arxiv.org/abs/1512.00567
22. Simonyan, K, and Zisserman, A (2015). Very deep convolutional networks for large-scale image recognition. Available: https://arxiv.org/abs/1409.1556
23. Szegedy, C, Ioffe, S, Vanhoucke, V, and Alemi, AA (2016). Inception-v4, Inception-ResNet and the impact of residual connections on learning. Available: https://arxiv.org/abs/1602.07261
24. He, K, Zhang, X, Ren, S, and Sun, J (2016). Identity mappings in deep residual networks. Computer Vision - ECCV 2016. Cham, Switzerland: Springer, pp. 630-645. https://doi.org/10.1007/978-3-319-46493-0_38
25. Jiao, J, Tu, WC, He, S, and Lau, RW (2017). FormResNet: formatted residual learning for image restoration. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Honolulu, HI, pp. 1034-1042.
26. Hou, J, Si, Y, and Yu, X (2020). A novel and effective image super-resolution reconstruction technique via fast global and local residual learning model. Applied Sciences. 10, article no. 1856.
27. Hartmann, W, Galliani, S, Havlena, M, Van Gool, L, and Schindler, K (2017). Learned multi-patch similarity. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, pp. 1595-1603.
28. Hu, J, Shen, L, and Sun, G (2018). Squeeze-and-excitation networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, pp. 7132-7141.
29. Agustsson, E, and Timofte, R (2017). NTIRE 2017 challenge on single image super-resolution: dataset and study. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Honolulu, HI, pp. 1122-1131.
30. Zhai, G, and Min, X (2020). Perceptual image quality assessment: a survey. Science China Information Sciences. 63, article no. 211301.