International Journal of Fuzzy Logic and Intelligent Systems 2021; 21(3): 222-232
Published online September 25, 2021
https://doi.org/10.5391/IJFIS.2021.21.3.222
© The Korean Institute of Intelligent Systems
Wang-Su Jeon1 and Sang-Yong Rhee2
1Department of IT Convergence Engineering, Kyungnam University, Changwon, Korea
2Department of Computer Engineering, Kyungnam University, Changwon, Korea
Correspondence to: Sang-Yong Rhee (syrhee@kyungnam.ac.kr)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted noncommercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Techniques for single-image super-resolution have been developed through deep learning. In this paper, we propose a method that uses improved residual learning and squeeze-and-excitation (SE) blocks for single-image super-resolution. The improved residual learning increases the similarity between pixels by adding one more skip connection to the existing residual block, and applying the SE block of SENet improves performance with only a slight increase in computation. For the performance evaluation, three proposed modules were tested within the super-resolution generative adversarial network (SRGAN), and the effects of the residual and SE blocks on super-resolution and the resulting changes in performance were confirmed. Although the results vary slightly from image to image, the SE and residual blocks were found to help improve performance and are most effective when used together.
Keywords: CNN, Super-resolution, SRGAN, Residual learning, Squeeze and excitation
As living standards improve, the demand for high-quality products in all fields, including video, is increasing. High-quality video means high resolution and good color. Compression technology is essential for the efficient transmission of high-resolution image data, and a method of restoring the information lost during compression is required. In addition, medical images, CCTV footage, and thermal images, among other imaging types, do not have high resolution because of hardware limitations, and it is difficult to provide the image quality desired by users. Super-resolution restoration is therefore required to provide a good-quality image. Single-image super-resolution (SISR) is a technique for restoring a down-sampled image to its original high-resolution form.
Super-resolution has made significant progress through recent deep learning technologies [1–4]. However, most studies have focused only on improving performance over the complete image [5,6], and few have considered the image texture [7]. In particular, when an image is simply divided into high- and low-frequency regions and compressed, restoration is difficult because the loss in the high-frequency region is large. Moreover, the higher the frequency, the more important the information it carries, so restoring the high-frequency region well is arguably more important than improving overall performance.
We use a super-resolution generative adversarial network (SRGAN) [7] as the base model. Because SENet, which uses squeeze-and-excitation (SE) blocks, outperforms ResNet, which uses residual blocks, we replace the residual modules of the SRGAN with an improved residual block, SE blocks, or a combination of the two to improve the restoration of texture regions [4]. The remainder of this paper is organized as follows. Section 2 reviews existing studies on high-resolution restoration methods. Section 3 presents the three proposed models. Section 4 describes the experimental results and an analysis of the proposed methods, and finally, Section 5 presents concluding remarks and areas of future research.
One way to obtain a high-resolution image is to physically increase the number of pixels in the image sensor. However, this increases the size of the sensor, which is not economical and makes the device difficult to handle. Another way is to reduce the size of each pixel so that more pixels fit in a given area. However, as the pixel size is reduced, the quality of the final image deteriorates [8].
Therefore, software-based restoration of high-resolution images has been studied, and active research in this area is still ongoing [9]. Super-resolution image restoration can be broadly divided into multi-frame-based and single-image-based research. Multi-frame-based methods [10,11] can restore a super-resolution image by combining the information acquired from each frame; however, a low-resolution image may not be restorable, or the quality of the restored result may degrade owing to insufficient information. Early SISR studies used linear, bicubic, and Lanczos filters with high execution speed [12], but the resulting images were slightly blurred. To overcome this shortcoming, research on preserving edges was introduced [9].
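As an illustration of these classical filter-based methods, the following minimal sketch upscales an image with the bicubic and Lanczos filters using Pillow; the file name is a placeholder, not from this paper.

```python
# Classical single-image upscaling with the interpolation filters named
# above (bicubic, Lanczos). Minimal sketch using Pillow; "lowres.png" is
# a placeholder file name.
from PIL import Image

def upscale(path, scale=4, resample=Image.BICUBIC):
    """Upscale an image by `scale` using a classical interpolation filter."""
    img = Image.open(path)
    new_size = (img.width * scale, img.height * scale)
    return img.resize(new_size, resample=resample)

# Bicubic is fast but blurs high-frequency texture; Lanczos is slightly sharper.
sr_bicubic = upscale("lowres.png", resample=Image.BICUBIC)
sr_lanczos = upscale("lowres.png", resample=Image.LANCZOS)
```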
With the recent introduction of deep learning, ultra-high-resolution images have been restored successfully by learning the relationship between the high- and low-resolution versions of the same image. In particular, as shown in Figure 1, the super-resolution CNN (SRCNN) [9], very deep super-resolution (VDSR) [10], the road structure refined CNN (RSRCNN) [13], and the deeply-recursive convolutional network (DRCN) [14], all based on CNNs [15–17], performed better than existing methods. The difference between a low-resolution image obtained using a conventional interpolation method and a high-resolution image obtained from these models is shown in Figure 2 [9].
However, CNN-based algorithms that use mean squared error (MSE)-based loss functions are still insufficient for representing the textures of ultra-high-resolution images. To address this issue, the SRGAN has been applied to restoring ultra-high-resolution images. An SRGAN is a super-resolution image reconstruction method based on a generative adversarial network (GAN) [18].
Similar to other GAN models, an SRGAN consists of a generative model and a discriminative model. The generative model produces fake images similar to real ones (in an SRGAN, from a low-resolution input rather than random noise), while the discriminative model simultaneously learns to judge whether a given image is real or fake. At the beginning of training, the discriminative model easily determines the authenticity of the images produced by the generative model; however, as training progresses, the generative model produces images that the discriminative model cannot distinguish from real ones.
In this way, two neural networks with different objectives can be used to construct a system that generates images similar to real ones. Existing single-image super-resolution techniques have difficulty restoring texture, which carries the detailed information of the high-frequency components.
However, an SRGAN provides a solution to this issue. The structure of the SRGAN used here is shown in Figure 3. To increase accuracy, the network consists of a deep model, batch normalization, and skip connections, and is trained using a perceptual loss. The perceptual loss is the weighted sum of a content loss and an adversarial loss. The VGG loss is used as the content loss to improve the expression of image texture. The core idea of the VGG loss is to compute the difference between the generated image and the original image not in pixel space, but between their feature maps after both have passed through a pretrained VGG network.
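As a hedged sketch of this idea (not necessarily the authors' exact implementation), the content loss can be computed between VGG19 feature maps with tf.keras; the choice of the 'block5_conv4' layer follows common SRGAN practice and is an assumption here.

```python
# Sketch of the VGG content loss: the generated and original images are
# passed through a pretrained VGG19, and the MSE is computed between
# their feature maps rather than their pixels. Layer choice is an
# assumption following common SRGAN practice.
import tensorflow as tf

vgg = tf.keras.applications.VGG19(include_top=False, weights='imagenet')
feature_extractor = tf.keras.Model(
    inputs=vgg.input,
    outputs=vgg.get_layer('block5_conv4').output)
feature_extractor.trainable = False

def vgg_content_loss(hr_image, sr_image):
    """MSE between VGG feature maps of the real and generated images (inputs in [0, 255])."""
    hr_feat = feature_extractor(tf.keras.applications.vgg19.preprocess_input(hr_image))
    sr_feat = feature_extractor(tf.keras.applications.vgg19.preprocess_input(sr_image))
    return tf.reduce_mean(tf.square(hr_feat - sr_feat))
```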
The goal of the proposed system is to learn the texture information. Because most texture regions are high-frequency signals based on edges, the high-frequency information is approximated through residual learning. In this section, a brief description of a residual network and SENet is provided. In addition, the structure of the proposed method for increasing the effective performance through residual learning is described.
ResNet [19] was developed to solve the problem that the deeper a neural network becomes, the more difficult it is to train. By learning residuals, deep networks can be trained more easily than GoogLeNet [20,21] and VGGNet [22]. In addition, the network is easier to optimize, and high accuracy is obtained with a deep model [23–25].
ResNet, trained and evaluated on ImageNet, won the ILSVRC 2015 challenge with a very small error of 3.57%, and demonstrated good performance in detection and segmentation on the CIFAR-10 and COCO datasets. Residual learning can be considered an ensemble approach in which identity mappings between layers are combined with the weights learned by shallower models. It is also similar to adjusting the mixing ratio of old and new memories in a long short-term memory network. For super-resolution, residual learning can be seen as approximating the information between pixels, which improves restoration performance [26,27].
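For illustration, a minimal residual block in the style of the SRGAN generator is sketched below; the filter count (64) and 3 × 3 kernels follow the SRGAN paper and are assumptions with respect to this work.

```python
# A minimal residual block: the input is added back to the output of two
# convolutions (identity skip), so the layers only need to learn the
# residual. Sizes are assumptions following the SRGAN generator.
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, filters=64):
    skip = x
    y = layers.Conv2D(filters, 3, padding='same')(x)
    y = layers.BatchNormalization()(y)
    y = layers.PReLU(shared_axes=[1, 2])(y)
    y = layers.Conv2D(filters, 3, padding='same')(y)
    y = layers.BatchNormalization()(y)
    return layers.Add()([y, skip])  # identity skip: learn only the residual
```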
In this study, the SE block used in SENet [28] was adopted to improve performance. The SE block can be attached to any existing model, as shown in Figure 4. If the SE block shown in Figure 4 is added to VGGNet, GoogLeNet, or ResNet, among others, the performance increases. Although the performance improves considerably, the number of parameters does not increase significantly, so the increase in computation is small.
In general, improving performance increases the amount of computation by a corresponding amount. In the case of SENet, the SE block is efficient because accuracy can be improved without significantly increasing computation.
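A minimal sketch of the SE block described above follows; the reduction ratio r = 16 follows the SENet paper and is an assumption here.

```python
# SE block sketch: global average pooling "squeezes" each channel to one
# value, two small dense layers model channel interdependencies, and the
# result rescales ("excites") the channels.
import tensorflow as tf
from tensorflow.keras import layers

def se_block(x, ratio=16):
    channels = int(x.shape[-1])
    s = layers.GlobalAveragePooling2D()(x)            # squeeze: (batch, C)
    s = layers.Dense(channels // ratio, activation='relu')(s)
    s = layers.Dense(channels, activation='sigmoid')(s)
    s = layers.Reshape((1, 1, channels))(s)           # per-channel weights
    return layers.Multiply()([x, s])                  # excitation: rescale channels
```

Because the block adds only two small fully connected layers at each insertion point, its parameter and computation overhead is small, which is the efficiency noted above.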
The structure of the modules proposed in this study is shown in Figure 5. We applied the residual and SE blocks to the SRGAN separately, and also verified how they affect super-resolution when applied concurrently. Figure 5(a) shows the conventional residual block, and Figure 5(b) shows the residual block with an SE block added. As mentioned earlier, this aims to maximize the texture information and minimize smoothing. Figure 5(c) adds one more skip connection to the block in Figure 5(a) so that the characteristics of different high frequencies can be used in learning. Figure 5(d) adds the SE block to the block shown in Figure 5(c).
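A hedged sketch of the combined module is given below. It is one plausible reading of Figure 5(d), reusing the residual_block() and se_block() sketches above: the residual block's output is recalibrated by an SE block, and an additional outer skip connection is added.

```python
# One plausible reading of the combined module in Figure 5(d): a residual
# block followed by an SE block, wrapped in an extra outer skip connection.
# This is an illustrative sketch, not the authors' verified architecture.
import tensorflow as tf
from tensorflow.keras import layers

def residual_se_block(x, filters=64):
    outer_skip = x
    y = residual_block(x, filters)        # inner identity skip (defined above)
    y = se_block(y)                       # channel-wise recalibration (defined above)
    return layers.Add()([y, outer_skip])  # the additional skip connection

# Usage within a model definition:
inputs = tf.keras.Input(shape=(96, 96, 64))
outputs = residual_se_block(inputs)
```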
The experimental environment was as follows. The hardware consisted of an Intel Xeon Gold 5120 CPU, 64 GB of RAM, and an RTX TITAN GPU with 24 GB of memory. The deep learning framework was TensorFlow 1.14.0 on Ubuntu 18.04.4, and the dataset used for training was DIV2K 2017 [29]. The DIV2K data consist of high-resolution images with a pixel resolution of 2044 × 1404 and corresponding low-resolution images. The training set consists of 800 images, and the test set of 100 images.
To train the model, we run 200 epochs and use the Adam optimizer with a learning rate of 0.0001 and a momentum of 0.9; the batch size is 16. For learning, the generator restores reduced images (96 × 96 pixels) to a resolution of 384 × 384, and the discriminator takes the generator output and the high-resolution image as input and performs a binary classification into low- and high-resolution images; the generator and discriminator are trained together as a single network. In the discriminator, the original high-resolution image and the resulting super-resolution image are compared and evaluated as learning continues. During training, the sigmoid function at the end of the discriminator is removed, and the loss is computed on the raw outputs. The final loss value is the content loss produced by the separate VGG network plus the adversarial loss computed in the discriminator, combined at a given ratio. The adversarial objective of the SRGAN generator and discriminator is

$$\min_{G} \max_{D}\; \mathbb{E}_{I^{HR} \sim p_{\text{train}}(I^{HR})}\left[\log D(I^{HR})\right] + \mathbb{E}_{I^{LR} \sim p_{G}(I^{LR})}\left[\log\left(1 - D(G(I^{LR}))\right)\right].$$

In the above equation, G is the generator, which receives a low-resolution image I^{LR} as input and generates a super-resolution image. In addition, D is the discriminator, which estimates the probability that its input is a real high-resolution image I^{HR} rather than a generated one.
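A minimal sketch of how the final generator loss can be assembled is shown below; the adversarial weight of 10⁻³ follows the SRGAN paper and is an assumption with respect to this work, and vgg_content_loss() refers to the sketch in Section 2.

```python
# Perceptual loss sketch: VGG content loss plus the adversarial loss at a
# fixed ratio. The sigmoid is applied inside the loss because the
# discriminator's final sigmoid is removed during training, as described above.
import tensorflow as tf

def generator_loss(hr_image, sr_image, d_logits_fake, adv_weight=1e-3):
    content = vgg_content_loss(hr_image, sr_image)  # defined earlier
    adversarial = tf.reduce_mean(
        tf.nn.sigmoid_cross_entropy_with_logits(
            labels=tf.ones_like(d_logits_fake), logits=d_logits_fake))
    return content + adv_weight * adversarial
```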
To quantitatively measure image quality, we used the peak signal-to-noise ratio (PSNR), which evaluates the loss of image quality; the structural similarity (SSIM); the visual information fidelity (VIF), which evaluates the accuracy of the visual information; and the relative polar edge coherence (RECO), which evaluates the similarity using polar coordinates. The PSNR is defined as

$$\mathrm{PSNR} = 10 \log_{10} \frac{MAX^{2}}{\mathrm{MSE}},$$

where MAX is the maximum possible pixel value (255 for 8-bit images) and MSE is the mean squared error between the original and reconstructed images.
The SSIM index is obtained by multiplying the luminance comparison, contrast comparison, and structure comparison values of the actual and reconstructed images. The key hypothesis is that the human visual system is specialized in deriving the structural information of an image, so if the structural information is distorted, perception is significantly affected. In its common combined form,

$$\mathrm{SSIM}(x, y) = \frac{(2\mu_{x}\mu_{y} + c_{1})(2\sigma_{xy} + c_{2})}{(\mu_{x}^{2} + \mu_{y}^{2} + c_{1})(\sigma_{x}^{2} + \sigma_{y}^{2} + c_{2})},$$

where $\mu_{x}, \mu_{y}$ and $\sigma_{x}^{2}, \sigma_{y}^{2}$ are the means and variances of the two images, $\sigma_{xy}$ is their covariance, and $c_{1}, c_{2}$ are small constants that stabilize the division.
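To compute the PSNR and SSIM defined above, scikit-image provides reference implementations; a minimal sketch follows (assuming scikit-image ≥ 0.19 for the channel_axis argument).

```python
# Reference computation of PSNR and SSIM with scikit-image, matching the
# definitions above. hr and sr are uint8 RGB images of identical shape.
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate(hr: np.ndarray, sr: np.ndarray):
    psnr = peak_signal_noise_ratio(hr, sr, data_range=255)
    ssim = structural_similarity(hr, sr, channel_axis=-1, data_range=255)
    return psnr, ssim
```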
VIF calculates the mutual information between C, the original image, and E, the image as perceived by a human; that is, the entropy they share. It also calculates the mutual information between D, the distorted image, and F, the distorted image as perceived. The quality of the distorted image is then predicted by the ratio between the two:

$$\mathrm{VIF} = \frac{I(D; F)}{I(C; E)},$$

where $I(\cdot\,;\cdot)$ denotes mutual information.
The ECO metric is an absolute measure, and thus it could in principle be used for no-reference quality estimation, provided that a quality unit is defined. However, ECO depends on the image content. To compensate for this, the RECO index is used, which is defined as the ratio between the ECO values of the reconstructed image and the reference image.
Table 1 shows the performance evaluation for each model. The experiment confirms that the residual and SE blocks help improve performance, and the performance improves further when the residual and SE blocks are used together. As shown in Table 2, the running time increased by a factor of roughly 1.1 to 2 compared with the existing SRGAN; however, the added blocks improved the restoration performance.
Thereafter, six images were tested with the proposed models, as shown in Table 2. In Table 2, Image #0 is the result of measuring the quality of the penguin image in Figure 6. For this image, the VIF obtained by the proposed method is the best, and it can be seen that the beak of the penguin in Figure 6 is well restored. Image #45 corresponds to Figure 7, where the result of the proposed method is closest to the real image and has the smallest grid pattern.
Image #47 is shown in Figure 8; a grating pattern appears with the improved residual model but not with the proposed method, whose quality scores are also high. For Image #62, as shown in Figure 9, the proposed method achieves the best results for restoring the hair pattern, and its measured performance is also high.
For Image #69, as shown in Figure 10, the improved residual model performed better than the proposed method because the hair pattern is regular. For Image #82, as shown in Figure 11, the proposed method achieves the best quality scores, and its texture restoration is also the best. Therefore, it was confirmed that using a residual block together with an SE block removes the grid pattern and yields better restoration based on the similarity between pixels.
Super-resolution image restoration is a major issue in the modern era of storing and exchanging image data. In this study, an SRGAN, which has contributed to this area, and three proposed models were tested on the DIV2K dataset to achieve ultra-high-resolution restoration from a single image. Performance was evaluated using PSNR, SSIM, VIF, and RECO. Over the entire dataset, the gains depend on the model, but on average VIF and SSIM improved by about 0.01 and PSNR by about 1 dB. The six individual images showed quality differences consistent with the numerical results. However, RECO was worse than for the base model. From this, we conclude that the SE block improves performance. Although the SRGAN+Residual+SE model gives the best quality, it takes longer than the other approaches; therefore, the SRGAN+SE or SRGAN+Residual models can be used instead. However, a smoothing phenomenon reappeared when the network was deepened.
In future research, we will consider applying SRGAN+SE, or SRGAN+Residual combined with a Markov random field, to the smoothing problem in order to improve restoration performance, and we will also work on restoring the resolution of depth images.
No potential conflict of interest relevant to this article was reported.
Figure 1. Types of CNN-based super-resolution architecture: (a) SRCNN [9], (b) VDSR [10], (c) RSRCNN [13], and (d) DRCN [14].
Figure 2. Comparison of SR and bicubic images.
Figure 3. SRGAN overview [7].
Figure 4. SE block structure.
Figure 5. Proposed architecture: (a) baseline, (b) SRGAN+SE, (c) SRGAN+Residual, and (d) SRGAN+Residual+SE.
Figure 6. Image #0 quality measurement in the DIV2K dataset: (a) high resolution, (b) SRGAN, (c) SRGAN+SE, (d) SRGAN+Residual, and (e) SRGAN+Residual+SE.
Figure 7. Image #45 quality measurement in the DIV2K dataset: (a) high resolution, (b) SRGAN, (c) SRGAN+SE, (d) SRGAN+Residual, and (e) SRGAN+Residual+SE.
Figure 8. Image #47 quality measurement in the DIV2K dataset: (a) high resolution, (b) SRGAN, (c) SRGAN+SE, (d) SRGAN+Residual, and (e) SRGAN+Residual+SE.
Figure 9. Image #62 quality measurement in the DIV2K dataset: (a) high resolution, (b) SRGAN, (c) SRGAN+SE, (d) SRGAN+Residual, and (e) SRGAN+Residual+SE.
Figure 10. Image #69 quality measurement in the DIV2K dataset: (a) high resolution, (b) SRGAN, (c) SRGAN+SE, (d) SRGAN+Residual, and (e) SRGAN+Residual+SE.
Figure 11. Image #82 quality measurement in the DIV2K dataset: (a) high resolution, (b) SRGAN, (c) SRGAN+SE, (d) SRGAN+Residual, and (e) SRGAN+Residual+SE.
Table 1. Average image quality for each model on the DIV2K 2017 dataset (PSNR in dB).

Metric | SRGAN | SRGAN+SE | SRGAN+Residual | SRGAN+Residual+SE
---|---|---|---|---
VIF | 0.37172 | 0.38036 | 0.384859 | 0.388321
SSIM | 0.770272 | 0.78185 | 0.776731 | 0.783999
PSNR | 25.20976 | 26.68983 | 26.05034 | 26.33148
RECO | 0.825308 | 0.782626 | 0.804153 | 0.775451
Table 2. Running time and per-image quality measurements on the DIV2K 2017 dataset (PSNR in dB; image numbers correspond to Figures 6–11).

Image | Metric | SRGAN | SRGAN+SE | SRGAN+Residual | SRGAN+Residual+SE
---|---|---|---|---|---
– | Running time | 1.8564 | 2.1482 | 2.5296 | 3.9106
#0 | VIF | 0.428165 | 0.454184 | 0.455208 | 0.457381
#0 | SSIM | 0.747610 | 0.779886 | 0.785117 | 0.782060
#0 | PSNR | 26.396371 | 27.518967 | 27.357760 | 27.436177
#0 | RECO | 0.854201 | 0.726198 | 0.778829 | 0.707260
#45 | VIF | 0.255455 | 0.265757 | 0.269351 | 0.270637
#45 | SSIM | 0.687484 | 0.704439 | 0.697704 | 0.709990
#45 | PSNR | 21.861472 | 22.308505 | 22.376281 | 22.294032
#45 | RECO | 0.773565 | 0.814906 | 0.786166 | 0.780359
#47 | VIF | 0.374079 | 0.389811 | 0.393789 | 0.401365
#47 | SSIM | 0.748230 | 0.763034 | 0.727275 | 0.763076
#47 | PSNR | 25.411457 | 25.999129 | 25.853301 | 26.207819
#47 | RECO | 0.863157 | 0.859168 | 0.874912 | 0.836694
#62 | VIF | 0.246699 | 0.237577 | 0.249678 | 0.252927
#62 | SSIM | 0.772432 | 0.767937 | 0.773743 | 0.781695
#62 | PSNR | 26.342770 | 29.912518 | 27.486575 | 27.825911
#62 | RECO | 0.827425 | 0.743627 | 0.788196 | 0.836395
#69 | VIF | 0.321373 | 0.325726 | 0.328915 | 0.320290
#69 | SSIM | 0.739972 | 0.745178 | 0.746768 | 0.734274
#69 | PSNR | 22.139708 | 22.586860 | 22.490368 | 22.105132
#69 | RECO | 0.658706 | 0.639448 | 0.646193 | 0.639967
#82 | VIF | 0.604546 | 0.609105 | 0.612210 | 0.627323
#82 | SSIM | 0.925906 | 0.930625 | 0.929779 | 0.932901
#82 | PSNR | 29.106803 | 31.813002 | 30.737769 | 32.119837
#82 | RECO | 0.974795 | 0.912406 | 0.950620 | 0.852030
E-mail: jws2218@naver.com
E-mail: syrhee@kyungnam.ac.kr