International Journal of Fuzzy Logic and Intelligent Systems 2020; 20(1): 8-16
Published online March 25, 2020
https://doi.org/10.5391/IJFIS.2020.20.1.8
© The Korean Institute of Intelligent Systems
Muhammad Aamir1, Nazri Mohd Nawi2, Hairulnizam Bin Mahdin1, Rashid Naseem3, and Muhammad Zulqarnain1
1Faculty of Computer Science and Information Technology, Universiti Tun Hussein Onn Malaysia, Batu Pahat, Malaysia
2Soft Computing and Data Mining Center, Universiti Tun Hussein Onn Malaysia, Batu Pahat, Malaysia
3Department of IT and Computer Science, Pak-Austria Fachhochschule Institute of Applied Sciences and Technology, Haripur, Pakistan
Correspondence to :
Muhammad Aamir (amirr.khan1@gmail.com)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted noncommercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Auto-encoders (AEs) have been proposed for solving many problems in the domains of machine learning and deep learning over the past few decades. Owing to their satisfactory performance, multiple variants of the AE have also appeared in recent years. First, we introduce the conventional AE model and its variants for learning abstract features from data. Second, we present the major differences among three popular AE variants: the sparse AE (SAE), denoising AE (DAE), and contractive AE (CAE). Third, as the main contribution of this study, we compare these three variants on the basis of their mathematical modeling and experiments. All the variants of the standard AE are evaluated on the MNIST benchmark handwritten-digit dataset for a classification problem. The observed results reveal the benefit of using the AE model and its variants. From the experiments, we conclude that the CAE achieves better classification accuracy than the SAE and DAE.
Keywords: Sparse auto-encoder (SAE), Denoising auto-encoder (DAE), Contractive auto-encoder (CAE), MNIST, Classification
Auto-encoders (AEs) are unsupervised neural networks that use back-propagation to map a high-dimensional input feature set to a low-dimensional representation and then recover the original feature set from that representation. The reduction of high-dimensional data to low-dimensional data is known as encoding, while the reconstruction of the original data from the low-dimensional data is called decoding. The AE was proposed to improve the reconstruction reliability of low-dimensional feature sets. Notably, feature-set selection is the most important step in solving big-data problems and processing complex data with high-dimensional attributes [1]. In addition, feature selection is a valuable method for performing classification and prediction [2, 3]. It is an effective procedure for separating the useful and informative features from the ineffective and useless ones within the feature-representation space. However, if irrelevant raw features are provided as input, feature selection might fail [4, 5]. Using AEs is one of the most efficient approaches for feature reduction, as it results in satisfactory classification. An AE traditionally works in two phases, namely, encoding and decoding. The process of converting the input features into a new representation is called encoding, whereas that of converting this new representation back into the original inputs is called decoding. The primary aim of AEs is to identify more useful and informative features from a large amount of data without applying any separate feature-reduction technique [6–8].
To improve the efficiency of the standard AE, [9] proposed an AE-based learning approach with a two-stage architecture. They stacked AEs while considering both intra- and inter-modal semantics and named the model a sparse AE (SAE). Similarly, [10] proposed a feature-learning approach by stacking contractive AEs (CAEs), naming it sCAE; their work focused on extracting temporal feature differences of superpixels and suppressing noise. To benefit from both the characteristics of AEs and the properties of conventional algorithms, [11, 12] merged the CAE with a convolutional neural network to form a deep architecture named DSCNN, which addresses both the presence of speckle noise and the absence of effective features in single-polarized SAR (synthetic aperture radar) data.
Notably, some popular AE variants are SAEs [13], denoising AEs (DAEs) [14], and CAEs [15]. In this study, we first introduce the AE and then present its three popular variants, namely, the SAE, DAE, and CAE. We conducted experiments to empirically compare their performances and properties with one another. A comprehensive analysis based on the experimental results is also given, and the conclusion is presented in the last section.
AEs are also called auto-associators. In the domain of machine learning, an AE is a neural network with three layers, and AEs have been investigated by many researchers since the 1990s [16]. The entire processing inside an AE proceeds in two phases, namely, encoding and decoding, as depicted in Figure 1.
The process of mapping the input feature set to its intermediate representation in the hidden layer is called encoding, which is given as follows:

$$h = f(Wx + b), \tag{1}$$

where $x$ is the input feature vector, $W$ is the encoder weight matrix, $b$ is the encoder bias vector, and $f(\cdot)$ is the encoding-activation function (typically a sigmoid).
The process of mapping the output of the hidden layer back to the input feature set is called decoding, which is given as follows:

$$\hat{x} = g(W'h + b'), \tag{2}$$

where $W'$ is the decoder weight matrix, $b'$ is the decoder bias vector, and $g(\cdot)$ is the decoding-activation function.
The terms $W$, $b$, $W'$, and $b'$ are the parameters of the AE that are learned during training.

The primary aim of the reconstruction is to generate outputs that are as similar as possible to the original inputs by reducing the reconstruction error. The reconstruction layer uses the following parameter set to reconstruct the original inputs:

$$\theta = \{W, b, W', b'\}. \tag{3}$$
Let us assume that the input feature set is $X = \{x^{(1)}, x^{(2)}, \ldots, x^{(n)}\}$, where $n$ is the number of training samples. While training, the AE minimizes the average reconstruction error

$$J(\theta) = \frac{1}{n}\sum_{i=1}^{n} L\big(x^{(i)}, \hat{x}^{(i)}\big) = \frac{1}{n}\sum_{i=1}^{n} \big\lVert x^{(i)} - \hat{x}^{(i)} \big\rVert^{2}, \tag{4}$$

where $L$ denotes the reconstruction loss between an input and its reconstruction. To prevent over-fitting, a weight-decay term is added to the cost function:

$$J_{\mathrm{AE}}(\theta) = J(\theta) + \frac{\lambda}{2}\big(\lVert W\rVert^{2} + \lVert W'\rVert^{2}\big), \tag{5}$$

where the relative importance of regularization is controlled via the weight-decay coefficient $\lambda$.
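As a concrete illustration of Eqs. (1)–(5), the following minimal NumPy sketch computes the encoding, decoding, and regularized reconstruction cost for a batch of inputs. It is not the authors' TensorFlow implementation; the function names, the sigmoid activations, and the weight-decay value are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def encode(x, W, b):
    # Eq. (1): h = f(Wx + b)
    return sigmoid(W @ x + b)

def decode(h, W_dec, b_dec):
    # Eq. (2): x_hat = g(W'h + b')
    return sigmoid(W_dec @ h + b_dec)

def regularized_cost(X, W, b, W_dec, b_dec, lam=1e-4):
    """Eq. (5): mean squared reconstruction error plus weight decay.

    X holds one sample per column, so its shape is (n_features, n_samples).
    """
    H = sigmoid(W @ X + b[:, None])               # encode the whole batch
    X_hat = sigmoid(W_dec @ H + b_dec[:, None])   # decode the whole batch
    recon = np.mean(np.sum((X - X_hat) ** 2, axis=0))          # Eq. (4)
    decay = 0.5 * lam * (np.sum(W ** 2) + np.sum(W_dec ** 2))  # weight decay
    return recon + decay
```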
SAEs are among the simplest AE variants. They add a sparsity term that leads a single-layer network to learn a code dictionary that reduces both the reconstruction error and the number of code-words necessary for reconstruction. The main purpose of adding a sparse penalty term to the SAE cost function is to constrain the average activation value of the hidden-layer neurons [17]. Extracting sparse features from raw data is the main aim of SAEs. Two methods can be used to achieve sparsity of the representation: one is to penalize the hidden-unit biases, and the other is to directly penalize the hidden units' activation outputs, as discussed in [18]. Notably, [19] used a sparse multi-layered AE framework for the automatic detection of nuclei in a set of 537 annotated histopathological breast-cancer images. The architecture comprised one input layer, two hidden layers, and one output layer; they stacked the SAE and used a Softmax layer for classification. The input layer received 3,468 nodes, the first hidden layer contained 400 nodes, and the second hidden layer contained 255 nodes. The output of the second hidden layer was fed to the final Softmax layer, which mapped it to two classes, either 1 or 0. Generally, if the output value of a neuron is close to 0, the neuron is considered inactive; if it is close to 1, the neuron is considered active. The aim of implementing sparsity is to bound such unwanted activations, and the updated form of the cost function is as follows:

$$J_{\mathrm{SAE}}(\theta) = J_{\mathrm{AE}}(\theta) + \beta \sum_{j=1}^{m} \mathrm{KL}\big(\rho \,\Vert\, \hat{\rho}_{j}\big), \tag{6}$$

where $m$ is the number of hidden units, $\rho$ is the target (sparsity) activation level, $\hat{\rho}_{j}$ is the average activation of hidden unit $j$ over the training set, $\mathrm{KL}(\cdot\Vert\cdot)$ denotes the Kullback–Leibler divergence, and $\beta$ controls the weight of the sparsity penalty.
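The sparsity term in Eq. (6) can be sketched as follows. This is an illustrative NumPy snippet; the target activation `rho` and penalty weight `beta` are assumed values, not the settings used in this paper.

```python
import numpy as np

def kl_divergence(rho, rho_hat):
    # Element-wise KL divergence between Bernoulli distributions with means rho and rho_hat.
    return (rho * np.log(rho / rho_hat)
            + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))

def sparsity_penalty(H, rho=0.05, beta=3.0):
    """Sparsity term of Eq. (6): beta * sum_j KL(rho || rho_hat_j).

    H is the matrix of hidden activations with shape (n_hidden, n_samples).
    """
    rho_hat = np.clip(H.mean(axis=1), 1e-8, 1 - 1e-8)  # average activation per hidden unit
    return beta * np.sum(kl_divergence(rho, rho_hat))
```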
DAEs, which are a modest alteration of standard AEs, are trained not to reconstruct their input but to denoise an artificially corrupted version of it [20]. Whereas an over-complete regular AE can easily learn a useless identity mapping, a DAE must extract more useful features to solve the considerably more difficult denoising problem. The main role of DAEs is thus to reconstruct the original data from noisy data. DAEs serve two purposes: encoding the noisy data and recovering the original input data from the reconstructed output. When such encoders are stacked in different layers, they form stacked DAEs [21, 22]. Notably, [23] added a weighted reconstruction loss function to the conventional DAE for performing noise classification in a speech-enhancement system. They stacked several weighted DAEs to construct the model. In their experiments, they performed 50 steps with the number of input nodes ranging from 50 to 100. The clean data comprised speech in eight languages, and white noise with signal-to-noise ratios (SNRs) of 6, 12, and 18 dB was selected from the NTT database to train the model. The model was trained on 1 hour of data and tested on 8 minutes of data. In the literature, DAEs have been the center of interest for many researchers in different contexts [12, 13]. Based on AEs and the denoising criterion, the DAE is trained as follows. A training input $x$ is first corrupted into $\tilde{x}$ through a stochastic corruption process $\tilde{x} \sim q(\tilde{x} \mid x)$. The corrupted input is encoded as $h = f(W\tilde{x} + b)$ and decoded as $\hat{x} = g(W'h + b')$, and the DAE is trained to minimize the reconstruction error with respect to the clean input:

$$J_{\mathrm{DAE}}(\theta) = \frac{1}{n}\sum_{i=1}^{n} L\big(x^{(i)}, g\big(f\big(\tilde{x}^{(i)}\big)\big)\big). \tag{7}$$
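A minimal sketch of the denoising objective in Eq. (7) is shown below. Masking noise is assumed as the corruption process purely for illustration; the noise level and all names are assumptions rather than the paper's settings.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def corrupt(X, noise_level=0.3, seed=None):
    """Masking noise: randomly set a fraction of input entries to zero."""
    rng = np.random.default_rng(seed)
    mask = rng.random(X.shape) > noise_level
    return X * mask

def dae_cost(X, W, b, W_dec, b_dec, noise_level=0.3):
    """Eq. (7): encode the corrupted input, measure error against the clean input."""
    X_tilde = corrupt(X, noise_level)
    H = sigmoid(W @ X_tilde + b[:, None])
    X_hat = sigmoid(W_dec @ H + b_dec[:, None])
    return np.mean(np.sum((X - X_hat) ** 2, axis=0))
```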
CAEs, which are popular and effective variants of AEs, are based on an unsupervised-learning approach for producing valuable feature representations [24]. Models that learn feature representations with a CAE are highly robust to minor noise and small changes within the training data. CAEs can be considered extended forms of DAEs in which a contractive penalty is added to the reconstruction error function [25]. This penalty, in turn, penalizes the sensitivity of the learned features to variations in the inputs. With the goal of learning a robust feature set, [26] proposed a CAE with an unconventional regularizing objective function. Based on the standard reconstruction cost, the CAE adds a penalty on the Frobenius norm of the Jacobian of the encoder with respect to the input:

$$J_{\mathrm{CAE}}(\theta) = \sum_{i=1}^{n} \Big[ L\big(x^{(i)}, \hat{x}^{(i)}\big) + \lambda \big\lVert J_{f}\big(x^{(i)}\big)\big\rVert_{F}^{2} \Big], \qquad \big\lVert J_{f}(x)\big\rVert_{F}^{2} = \sum_{k,j}\bigg(\frac{\partial h_{j}(x)}{\partial x_{k}}\bigg)^{2}, \tag{8}$$

where $J_{f}(x)$ is the Jacobian matrix of the encoder outputs with respect to the inputs and $\lambda$ controls the strength of the contractive penalty.
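For a sigmoid encoder, the Jacobian in Eq. (8) has the closed form $\partial h_{j}/\partial x_{k} = h_{j}(1 - h_{j})W_{jk}$, which the following illustrative NumPy sketch uses to compute the contractive penalty for a batch of inputs (the penalty coefficient is an assumed value):

```python
import numpy as np

def contractive_penalty(X, W, b, lam=0.1):
    """Contractive term of Eq. (8) for a sigmoid encoder h = sigmoid(Wx + b).

    Since dh_j/dx_k = h_j * (1 - h_j) * W[j, k], the squared Frobenius norm of
    the Jacobian is sum_j (h_j * (1 - h_j))**2 * sum_k W[j, k]**2.
    """
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    H = sigmoid(W @ X + b[:, None])          # shape: (n_hidden, n_samples)
    row_norms = np.sum(W ** 2, axis=1)       # sum_k W[j, k]**2 for each hidden unit j
    frob_sq = np.sum((H * (1 - H)) ** 2 * row_norms[:, None], axis=0)  # per sample
    return lam * np.sum(frob_sq)
```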
We performed several experiments to evaluate and compare the feature-learning abilities of the three AE variants, namely, the SAE, DAE, and CAE. To make the comparison fair, all the AE variants were given the same architecture, with one input layer, two hidden layers, and a Softmax regression classifier in the last layer for final classification. All the experiments were conducted on an Intel Core i3 CPU with 4 GB of RAM running the Windows 8.1 operating system. The algorithms were developed and tested in Python 2.7, and the efficient open-source numerical computation library TensorFlow [27] was used for fast implementation of the comparative approaches. In our experiments, 12,000 images were considered for evaluation and comparison, and 5,000 random images were used for validation in each MNIST variation dataset. All the images are 0–9 digits of size 28 × 28 pixels. A few example images, which show the shape deformation and variation in digit appearance, are depicted in Figure 2; this variation makes the dataset challenging for testing the models. The MNIST variation datasets are the standard digits (basic), random-noise background digits (bg-rand), and rotated digits with image backgrounds (bg-img-rot) [28]. The first dataset (mnist-basic) comprises the standard MNIST images without any changes. In the bg-rand variant, a random background is added to each digit image, with the background value of every pixel generated uniformly in the fixed range of 0 to 255. In the last dataset, the rotation of digits and the addition of random backgrounds are merged into a single dataset. Six random images from each MNIST variant dataset are depicted in Figures 2, 3, and 4, respectively. All the comparative approaches are used as feature extractors, with the Softmax classifier fitted in the last layer for final classification. For each approach and each run, we randomly selected half of the sample images for training and half for testing. Every run is independently repeated 10 times, and the mean error is reported in the results.
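The evaluation protocol described above (a random half/half train-test split, repeated 10 times, with mean errors reported) can be sketched as follows. The callable `model_fn` is hypothetical and stands in for any of the AE + Softmax pipelines; the data are assumed to be NumPy arrays.

```python
import numpy as np

def evaluate(model_fn, images, labels, runs=10, seed=0):
    """Random half/half split, repeated `runs` times; returns mean train/test errors.

    `model_fn(train_x, train_y, test_x, test_y)` is assumed to train one
    AE + Softmax pipeline and return (training_error, test_error).
    """
    rng = np.random.default_rng(seed)
    train_errors, test_errors = [], []
    for _ in range(runs):
        idx = rng.permutation(len(images))
        half = len(images) // 2
        train_idx, test_idx = idx[:half], idx[half:]
        tr_err, te_err = model_fn(images[train_idx], labels[train_idx],
                                  images[test_idx], labels[test_idx])
        train_errors.append(tr_err)
        test_errors.append(te_err)
    return np.mean(train_errors), np.mean(test_errors)
```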
In the experiments, each AE variant was combined with a Softmax classifier to estimate the overall classification behavior of the models on the MNIST benchmark variant datasets. Figures 5, 6, and 7 depict the results for the SAE on the three MNIST benchmark variant datasets; similarly, Figures 8, 9, and 10 depict the results for the DAE, and Figures 11, 12, and 13 depict the ROC curves for the CAE. As these figures show, a gradual decrease is seen in the overall classification accuracy of all the AE variants. The results of the AE variants are summarized in Table 1, which shows that the error increases progressively from the MNIST basic subset to the random-background and rotated-background variants. This decrease in accuracy is attributable to the gradual increase in the complexity and noisiness of the datasets. Despite this added complexity, the CAE outperformed both the SAE and DAE in terms of class-level accuracy.
We first introduced the conventional AE model and its three most popular variants, namely, the SAE, DAE, and CAE. Notably, the AE is a three-layered neural-network model with one input layer, one hidden layer that encodes the input into a compressed representation, and one output layer of the same size as the input layer that reconstructs the input from that representation. We then compared the three variants empirically on three MNIST benchmark variant datasets, using a Softmax classifier in the final layer. The experiments show that the CAE achieved the lowest training and test errors, and hence the best classification accuracy, among the three variants.
No potential conflict of interest relevant to this article was reported.
The authors would like to thank the Ministry of Higher Education, Malaysia, and Universiti Tun Hussein Onn Malaysia for financially supporting this research under the Trans-disciplinary Research Grant (vote no. T003).
Table 1. Performance evaluation of SAE, DAE, and CAE based on MNIST benchmark datasets.
Datasets | basic | | | bg-rand | | | bg-img-rot | | |
---|---|---|---|---|---|---|---|---|---|
Approach | Execution time | Training error | Test error | Execution time | Training error | Test error | Execution time | Training error | Test error |
SAE + Softmax | 24m 25s | 5.59 | 13.54 | 31m 25s | 8.92 | 21.54 | 40m 25s | 11.87 | 26.58 |
DAE + Softmax | 31m 20s | 7.73 | 15.78 | 37m 20s | 10.38 | 21.78 | 45m 20s | 17.73 | 28.72 |
CAE + Softmax | 28m 55s | 3.35 | 11.65 | 32m 55s | 7.58 | 20.65 | 37m 55s | 11.05 | 24.78 |
Algorithm 1. Auto-encoder.
Inputs:
- No. of hidden layers
- Input feature set $X = \{x^{(1)}, \ldots, x^{(n)}\}$
- Encoding-activation function $f$
- Decoding-activation function $g$
- Input weights $W$, $W'$
- Bias values $b$, $b'$

Encoding:
- Compute the encoded inputs $Wx$
- Compute the biased inputs $Wx + b$
- Compute $h = f(Wx + b)$ as in Eq. (1)

Decoding:
- Compute the decoded outputs $W'h$
- Compute the biased outputs $W'h + b'$
- Compute $\hat{x} = g(W'h + b')$ as in Eq. (2)

Optimization:
- Optimize the value of Eq. (5).
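As a runnable counterpart to Algorithm 1, the following NumPy sketch trains the basic AE by batch gradient descent on the regularized cost of Eq. (5). It is a minimal illustration only; the hyper-parameter values are assumptions, not the settings used in the experiments reported above.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_autoencoder(X, n_hidden=64, lr=0.1, lam=1e-4, epochs=100, seed=0):
    """Gradient-descent training of the basic AE in Algorithm 1.

    X: array of shape (n_features, n_samples) with values scaled to [0, 1].
    Returns the learned parameters (W1, b1, W2, b2).
    """
    rng = np.random.default_rng(seed)
    d, n = X.shape
    W1 = rng.normal(0, 0.1, (n_hidden, d))   # encoder weights
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0, 0.1, (d, n_hidden))   # decoder weights
    b2 = np.zeros(d)
    for _ in range(epochs):
        # Encoding (Eq. (1)) and decoding (Eq. (2))
        H = sigmoid(W1 @ X + b1[:, None])
        X_hat = sigmoid(W2 @ H + b2[:, None])
        # Backpropagate the regularized reconstruction cost (Eq. (5))
        delta2 = (2.0 / n) * (X_hat - X) * X_hat * (1 - X_hat)
        delta1 = (W2.T @ delta2) * H * (1 - H)
        W2 -= lr * (delta2 @ H.T + lam * W2)
        b2 -= lr * delta2.sum(axis=1)
        W1 -= lr * (delta1 @ X.T + lam * W1)
        b1 -= lr * delta1.sum(axis=1)
    return W1, b1, W2, b2
```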
Figure 1. Architecture of the standard AE.
Figure 2. MNIST small subset (basic).
Figure 3. MNIST random-noise background digits (bg-rand).
Figure 4. MNIST rotation and image background digits (bg-img-rot).
Figure 5. ROC for the SAE based on the small subset (basic).
Figure 6. ROC for the SAE based on random-noise background digits (bg-rand).
Figure 7. ROC for the SAE based on rotation and image background digits (bg-img-rot).
Figure 8. ROC for the DAE based on the MNIST small subset (basic).
Figure 9. ROC for the DAE based on random-noise background digits (bg-rand).
Figure 10. ROC for the DAE based on rotation and image background digits (bg-img-rot).
Figure 11. ROC for the CAE based on the small subset (basic).
Figure 12. ROC for the CAE based on random-noise background digits (bg-rand).
Figure 13. ROC for the CAE based on rotation and image background digits (bg-img-rot).