International Journal of Fuzzy Logic and Intelligent Systems 2024; 24(4): 317-332
Published online December 25, 2024
https://doi.org/10.5391/IJFIS.2023.24.4.317
© The Korean Institute of Intelligent Systems
Seoung-Ho Choi
Faculty of Basic Liberal Art, College of Liberal Arts, Hansung University, Seoul, Korea
Correspondence to :
Seoung-Ho Choi (jcn99250@naver.com)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted noncommercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Current deep learning models underperform when trained with loss functions characterized by a single property. Therefore, optimizing these models with a combination of multiple attributes is essential to enhance performance. We propose a novel hybrid nonlinear loss function technique that incorporates heterogeneous nonlinear properties to achieve optimal performance. We evaluated the proposed method using six analytical techniques: contour map visualization, loss error frequency analysis, scatter plot visualization, loss function visualization, gradient descent analysis, and covariate analysis with convex and linear coefficients. Our experiments on semantic segmentation tasks using the Pascal visual object classes and automatic target recognition datasets demonstrated superior optimization performance compared to existing loss functions.
Keywords: Hybrid nonlinear loss function, Heterogeneous properties, Optimization, High performance
A loss function is employed to mitigate overfitting during optimization and training. Various types of loss functions exist, including the L1 loss function [1], L2 loss function [1], nonlinear exponential loss function, and exponential moving average loss function.
The exponential moving average loss function is designed to capture the dynamic properties of a model by optimizing dynamic information; notably, nonlinear properties can vary dynamically. In contrast, the nonlinear exponential loss function supports stable training by continuously reflecting fixed nonlinear properties.
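The closed forms of these two losses are not stated at this point in the paper; as a hedged illustration, a standard exponential moving average of the per-step loss (the dynamic term) and a simple exponential transform of a base loss (the fixed nonlinear term) could be written as follows, where the decay β and the exact exponential form are assumptions rather than the authors' definitions.

```latex
% Assumed illustrative forms, not the authors' exact definitions
\mathcal{L}^{\mathrm{EMA}}_{t} = \beta\,\mathcal{L}^{\mathrm{EMA}}_{t-1} + (1-\beta)\,\mathcal{L}_{t},
\qquad 0 < \beta < 1 \quad \text{(dynamic term)} \\
\mathcal{L}^{\mathrm{exp}} = \exp\!\big(\mathcal{L}_{\mathrm{base}}\big) - 1 \quad \text{(fixed nonlinear term)}
```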
Existing loss functions have been designed around single properties to enhance model performance. However, training a deep learning model with a single-property loss does not bring it to its optimal state: although such models consistently converge to a particular value, their performance remains suboptimal, and the reliability of that performance is limited. Because deep learning models are large, complex nonlinear systems, combinatorial optimization that mixes several properties can yield higher performance than optimization based on a single property. Therefore, combining various properties is essential to reach the optimal state of the deep learning model. The contributions of our study are as follows.
· We propose a novel hybrid loss function technique that achieves optimal model performance by incorporating heterogeneous nonlinear properties.
· We evaluate the effectiveness of the proposed method using three analytical techniques: contour map visualization, loss error frequency analysis, and scatter plot visualization of the loss error map.
· We conduct a relational analysis to evaluate the impact of the proposed technique on the loss function. We examine the relationships between the loss function and existing loss functions using the activation function.
· We employ various methods to influence changes in covariates of the loss function on model optimization. Additionally, we analyze the relationship between these covariates using convex and linear coefficients.
· We utilized gradient descent analysis to demonstrate that unique combinations of loss and activation functions enhance the learning performance of deep learning models.
· Using 3D surface plots and heatmap visualizations of the formulas, we visually demonstrated that different combinations of loss and activation functions improve the stability and responsiveness of the deep learning model during optimization.
· We initialized seed values for five classes to examine the impact of loss function properties on model training.
· We confirmed the importance of reflecting significant properties when considering heterogeneous nonlinear properties.
Loss functions have evolved to address various learning problems. Liu et al. [2] developed an algorithm that proposes new loss functions by exploring variations of existing loss functions within a search space comprising 21 mathematical operators, three constant inputs, and three variable inputs. The algorithm efficiently identified optimal combinations of loss functions, and experiments conducted on the common objects in context (COCO), VOC2017, and Berkeley deep drive (BDD) datasets demonstrated that these new combinations outperformed existing ones.
Wen et al. [3] introduced a novel loss function designed to mitigate performance degradation due to insufficient sample diversity without relying on sampling procedures. This approach incorporates adjustable parameters to widen the loss range, diminish the impact of easily classified samples, and replace traditional sampling functions. Derived from the cross-entropy loss, the proposed formula integrates these adjustable parameters.
Xie et al. [4] addressed the challenge of accurate image segmentation of surface defects in convolutional neural network (CNN)-based models by proposing a balanced loss function. This study identified loss imbalance as a critical issue affecting segmentation accuracy, including label imbalance, easy-to-hard example imbalance, and boundary imbalance. The proposed balanced loss function introduced dynamic class weights, truncated cross-entropy loss, and strategies to suppress label confusion, significantly enhancing segmentation accuracy across various benchmarks and CNN segmentation models. Experimental findings consistently showed performance improvements ranging from 5% to 30% over commonly used loss functions.
Dickson et al. [5] evaluated and compared various loss functions for training artificial neural networks (ANNs). This study analyzed a hybrid loss function that combines entropy and squared error, using the squared error for initial bootstrap network training and entropy loss for subsequent refinement. The results indicate that this approach leverages the initial optimal state achieved by the squared error while the entropy loss provides further enhancements. This strategy offers a novel method for improving the training efficiency of ANNs, contributing to advancements in neural network-based models.
Yu and Wu [6] introduced a hybrid loss function, SSIM+L1 integrating the structural similarity index measure (SSIM) and L1 norms. Their study incorporated an attention mechanism that effectively leverages global information during network training. When applied to a CNN for reconstructing missing traces, experiments on synthetic and field data demonstrated that the SSIM+L1 loss function, combined with the attention mechanism, significantly improved interpolation results compared to networks lacking attention.
In medical imaging, researchers have tailored hybrid loss functions for encoder-decoder convolutional neural networks to address underutilization and image information loss [7]. This study proposed a low-dose computed tomography (LDCT) image-denoising network that employs residual multiscale feature extraction and a hybrid loss function comprising mean square error (MSE), SSIM, and perceptual losses. Experimental results demonstrated that this approach enhanced image quality and improved computational efficiency compared to state-of-the-art techniques, offering promising advancements in medical imaging applications.
Deep-learning techniques have been employed to enhance LDCT image quality using a hybrid loss function that integrates adversarial, perceptual, sharpness, and structural similarity losses [8]. Experimental results demonstrated that the SS-WGAN with the hybrid loss function outperformed traditional denoising methods in preserving image details and structure, which are crucial for accurate medical diagnosis.
Figure 1 visualizes the proposed method. Figure 1(a) illustrates hybrid loss function case 1, and Figure 1(b) depicts hybrid loss function case 2. The proposed method is detailed as follows.
We test this function using two cases: a combination of nonlinear and linear properties, and a combination of nonlinear fixed and nonlinear dynamic properties.
Optimizing deep learning models with single-property loss functions limits the scope of optimization. We identified the cause of this limitation and aim to address it through heterogeneous properties. Although a single property may not define the optimal point of a deep learning model, expanding the search range by reflecting various properties is necessary to find that optimal point. To verify this, we propose a novel hybrid loss function that incorporates these properties and validate the rationale for incorporating heterogeneous properties using three analyses.
Figure 2 visualizes the employed loss function methods using the formula
Based on these results, the hyperplane space defined by the loss function may assume a narrow valley shape. This shape allows the deep learning model to converge rapidly and reach the optimal point.
Reflecting nonlinear dynamic properties within such systems proves to be an efficient approach for deep learning model optimization.
This experiment was applied to U-Net [12] and FCN [13] for semantic segmentation. We conducted experiments on the proposed method using Keras in an Ubuntu environment, recording the average value over five iterations. Experiments with FCN and U-Net were conducted using visual object classes (VOC) and automatic target recognition (ATR) datasets for the image segmentation task, employing cross-entropy loss. Results were evaluated using dice similarity coefficient (DSC), F1-score, Intersection over Union (IOU), loss, precision, and recall. The DSC measures the similarity of two images to evaluate the accuracy of the segmented image. The F1-score, the harmonic mean of precision and recall, evaluates binary classification model performance. IOU measures the overlap between predicted and actual segmented regions, evaluating segmentation accuracy. Loss indicates the difference between the prediction and actual value of the deep learning model. Precision shows the percentage of correctly predicted positive classes, while recall indicates the percentage of true-positive classes correctly identified by the deep learning model. We analyzed the experiment results using five seed values: 1, 250, 500, 777, and 999. The hybrid loss function, where A and B denote the covariates, is presented below.
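The formula announced here did not survive the text extraction. Based on the verbal definitions given below (the hybrid loss combines the exponential moving average with the L1 and L2 losses, and hybrid v2 combines it with the nonlinear exponential loss), one plausible form, with the weighting by the covariates A and B treated as an assumption, is:

```latex
% Plausible reconstruction; A and B denote the covariates of the two terms
\mathcal{L}_{\mathrm{hybrid}} = A\,\mathcal{L}_{\mathrm{EMA}} + B\,\big(\mathcal{L}_{1} + \mathcal{L}_{2}\big), \\
\mathcal{L}_{\mathrm{hybrid\;v2}} = A\,\mathcal{L}_{\mathrm{EMA}} + B\,\mathcal{L}_{\mathrm{exp}}
```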
To analyze the relationship between the covariates of the loss function, two analyses were performed: one with convex coefficients and one with linear coefficients. Deep learning models typically exhibit hyperplanes with convex properties, and the parameter position oscillates during optimization. This analysis determines whether the loss function should possess convex properties or whether the deep learning model should incorporate linear properties to enhance optimization.
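The paper does not write out the two coefficient schemes; an assumed formalization consistent with the terminology, where L_a and L_b stand for the two terms of the hybrid loss, is:

```latex
% Convex coefficients: the covariates are tied and sum to one
\mathcal{L}_{\mathrm{convex}} = A\,\mathcal{L}_{a} + (1-A)\,\mathcal{L}_{b}, \qquad A \in [0,1] \\
% Linear coefficients: the covariates are chosen independently
\mathcal{L}_{\mathrm{linear}} = A\,\mathcal{L}_{a} + B\,\mathcal{L}_{b}, \qquad A,\,B \ \text{set independently}
```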
Another analysis was conducted using five seed values with a loss function. As the deep learning model was trained, the initial distribution varied according to the seed value. Finally, we aimed to confirm the stability of the training by analyzing the loss function based on the initial state of the model.
To understand the influence of the hyperplane on the deep learning model, we employed a contour map. This approach visualizes the impact of the hyperplane shape on the deep learning model. If the model trains stably, the contour map shows a broad region without complex contour patterns.
Additionally, a frequency visualization analysis using loss errors was performed. Frequency analysis is challenging when the results for real and fake images are highly similar; therefore, similar data were selected by analyzing the frequency of occurrence. The loss error can be used to analyze the state of the deep learning model by indicating a specific pattern of error convergence or dispersion. Both frequency and spread analyses were conducted.
Moreover, we conducted a spread analysis using the loss error. If the loss is concentrated at a single value, the expressed points cluster and appear more robustly compressed. This confirms that the deep learning model exhibits consistent errors, with similar errors recurring.
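A minimal sketch of these three diagnostics (contour map of the loss surface, loss-error frequency, and loss-error spread) is given below; the synthetic loss-error array and the stand-in loss surface are illustrative assumptions, not the data used in the paper.

```python
# Hedged sketch of the three diagnostics described above: contour map of the loss
# surface, frequency (histogram) of loss errors, and spread (scatter) of loss errors.
# The synthetic loss-error array and the stand-in surface are illustrative assumptions.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
loss_errors = rng.normal(loc=0.0, scale=0.3, size=1000)        # stand-in per-sample loss errors
xx, yy = np.meshgrid(np.linspace(-2, 2, 200), np.linspace(-2, 2, 200))
surface = np.abs(xx) + xx**2 + 0.1 * yy**2                     # stand-in loss hyperplane

fig, axes = plt.subplots(1, 3, figsize=(12, 3.5))
axes[0].contour(xx, yy, surface, levels=20)                     # contour map of the hyperplane shape
axes[0].set_title("Contour map")
axes[1].hist(loss_errors, bins=40)                              # frequency of loss-error occurrence
axes[1].set_title("Loss-error frequency")
axes[2].scatter(np.arange(loss_errors.size), loss_errors, s=4)  # spread of loss errors over samples
axes[2].set_title("Loss-error spread")
plt.tight_layout()
plt.show()
```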
The equations used in the experiment are as follows:
The combination of an exponential moving average with the L1 and L2 loss functions is termed a hybrid loss function.
The combination of an exponential moving average with a nonlinear exponential loss function is termed a hybrid v2 loss function.
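As a concrete sketch of how these two combinations could be written in Keras (the framework used in the experiments), the snippet below keeps an exponential moving average of the per-batch term and mixes it with either the L1/L2 terms (hybrid) or the nonlinear exponential term (hybrid v2); the covariates A and B, the EMA decay, and the exact exponential form are assumptions, not the authors' released code.

```python
# Hedged Keras sketch of the two hybrid losses described above. The covariates A and B,
# the EMA decay, and the exact exponential term are illustrative assumptions, not the
# authors' released code.
import tensorflow as tf

class HybridLoss(tf.keras.losses.Loss):
    def __init__(self, A=0.5, B=0.5, decay=0.9, use_exponential=False, name="hybrid_loss"):
        super().__init__(name=name)
        self.A, self.B, self.decay = A, B, decay
        self.use_exponential = use_exponential            # False: hybrid (L1 + L2); True: hybrid v2
        self.ema = tf.Variable(0.0, trainable=False)      # running EMA of the static term (dynamic part)

    def call(self, y_true, y_pred):
        y_true = tf.cast(y_true, y_pred.dtype)
        l1 = tf.reduce_mean(tf.abs(y_true - y_pred))      # L1 loss term
        l2 = tf.reduce_mean(tf.square(y_true - y_pred))   # L2 loss term
        if self.use_exponential:
            static_term = tf.exp(l2) - 1.0                # assumed nonlinear exponential term
        else:
            static_term = l1 + l2                         # linear combination of L1 and L2
        # Exponential moving average of the current term (dynamic, history-dependent part).
        self.ema.assign(self.decay * self.ema + (1.0 - self.decay) * static_term)
        return self.A * self.ema + self.B * static_term

# Usage sketch: model.compile(optimizer="adam", loss=HybridLoss(use_exponential=True))
```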
Table 1 presents a summary of the index utilized in the experiment. The experiment involved 27 distinct loss function equations. The rationale for the experiment design is as follows: we investigate the impact of two terms on the optimization of the deep learning model by analyzing the coefficients of the loss functions. This analysis employs both linear and convex coefficients for model optimization. Additionally, we examine the trade-off relationship between these two terms and its effect on the performance of the deep learning model.
The conventional method involves a straightforward linear combination of the L1 and L2 loss functions. In contrast, the nonlinear exponential loss function is designed to capture nonlinear properties rather than the characteristics of linear combinations in the learning process of the deep learning model. Certain attributes are consistently reflected within the deep learning model. The exponential moving average loss function dynamically integrates information, including historical data, in the loss function.
By introducing a loss function that embodies fixed properties alongside one that captures dynamic properties, we developed and evaluated a heterogeneous loss function encompassing both characteristics.
These two methods validated the impact of heterogeneous feature loss functions on deep-learning model optimization and performance.
We investigated the impact of two heterogeneous feature loss functions on deep learning model loss. This analysis aimed to enhance model learning by selectively filtering significant properties, as not all properties are effectively learned by the model. The ReLU function, combined with linear filtering, was applied to eliminate values below zero and linearly reflect values above zero. This approach leverages the vanishing property of the ReLU function to improve model performance.
Additionally, node filtering is applied to reflect only sparse information, optimizing the learning process of the deep learning model by emphasizing meaningful information and eliminating redundancy.
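One way to read the ReLU/eLU and bi-activation filtering described here is as an elementwise gate applied to a loss term before the terms are combined; the "bi" variants below, which reflect both polarities of the signal before recombining, are an assumed interpretation, since the paper does not define Bi-ReLU and Bi-eLU formally.

```python
# Hedged sketch of the loss-term filtering described above. The "bi" variants, which
# reflect both polarities of the signal before recombining, are an assumed reading of
# Bi-ReLU / Bi-eLU; the paper does not give their formal definitions.
import tensorflow as tf

def filter_term(x, mode="none"):
    if mode == "relu":                       # linear filtering: drop negative values, pass positives
        return tf.nn.relu(x)
    if mode == "elu":                        # nonlinear filtering with a smooth negative part
        return tf.nn.elu(x)
    if mode == "bi_relu":                    # bipolar reflection; for ReLU this reduces to |x|
        return tf.nn.relu(x) + tf.nn.relu(-x)
    if mode == "bi_elu":                     # bipolar reflection with the eLU nonlinearity
        return tf.nn.elu(x) + tf.nn.elu(-x)
    return x                                 # "no filter" case

# Usage sketch: filter the static term before mixing it with the EMA term, e.g.
# static_term = filter_term(l1 + l2, mode="bi_elu")
```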
Figure 4 presents the contour-line analysis, where the generated values are evaluated using the formula sin10(·).
Figure 4 illustrates the frequency of value occurrence as contour lines for various experimental conditions: (a) the existing experimental equation; (b) the experimental formula with L1 and L2 loss function addition; (c) the experimental formula with the nonlinear exponential loss function; (d) the experimental formula with the nonlinear exponential and linear combination L1 and L2 loss functions; (e) the experimental formula with the exponential moving average loss function, and (f) the experimental formula with both the exponential moving average and nonlinear exponential loss functions.
Figure 4 demonstrates that when the loss function is applied, numerous zero values are generated compared to previous instances. In Figure 4(b), non-zero values cluster in the upper left and lower right, indicating a phenomenon that maximizes the zero margin as non-zero values spread to the peripheries. Additionally, Figures 4(c), 4(d), and 4(f) display contour lines that may present local minima within the hyperplane space of the neural network model. When zero values occur frequently, expanding the range of these local minima (indicated by the contour lines) or identifying an optimizable range increases the information the deep learning model can learn, leading to improved optimization. As shown in the results, this process progresses toward the branch point.
Figure 4 also presents an analysis of balanced and unbalanced data generation. For balanced data, values near zero are clustered more tightly along the contour lines, suggesting multiple local minima within the hyperplane of a neural network. Additionally, when both positive and negative values are generated, the contour lines appear as straight lines, indicating that extreme values form a more defined boundary compared to a soft boundary.
The analysis using the frequency of occurrence is depicted in Figure 5, where each loss function method was applied to the cross-entropy loss. For unbalanced values,
Finally, the results from the experimental formula were validated by analyzing specific outcomes. This confirmed that the deep learning model continuously learned from a uniform set of values. Therefore, past research should have focused on optimizing generalization by incorporating a variety of values rather than continuously relying on specific ones.
The analysis employing the LogisticGroupLasso model is depicted in Figure 6, which presents the outcomes derived from this model. The x-axis denotes noisy probabilities, while the y-axis represents noise-free probabilities. The resultant distribution exhibits a convex function shape. Moreover, the degree of clustering varied across different loss function methods. Notably, disparate loss function methods produced non-clustered values at varying frequencies, where such non-clustered values could be identified as outliers during the training process of the deep learning model. These outliers potentially lead to suboptimal learning pathways. Hence, meticulously designing the loss function to mitigate the occurrence of sparse values is crucial.
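For reference, a minimal sketch of how such a scatter analysis could be produced with the group-lasso package's LogisticGroupLasso estimator is shown below; the synthetic data, group assignment, and regularization strengths are assumptions, and only the basic groups/group_reg/l1_reg constructor arguments and the fit/predict interface are relied on.

```python
# Hedged sketch of a LogisticGroupLasso-based scatter analysis. The synthetic data,
# group assignment, and regularization strengths are illustrative assumptions; only
# the basic constructor arguments and fit/predict interface of the group-lasso
# package are relied on.
import numpy as np
import matplotlib.pyplot as plt
from group_lasso import LogisticGroupLasso   # pip install group-lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))                         # noisy input features
signal = X[:, :5].sum(axis=1)                          # underlying noise-free signal
y = (signal + 0.5 * rng.normal(size=500) > 0).astype(int)
groups = np.repeat(np.arange(5), 4)                    # 5 groups of 4 features each

model = LogisticGroupLasso(groups=groups, group_reg=0.05, l1_reg=0.01)
model.fit(X, y)

plt.scatter(signal, model.predict(X), s=6)             # sparse-model output vs. clean signal
plt.xlabel("noise-free signal")
plt.ylabel("LogisticGroupLasso prediction")
plt.show()
```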
Figures 7 and 8 illustrate the equations utilized in the experiment through 3D surface plots and heat maps. Figures 7(a) and 8(a) depict a smooth transition between L1 and L2 losses, signifying a well-behaved loss function that smoothly interpolates between the two based on the value of A. This smooth transition is crucial for optimization, as it mitigates abrupt changes in the loss landscape. Figures 7(b) and 8(b) demonstrate a linear relationship and exhibit a weighted combination of L1 and L2 losses, providing flexibility in adjusting the influence of each component. Figures 7(c) and 8(c) reveal exponential growth, indicating high sensitivity to changes in L2, which can generate significant gradients beneficial for escaping flat regions in optimization, though it also poses a risk of instability. Figures 7(d) and 8(d) display dynamic behavior with a mix of linear terms, allowing for adaptive changes in the loss function and enhancing optimization by focusing on different components at various stages. Figures 7(e) and 8(e) resemble Figures 7(d) and 8(d) but use convex coefficients; applying convex coefficients aims to ensure stability and smoother transitions in the loss landscape, which is essential for consistent optimization. Figures 7(f)–7(i) and 8(f)–8(i) depict the equations employed in the hybrid loss functions. Figure 8(f) illustrates a more complex landscape due to the hybrid nature and exponential terms, capturing more intricate data patterns. Figure 8(g) shows both linear combinations and exponential growth, leveraging the strengths of both terms for balanced stability and responsiveness. Figures 8(h) and 8(i) pertain to the ReLU and Bi-eLU equations, respectively. Figure 8(h) exhibits nonlinearity and sparsity, aiding in handling nonlinearity and ensuring sparsity in gradients, thus facilitating efficient optimization. Figure 8(i) demonstrates the activation of Bi-eLU, revealing complex nonlinear behavior that helps capture more intricate data patterns, thereby providing a more nuanced optimization landscape.
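A compact sketch of how such surface and heatmap views can be produced is given below; the plotted expression A·L1 + (1-A)·L2 is the assumed convex-coefficient combination from above, and the axis ranges are illustrative.

```python
# Hedged sketch of the 3D-surface and heatmap views of one loss formula. The plotted
# expression A*L1 + (1-A)*L2 is the assumed convex-coefficient combination; axis
# ranges are illustrative.
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401  (registers the 3D projection)

l1, l2 = np.meshgrid(np.linspace(0, 2, 100), np.linspace(0, 2, 100))
A = 0.5
loss = A * l1 + (1 - A) * l2                           # convex combination of the L1 and L2 terms

fig = plt.figure(figsize=(9, 4))
ax1 = fig.add_subplot(1, 2, 1, projection="3d")
ax1.plot_surface(l1, l2, loss, cmap="viridis")         # 3D surface over the (L1, L2) grid
ax1.set_xlabel("L1"); ax1.set_ylabel("L2"); ax1.set_zlabel("loss")

ax2 = fig.add_subplot(1, 2, 2)
im = ax2.imshow(loss, origin="lower", extent=[0, 2, 0, 2], aspect="auto")  # heatmap view
fig.colorbar(im, ax=ax2, label="loss")
ax2.set_xlabel("L1"); ax2.set_ylabel("L2")
plt.tight_layout()
plt.show()
```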
Visualizing the effect of the nonlinear exponential loss function weight for each loss function revealed distinct decision boundaries. The shape of these decision boundaries supports stable optimization in deep learning models. Our findings indicate that a hybrid loss function combining both fixed and dynamic properties can slightly alleviate optimization challenges. When employing hybrid properties, the model exhibited correlated properties that aided in optimization; conversely, the performance of the model deteriorated when the heterogeneous properties were ineffective for learning. These experimental results confirm the importance of identifying an adaptive loss function suitable for the deep learning model.
The experimental results of the first component, the hybrid loss function, are as follows: The standard deviation is illustrated on a bar graph in Figure 9 to depict the dispersion degree of each method. The loss exhibited the most stable error performance when convex coefficients were utilized in hybrid v2, reflecting both nonlinear static and nonlinear dynamic information. Regarding deep learning model performance, hybrid v1, which combines the exponential moving average with L1 and L2 loss functions, demonstrated the best performance. This indicates that the deep learning model was trained stably and achieved high performance by incorporating hybrid properties. The reflection of nonlinear dynamic and static properties confirmed that the model could learn most stably and perform better compared to existing loss function methods.
Consequently, the deep learning model achieved stable learning by incorporating nonlinear dynamic and static properties. Our findings confirmed that reflecting nonlinear dynamics and linear fixed properties led to superior performance compared to existing loss function methods.
Table 2 presents the experimental performance of the proposed method. The results in Table 2 indicate that the hybrid loss function enhanced the segmentation performance of the deep learning model by 0.1%. Regarding loss reduction, the best experiment demonstrated an error decrease of 3% or more compared with the baseline. The hybrid loss function v2 exhibited the lowest loss, attributed to the nonlinear fixed and dynamic properties achieved through bi-directional nonlinear filtering. This suggests that the nonlinear properties were effectively incorporated into the model learning process within a consistent nonlinear environment.
Analysis of the relationship between loss and the loss function in Table 2 confirmed that the loss value is slightly higher when simple addition is employed, suggesting that superposition increases the complexity of the information and thus raises the loss value. However, using ReLU, eLU, Bi-ReLU, and Bi-eLU to filter the loss function terms shows different effects. ReLU filtering reflects linear properties, reducing the loss compared to conventional addition. Filtering with eLU demonstrates that nonlinear properties are reflected more stably than linear properties. Bi-activation was then applied to learn by extracting essential properties through bipolar information reflection. The Bi-ReLU linear filtering effect shows a learning loss value similar to that of single nonlinear filtering. Furthermore, Bi-eLU presents a simple linear effect when the nonlinear filtering effect is inadequate in both directions, but shows a lower error value when essential properties are identified through nonlinear filtering.
Additionally, the performance of the loss function under changes in the seed value was analyzed. The analysis indicated that the performance of the loss function varied depending on the initial seed, confirming the need for the deep learning model to establish a stable initial state; studying a robust loss function that performs well even in this initial state is essential. Hybrid test results showed better outcomes when dynamically reflected properties were effectively mixed, though not in all cases. To further develop the hybrid loss function, identifying an optimal combination of fixed and dynamic properties is necessary. For Bi-eLU, we confirmed that shared information is removed and nonlinear sparse information is reflected in deep learning model optimization. The hybrid loss function achieved a lower error and slightly improved performance in the seed value experiment. Seed test results indicated that performance varied significantly depending on the initial state of the model, underscoring the need for loss functions that remain robust to the initial model state.
In Figure 9, the bar graph visualizes the mean values from the five-seed value experiments. The highest IOU test performance and the lowest learning error rate were achieved by incorporating nonlinear static and dynamic properties. The application of the activation function demonstrates a performance change as significant features are integrated into the model. When eLU and Bi-eLU, which reflect nonlinear properties, are applied, heterogeneous nonlinear features exhibit low error rates and maximum IOU performance in the deep learning model.
As shown in Table 2, similar performance metrics were observed across the various experiments of the U-Net model. Metrics such as DSC, F1-score, and IOU were consistent across experiments, indicating that the experimental conditions and hyperparameter settings were comparable. However, there were some variations in loss values between experiments. In the initial experiments, loss values were above 1.0 but gradually decreased to below 1.0 in later experiments. This trend indicates a reduction in loss as the deep learning model is trained. The decrease in loss values was accompanied by improvements in accuracy metrics such as DSC, F1-score, and IOU. Thus, as the deep learning model training progressed, loss decreased and segmentation performance improved, which can be attributed to the progress of the model training.
The experimental results summarize the performance of the various loss functions as follows: no existing loss function < linear combination of L1 and L2 loss functions < exponential moving average and nonlinear exponential loss functions < heterogeneous nonlinear loss function. It was confirmed that incorporating nonlinear properties into deep learning models enhances learning and test performance.
We proposed a novel hybrid nonlinear loss function technique to encapsulate the semantic fixed and dynamic properties of a deep learning model during training. The effectiveness of the proposed method was demonstrated using five methods of analysis. Further analysis examined the relationship between the proposed loss function and existing loss functions, considering the reflection of linear and nonlinear properties on the covariate coefficients of loss functions. Visualization and gradient descent analysis of the applied loss function were conducted, along with an assessment of the effect of seed value on hyperparameter optimization. Consequently, the results indicate that the proposed technique facilitated reliable convergence and optimization of the deep learning model, fully reflecting static and dynamic characteristics. Analysis of the relationship between the proposed and existing loss functions confirmed that only meaningful information must be filtered. Additionally, seed value analysis confirmed a slight increase in deviation; however, the proposed technique demonstrated greater robustness to changes due to smaller standard deviations compared to existing methods.
In future studies, the energy functions of the loss function will be defined to visualize their effects on model training. Additionally, visualization methods should be employed to identify optimal points by illustrating possible relationships within the model. The learning direction should also be optimized to filter and output only meaningful information, visualizing the complexity between information derived from loss and optimization areas.
No potential conflict of interest relevant to this article was reported.
This study was conducted with support from Hansung University.
Visualization of used loss function: (a) no loss function, (b) L1 and L2 loss functions, (c) nonlinear exponential loss function, (d) nonlinear exponential loss function and L1/L2 loss functions, (e) exponential moving average loss function, and (f) exponential moving average loss function and nonlinear exponential loss function.
Visualization of the analysis of the proposed method using contour maps of 6 different loss functions with balanced and unbalanced input data: (a) existing experimental equation, (b) L1 and L2 loss function addition formulas, (c) nonlinear exponential loss function, (d) nonlinear exponential loss function and linear combination L1 & L2 loss functions, (e) exponential moving average loss function, and (f) exponential moving average loss function and nonlinear exponential loss function.
Visualization of the impact of 6 different loss functions on the frequency distribution of generated values with unbalanced and balanced input data: (a) existing experimental equation, (b) L1 and L2 loss function addition formulas, (c) nonlinear exponential loss function, (d) nonlinear exponential loss function and linear combination L1 & L2 loss functions, (e) exponential moving average loss function, and (f) exponential moving average loss function and nonlinear exponential loss function
Visualization of scatter plots using LogisticGroupLasso model to analyze the model and sparsity values of 6 different loss functions: (a) existing experimental equation, (b) L1 and L2 loss function addition formulas, (c) nonlinear exponential loss function, (d) nonlinear exponential loss function and linear combination L1 & L2 loss functions, (e) exponential moving average loss function, and (f) exponential moving average loss function and nonlinear exponential loss function.
Visualization of 3D surface for 9 equations: (a)
Visualization of heatmap for 9 equations: (a)
Visualization of comparison of the loss function methods: (a) loss values for different training epochs and (b) IOU values for different training epochs.
Table 1. The index definition of hybrid loss function.
Name | Index |
---|---|
None | Experiment 1 |
Combination L1 and L2 loss function with convex coefficient | Experiment 2 |
Combination L1 and L2 loss function with linear coefficient | Experiment 3 |
Nonlinear exponential loss function with convex coefficient | Experiment 4 |
Nonlinear exponential loss function with linear coefficient | Experiment 5 |
Exponential hybrid loss function with convex coefficient | Experiment 6 |
Exponential hybrid loss function with linear coefficient | Experiment 7 |
Exponential moving average loss function with convex coefficient adopted no filter | Experiment 8 |
Exponential moving average loss function with convex coefficient adopted ReLU | Experiment 9 |
Exponential moving average loss function with convex coefficient adopted eLU | Experiment 10 |
Exponential moving average loss function with convex coefficient adopted Bi-ReLU | Experiment 11 |
Exponential moving average loss function with convex coefficient adopted Bi-eLU | Experiment 12 |
Exponential moving average loss function with linear coefficient adopted no filter | Experiment 13 |
Exponential moving average loss function with linear coefficient adopted ReLU | Experiment 14 |
Exponential moving average loss function with linear coefficient adopted eLU | Experiment 15 |
Exponential moving average loss function with linear coefficient adopted Bi-ReLU | Experiment 16 |
Exponential moving average loss function with linear coefficient adopted Bi-eLU | Experiment 17 |
Exponential hybrid v2 loss function with convex coefficient adopted no filter | Experiment 18 |
Exponential hybrid v2 loss function with convex coefficient adopted ReLU | Experiment 19 |
Exponential hybrid v2 loss function with convex coefficient adopted eLU | Experiment 20 |
Exponential hybrid v2 loss function with convex coefficient adopted Bi-ReLU | Experiment 21 |
Exponential hybrid v2 loss function with convex coefficient adopted Bi-eLU | Experiment 22 |
Exponential hybrid v2 loss function with linear coefficient adopted no filter | Experiment 23 |
Exponential hybrid v2 loss function with linear coefficient adopted ReLU | Experiment 24 |
Exponential hybrid v2 loss function with linear coefficient adopted eLU | Experiment 25 |
Exponential hybrid v2 loss function with linear coefficient adopted Bi-ReLU | Experiment 26 |
Exponential hybrid v2 loss function with linear coefficient adopted Bi-eLU | Experiment 27 |
Table 2. Experiment result of loss function using U-Net on ATR dataset with Seed 250.
Loss function | DSC | F1-score | IOU | Loss | Precision | Recall |
---|---|---|---|---|---|---|
Experiment1 | 0.724 | 0.724 | 0.768 | 1.072 | 0.724 | 0.724 |
Experiment2 | 0.723 | 0.723 | 0.767 | 1.525 | 0.723 | 0.723 |
Experiment3 | 0.724 | 0.724 | 0.768 | 1.398 | 0.724 | 0.724 |
Experiment4 | 0.724 | 0.724 | 0.768 | 1.364 | 0.724 | 0.724 |
Experiment5 | 0.724 | 0.724 | 0.767 | 1.287 | 0.724 | 0.724 |
Experiment6 | 0.723 | 0.723 | 0.767 | 1.222 | 0.723 | 0.723 |
Experiment7 | 0.724 | 0.724 | 0.768 | 1.278 | 0.724 | 0.724 |
Experiment8 | 0.724 | 0.724 | 0.768 | 1.255 | 0.724 | 0.724 |
Experiment9 | 0.724 | 0.724 | 0.768 | 1.228 | 0.724 | 0.724 |
Experiment10 | 0.724 | 0.724 | 0.768 | 1.237 | 0.724 | 0.724 |
Experiment11 | 0.724 | 0.724 | 0.768 | 1.195 | 0.724 | 0.724 |
Experiment12 | 0.724 | 0.724 | 0.768 | 1.164 | 0.724 | 0.724 |
Experiment13 | 0.724 | 0.724 | 0.768 | 1.136 | 0.724 | 0.724 |
Experiment14 | 0.724 | 0.724 | 0.768 | 1.112 | 0.724 | 0.724 |
Experiment15 | 0.724 | 0.724 | 0.768 | 1.115 | 0.724 | 0.724 |
Experiment16 | 0.724 | 0.724 | 0.768 | 1.095 | 0.724 | 0.724 |
Experiment17 | 0.724 | 0.724 | 0.768 | 1.080 | 0.724 | 0.724 |
Experiment18 | 0.725 | 0.724 | 0.768 | 1.075 | 0.724 | 0.724 |
Experiment19 | 0.725 | 0.725 | 0.768 | 1.090 | 0.725 | 0.725 |
Experiment20 | 0.725 | 0.725 | 0.768 | 1.077 | 0.725 | 0.725 |
Experiment21 | 0.725 | 0.725 | 0.768 | 1.069 | 0.725 | 0.725 |
Experiment22 | 0.725 | 0.725 | 0.768 | 1.070 | 0.725 | 0.725 |
Experiment23 | 0.725 | 0.725 | 0.768 | 1.077 | 0.725 | 0.725 |
Experiment24 | 0.725 | 0.725 | 0.768 | 1.070 | 0.725 | 0.725 |
Experiment25 | 0.725 | 0.725 | 0.768 | 1.052 | 0.725 | 0.725 |
Experiment26 | 0.725 | 0.725 | 0.768 | 1.044 | 0.725 | 0.725 |
Experiment27 | 0.725 | 0.725 | 0.768 | 1.042 | 0.725 | 0.725 |
Algorithm 1. Hybrid loss function case 1.
Algorithm 2. Hybrid loss function case 2.
International Journal of Fuzzy Logic and Intelligent Systems 2024; 24(4): 317-332
Published online December 25, 2024 https://doi.org/10.5391/IJFIS.2023.24.4.317
Copyright © The Korean Institute of Intelligent Systems.
Seoung-Ho Choi
Faculty of Basic Liberal Art, College of Liberal Arts, Hansung University, Seoul, Korea
Correspondence to:Seoung-Ho Choi (jcn99250@naver.com)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted noncommercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Current deep learning models underperform when using loss functions characterized by single properties. Therefore, optimizing these models with a combination of multiple attributes is essential to enhance performance. We propose a novel hybrid nonlinear loss function technique incorporating heterogeneous nonlinear properties to achieve optimal performance. We evaluated the proposed method using six analytical techniques: contour map visualization, loss error frequency analysis, scatter plot visualization, loss function visualization, gradient descent analysis, and covariate Analysis with convex and linear coefficients. Our experiments on semantic segmentation tasks using the Pascal visual object classes and automatic target recognition datasets demonstrated superior optimization performance compared to existing loss functions.
Keywords: Hybrid nonlinear loss function, Heterogeneous properties, Optimization, High performance
A loss function is employed to mitigate overfitting during optimization and training. Various types of loss functions exist, including the L1 loss function [1], L2 loss function [1], nonlinear exponential loss function, and exponential moving average loss function.
The exponential moving average is a type of loss function designed to capture the dynamic properties of a model by optimizing dynamic information. Notably, nonlinear properties can vary dynamically. However, the nonlinear exponential loss function ensures stable training of the model by continuously reflecting these nonlinear fixed properties.
Existing loss functions have been investigated to structure single properties and enhance model performance effectively. However, training models with single properties do not achieve optimal performance compared to their optimum state. Although models trained with single properties consistently reach a particular value , their performance remains suboptimal. Moreover, optimizing models with single properties is unlikely to achieve the best performance, as the reliability of such performance is limited.
Existing loss functions have been examined to structure single properties and enhance model performance effectively. However, training deep learning models with single properties do not achieve optimal performance compared to their ideal state. Models trained with single properties tend to accumulated consistent values, reaching certain thresholds. Moreover, such models are unlikely to achieve peak performance, as their reliability remains limited. To optimize large and complex models, combinatorial optimization methods are necessary. Given that deep learning models are large and complex nonlinear models systems, combinatorial optimization can yield higher performance than traditional single optimization techniques. Therefore, combining various properties is essential to achieve the optimal state of the deep learning model. The contributions of our study are as follows.
· We propose a novel hybrid loss function technique that achieves optimal model performance by incorporating heterogeneous nonlinear properties.
· We evaluate the effectiveness of the proposed method using three analytical techniques: contour map visualization, loss error frequency analysis, and scatter plot visualization of the loss error map.
· We conduct a relational analysis to evaluate the impact of the proposed technique on the loss function. We examine the relationships between the loss function and existing loss functions using the activation function.
· We employ various methods to influence changes in covariates of the loss function on model optimization. Additionally, we analyze the relationship between these covariates using convex and linear coefficients.
· We utilized gradient descent analysis to demonstrate that unique combinations of loss and activation functions enhance the learning performance of deep learning models.
· Using 3D surface plots and heatmap visualizations of the formulas, we visually demonstrated that different combinations of loss and activation functions improve the stability and responsiveness of the deep learning model during optimization.
· We initialized seed values for five classes to examine the impact of loss function properties on model training.
· We confirmed the importance of reflecting significant properties when considering heterogeneous nonlinear properties.
Loss functions have evolved to address various learning problems, Liu et al. [2] developed an algorithm to propose a new loss function by exploring variations within existing loss functions using a search space comprising 21 mathematical operators, three constant inputs, and three variable inputs. The algorithm efficiently identified optimal combinations of loss functions, and experiments conducted on the common object in context (COCO), VOC2017, and Berkeley deep drive (BDD) datasets demonstrated that these new combinations outperformed existing ones.
Wen et al. [3] introduced a novel loss function designed to mitigate performance degradation due to insufficient sample diversity without relying on sampling procedures. This approach incorporates adjustable parameters to widen the loss range, diminish the impact of easily classified samples, and replace traditional sampling functions. Derived from the cross-entropy loss, the proposed formula integrates the adjustable parameters
Xie et al. [4] addressed the challenge of accurate image segmentation of surface defects in convolution neural network (CNN)-based models by proposing a balanced loss function. This study identified loss imbalance as a critical issue affecting segmentation accuracy, including label imbalance, easy-to-hard example imbalance, and boundary imbalance. The proposed balanced loss function introduced dynamic class weights, truncated cross-entropy loss, and strategies to suppress label confusion, significantly enhancing segmentation accuracy across various benchmarks and CNN segmentation models. Experimental findings consistently showed performance improvements ranging from 5% to 30% over commonly used loss functions.
Dickson et al. [5] evaluated and compared various loss functions for training artificial neural networks (ANNs). This study analyzed a hybrid loss function that combines entropy and squared error, using the squared error for initial bootstrap network training and entropy loss for subsequent refinement. The results indicate that this approach leverages the initial optimal state achieved by the squared error while the entropy loss provides further enhancements. This strategy offers a novel method for improving the training efficiency of ANNs, contributing to advancements in neural network-based models.
Yu and Wu [6] introduced a hybrid loss function, SSIM+L1 integrating the structural similarity index measure (SSIM) and L1 norms. Their study incorporated an attention mechanism that effectively leverages global information during network training. When applied to a CNN for reconstructing missing traces, experiments on synthetic and field data demonstrated that the SSIM+L1 loss function, combined with the attention mechanism, significantly improved interpolation results compared to networks lacking attention.
In medical imaging, researchers have tailored hybrid loss functions for encoder-decoder convolutional neural networks to address underutilization and image information loss [7]. This study proposed a low-dose computed tomography (LDCT) image-denoising network that employs residual multiscale feature extraction and a hybrid loss function comprising mean square error (MSE), SSIM, and perceptual losses. Experimental results demonstrated that this approach enhanced image quality and improved computational efficiency compared to state-of-the-art techniques, offering promising advancements in medical imaging applications.
Deep-learning techniques have been employed to enhance LDCT image quality using a hybrid loss function that integrates adversarial, perceptual, sharpness, and structural similarity losses [8]. Experimental results demonstrated that the SS-WGAN with the hybrid loss function outperformed traditional denoising methods in preserving image details and structure, which are crucial for accurate medical diagnosis.
Figure 1 visualizes the proposed method. Figure 1(a) illustrates hybrid loss function case 1, and Figure 1(b) depicts hybrid loss function case 2. The proposed method is detailed as follows.
We test this function using two cases: a combination of nonlinear chrematistic and linear properties and a combination of nonlinear fixed and nonlinear dynamic properties.
We optimize deep learning models using the loss function of single properties, which limits our scope. We identified the cause of this limitation and aim to address it through heterogeneous properties. Although this might not be the optimal point for a deep learning model, expanding the range by reflecting various properties is necessary to find the optimal point. To verify this, we propose a novel hybrid loss function that incorporates these properties. We intend to validate the rationale for incorporating heterogeneous properties using three analyses.
Figure 2 visualizes the employed loss function methods using the formula
Based on these results, the hyperplane space defined by the loss function may assume a narrow valley shape. This shape allows the deep learning model to converge rapidly and reach the optimal point.
Reflecting nonlinear dynamic properties within such systems proves to be an efficient approach for deep learning model optimization.
This experiment was applied to U-Net [12] and FCN [13] for semantic segmentation. We conducted experiments on the proposed method using Keras in an Ubuntu environment, recording the average value over five iterations. Experiments with FCN and U-Net were conducted using visual object classes (VOC) and automatic target recognition (ATR) datasets for the image segmentation task, employing cross-entropy loss. Results were evaluated using dice similarity coefficient (DSC), F1-score, Intersection over Union (IOU), loss, precision, and recall. The DSC measures the similarity of two images to evaluate the accuracy of the segmented image. The F1-score, the harmonic mean of precision and recall, evaluates binary classification model performance. IOU measures the overlap between predicted and actual segmented regions, evaluating segmentation accuracy. Loss indicates the difference between the prediction and actual value of the deep learning model. Precision shows the percentage of correctly predicted positive classes, while recall indicates the percentage of true-positive classes correctly identified by the deep learning model. We analyzed the experiment results using five seed values: 1, 250, 500, 777, and 999. The hybrid loss function, where A and B denote the covariates, is presented below.
To analyze the relationship between the covariates of the loss function, two analyses were performed: convex and linear coefficients. Deep learning models typically exhibit hyperplanes with convex properties. During optimization, the learning and learned position oscillate. This analysis determines whether the loss function should possess convex properties or whether the deep learning model should incorporate linear properties to enhance optimization.
Another analysis was conducted using five seed values with a loss function. As the deep learning model was trained, the initial distribution varied according to the seed value. Finally, we aimed to confirm the stability of the training by analyzing the loss function based on the initial state of the model.
To understand the influence of the hyperplane on the deep learning model, we employed a contour map. This approach visualizes the impact on the hyperplane shape on the deep learning model. If the model is acquired stably, it indicates that a broad area is achieved without complex contour patterns.
Additionally, a frequency visualization analysis using loss errors was performed. Frequency analysis is challenging when the results for real and fake images are highly similar. Therefore, similar data were selected by analyzing the frequency of occurrence. The loss error can be used to analyze the state of the deep learning model by indicating a specific pattern of error convergence or emission. Both frequency and spread analyses were conducted.
Moreover, we conducted a spread analysis using the loss error. If the loss is concentrated at a single value, the expressed points cluster and appear more robustly compressed. This confirms that the deep learning model exhibits consistent errors, with similar errors recurring.
The equation used in the experiment is as follows:
The combination of an exponential moving average with the L1 and L2 loss functions is termed a hybrid loss function.
The combination of an exponential moving average with a nonlinear exponential loss function is termed a hybrid v2 loss function.
Table 1 presents a summary of the index utilized in the experiment. The experiment involved 27 distinct loss function equations. The rationale for the experiment design is as follows: we investigate the impact of two terms on the optimization of the deep learning model by analyzing the coefficients of the loss functions. This analysis employs both linear and convex coefficients for model optimization. Additionally, we examine the trade-off relationship between these two terms and its effect on the performance of the deep learning model.
The conventional method involves a straightforward linear combination of the L1 and L2 loss functions. In contrast, the nonlinear exponential loss function is designed to capture nonlinear properties rather than the characteristics of linear combinations in the learning process of the deep learning model. Certain attributes are consistently reflected within the deep learning model. The exponential moving average loss function dynamically integrates information, including historical data, in the loss function.
By introducing a loss function that embodies fixed properties alongside one that captures dynamic properties, we developed and evaluated a heterogeneous loss function encompassing both characteristics.
These two methods validated the impact of heterogeneous feature loss functions on deep-learning model optimization and performance.
We investigated the impact of two heterogeneous feature loss functions on deep learning model loss. This analysis aimed to enhance model learning by selectively filtering significant properties, as not all properties are effectively learned by the model. The ReLU function, combined with linear filtering, was applied to eliminate values below zero and linearly reflect values above zero. This approach leverages the vanishing property of the ReLU function to improve model performance.
Additionally, node filtering is applied to reflect only sparse information, optimizing the learning process of the deep learning model by emphasizing meaningful information and eliminating redundancy.
Figure 4 presents the contour lines analysis, where the generated values are evaluated using the formula sin10(
Figure 4 illustrates the frequency of value occurrence as contour lines for various experimental conditions: (a) the existing experimental equation; (b) the experimental formula with L1 and L2 loss function addition; (c) the experimental formula with the nonlinear exponential loss function; (d) the experimental formula with the nonlinear exponential and linear combination L1 and L2 loss functions; (e) the experimental formula with the exponential moving average loss function, and (f) the experimental formula with both the exponential moving average and nonlinear exponential loss functions.
Figure 4 demonstrates that when the loss function is applied, numerous zero values are generated compared to previous instances. In Figure 4(b), non-zero values cluster in the upper left and lower right, indicating a phenomenon that maximizes the zero margin as non-zero values spread to the peripheries. Additionally, Figure 4(c), 4(d), and 4(f) display contour lines that may present local minima within the hyperplane space of the neural network model. When zero values occurs frequently, expanding the range of these local minima(indicated by the contour line) or identifying an optimizable range increases the information the deep learning model can learn, leading to improved optimization. As shown in the results, this process progresses to the branch.
Figure 4 presents an analysis of balanced and unbalanced data generation. For balanced data, values near zero are more clustered than the contour lines, suggesting multiple local minima within the hyperplane of a neural network. Additionally, when both positive and negative values are generated, the contour lines appears as straight lines indicating that extreme values form a more defined boundary compared to a soft boundary.
The analysis using the frequency of occurrence is depicted in Figure 5, where each loss function method was applied to the cross-entropy loss. For unbalanced values,
Finally, the results from the experimental formula were validated by analyzing specific outcomes. This confirmed that the deep learning model continuously learned from a uniform set of values. Therefore, past research should have focused on optimizing generalization by incorporating a variety of values rather than continuously relying on specific ones.
The analysis employing the LogisticGroupLasso model is depicted in Figure 6, which presents the outcomes derived from this model. The x-axis denotes noisy probabilities, while the y-axis represents noise-free probabilities. The resultant distribution exhibits a convex function shape. Moreover, the degree of clustering varied across different loss function methods. Notably, disparate loss function methods produced non-clustered values at varying frequencies, where such non-clustered values could be identified as outliers during the training process of the deep learning model. These outliers potentially lead to suboptimal learning pathways. Hence, meticulously designing the loss function to mitigate the occurrence of sparse values is crucial.
Figures 7 and 8 illustrate the equations utilized in the experiment through 3D surface plots and heat maps. In Figures 7(a) and 8(a) depict a smooth transition between L1 and L2 losses, signifying a well-behaved loss function that smoothly interpolates between L1 and L2 losses based on the value of A. This smooth transition is crucial for optimization, as it mitigates abrupt changes in the loss landscape. Figures 7(b) and 8(b) demonstrate a linear relationship and exhibits a weighted combination of L1 and L2 losses, providing flexibility in adjusting the influence of each component. Figures 7(c) and 8(c) reveal exponential growth, indicating high sensitivity to changes in L2, which can generate significant gradients beneficial for escaping flat regions in optimization, though it also poses a risk of instability. Figures 7(d) and 8(d) display dynamic behavior with a mix of linear terms, allowing for adaptive changes in the loss function and enhancing optimization by focusing on different components at various stages. Figures 7(e) and 8(e) resemble Figures 7(d) and 8(d) but with convex coefficients. The application of convex coefficients aims to ensure stability and smoother transitions in the loss landscape, which is essential for consistent optimization. Figures 7(f)–7(i) and 8(f)–8(i) depict the equations employed in hybrid loss functions. Figure 8(f) illustrates a more complex landscape due to the hybrid nature and exponential terms, capturing more intricate data patterns. Figure 8(g) shows both linear combinations and exponential growth, leveraging the strengths of both terms for balanced stability and responsiveness. Figure 8(h) and 8(i) pertain to ReLU and Bi-eLU equations, respectively. Figure 8(h) exhibits nonlinearity and sparsity, aiding in handling nonlinearity and ensuring sparsity in gradients, thus facilitating efficient optimization. Figure 8(i) demonstrates the activation of Bi-eLU, revealing complex nonlinear behavior that helps capture more intricate data patterns, thereby providing a more nuanced optimization landscape.
Visualizing the effect of the nonlinear exponential loss function weight for each loss function revealed distinct decision boundary. The shape of these decision boundaries supports stable optimization in deep learning models. Our findings indicate that a hybrid loss function, combining both fixed and dynamic properties, can slightly alleviate optimization challenges. However, when employing hybrid properties, the model exhibited correlated properties that aided in optimization. Conversely, performance of the model deteriorated when the heterogeneous properties were ineffective in learning. Experimental results confirm the importance of identifying an adaptive loss function suitable for the deep learning model.
The experimental results for the first component, the hybrid loss function, are as follows. The standard deviation of each method is shown as a bar graph in Figure 9 to depict its degree of dispersion. The loss exhibited the most stable error behavior when convex coefficients were used in hybrid v2, which reflects both nonlinear static and nonlinear dynamic information. In terms of deep learning model performance, hybrid v1, which combines the exponential moving average with the L1 and L2 loss functions, performed best. This indicates that the deep learning model trained stably and achieved high performance by incorporating hybrid properties. Reflecting nonlinear dynamic and static properties confirmed that the model could learn most stably and outperform existing loss function methods.
Consequently, the deep learning model achieved stable learning by incorporating nonlinear dynamic and static properties. Our findings confirmed that reflecting nonlinear dynamics and linear fixed properties led to superior performance compared to existing loss function methods.
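As a concrete illustration of this construction, the following is a minimal PyTorch-style sketch of a hybrid loss in the spirit of hybrid v1, blending a convex combination of L1 and L2 losses with an exponential moving average of the loss. The coefficient values and the exact blending rule are our own assumptions, not the authors' precise formulation.

```python
import torch
import torch.nn.functional as F

class HybridLossV1:
    """Sketch of hybrid loss case 1: a convex combination of L1 and L2 losses
    (static property) blended with an exponential moving average of past loss
    values (dynamic property). Coefficients and blending are illustrative."""

    def __init__(self, alpha: float = 0.5, beta: float = 0.9):
        self.alpha = alpha  # convex weight between the L1 and L2 terms
        self.beta = beta    # EMA decay capturing dynamic (historical) behavior
        self.ema = None     # running average of the static loss

    def __call__(self, pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        l1 = F.l1_loss(pred, target)
        l2 = F.mse_loss(pred, target)
        static = self.alpha * l1 + (1.0 - self.alpha) * l2  # fixed (static) term
        if self.ema is None:
            self.ema = static.detach()
        else:
            self.ema = self.beta * self.ema + (1.0 - self.beta) * static.detach()
        # Blend the current static term with the EMA (dynamic) term.
        return 0.5 * static + 0.5 * self.ema
```

Hybrid v2 would follow analogously, with the static term replaced by the nonlinear exponential loss and, where indicated in Table 1, passed through an activation filter (ReLU, eLU, Bi-ReLU, or Bi-eLU) before blending.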
Table 2 presents the experimental performance of the proposed method. The results in Table 2 indicate that the hybrid loss function enhanced the segmentation performance of the deep learning model by 0.1%. Regarding loss reduction, the best experiment demonstrated an error decrease of 3% or more compared to the previous one. The hybrid loss function v2 exhibited the lowest loss, attributed to the nonlinear fixed and dynamic properties achieved through bi-directional nonlinear filtering. This suggests that the nonlinear properties were effectively incorporated into the model learning process within a consistent nonlinear environment.
Analysis of the relationship between loss and the loss function in Table 2 confirmed that the loss value is slightly higher when simple addition is employed. This suggests that superposition increases the complexity of the information, resulting in a higher loss value. However, using ReLU, eLU, Bi-ReLU, and Bi-eLU to filter the contribution of the loss function produces different effects. ReLU filtering reflects linear properties and reduces the loss compared to plain addition. Filtering with eLU shows that nonlinear properties are reflected more stably than linear ones. Bi-activation was then applied to extract essential properties through bipolar information reflection. Bi-ReLU linear filtering yields a learning loss similar to that of single nonlinear filtering. Furthermore, Bi-eLU exhibits a simple linear effect when the nonlinear filtering effect is inadequate in both directions, but a lower error value when essential properties are identified through nonlinear filtering.
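One plausible reading of these filters, written as a small PyTorch helper, is sketched below. The Bi-ReLU and Bi-eLU forms are our own interpretation of "bipolar" filtering (the activation applied to both signs of the signal), not the paper's exact definitions.

```python
import torch
import torch.nn.functional as F

def filter_loss_term(x: torch.Tensor, mode: str) -> torch.Tensor:
    """Apply an activation 'filter' to a loss term before it enters the total
    loss. The Bi-* variants are an assumed reading of bipolar filtering."""
    if mode == "none":
        return x
    if mode == "relu":
        return F.relu(x)               # zeroes out negative contributions
    if mode == "elu":
        return F.elu(x)                # identity for x > 0, smooth exponential for x < 0
    if mode == "bi_relu":
        return F.relu(x) - F.relu(-x)  # equals x for every input: a linear effect
    if mode == "bi_elu":
        return F.elu(x) - F.elu(-x)    # odd, smooth, and nonlinear around zero
    raise ValueError(f"unknown filter mode: {mode}")
```

Under this reading, the Bi-ReLU form reduces exactly to the identity, which matches the observation above that Bi-ReLU behaves like linear filtering, while Bi-eLU remains nonlinear near zero.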
Additionally, the performance of the loss function was analyzed with respect to changes in the seed value. The analysis indicated that performance varied depending on the initial seed, confirming the need for the deep learning model to establish a stable initial state; studying a robust loss function that performs well even from this initial state is therefore essential. The hybrid tests showed better outcomes when dynamically reflected properties were mixed effectively, though not in all cases. To further develop the hybrid loss function, an optimal combination of fixed and dynamic properties must be identified. For Bi-eLU, we confirmed that shared information was removed and nonlinear sparse information was reflected in the optimization of the deep learning model. The hybrid loss function achieved a lower error and slightly improved performance in the seed value experiment. The seed test results indicated that performance varied significantly with the initial state of the model, underscoring the need to study loss functions that are robust to the initial model state.
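For reference, a minimal sketch of how such a multi-seed experiment can be set up in PyTorch is shown below; the seed values other than 250 (the seed reported in Table 2) and the train_and_evaluate helper are illustrative placeholders.

```python
import random
import numpy as np
import torch

def set_seed(seed: int) -> None:
    # Fix all relevant random number generators so that each run
    # starts from a reproducible initial state.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

# Five seeds; 250 matches the Table 2 caption, the others are illustrative.
for seed in (50, 100, 150, 200, 250):
    set_seed(seed)
    # The model, optimizer, and training loop would be re-created here, e.g.:
    # results[seed] = train_and_evaluate(seed)  # hypothetical helper
    pass
```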
In Figure 9, the bar graph visualizes the mean values from the five-seed value experiments. The highest IOU test performance and the lowest learning error rate were achieved by incorporating nonlinear static and dynamic properties. The application of the activation function demonstrates a performance change as significant features are integrated into the model. When eLU and Bi-eLU, which reflect nonlinear properties, are applied, heterogeneous nonlinear features exhibit low error rates and maximum IOU performance in the deep learning model.
As shown in Table 2, similar performance metrics were observed across the various experiments with the U-Net model. Metrics such as DSC, F1-score, and IOU were consistent across experiments, indicating that the experimental conditions and hyperparameter settings were comparable. However, the loss values varied between experiments: they were higher in the earlier experiments (up to about 1.5) and gradually decreased to approximately 1.04 in the later experiments. This trend indicates a reduction in loss as the deep learning model is trained, and the decrease was accompanied by improvements in accuracy metrics such as DSC, F1-score, and IOU. Thus, as training progressed, the loss decreased and segmentation performance improved.
The experimental results rank the performance of the various loss functions as follows: no existing loss function < linear combination of L1 and L2 loss functions < exponential moving average combined with the nonlinear exponential loss function < heterogeneous nonlinear loss function. This confirms that incorporating nonlinear properties into deep learning models enhances both learning and test performance.
We proposed a novel hybrid nonlinear loss function technique to encapsulate the semantic fixed and dynamic properties of a deep learning model during training. The effectiveness of the proposed method was demonstrated using five methods of analysis. Further analysis examined the relationship between the proposed loss function and existing loss functions, considering how linear and nonlinear properties are reflected in the covariate coefficients of the loss functions. Visualization and gradient descent analysis of the applied loss functions were conducted, along with an assessment of the effect of the seed value on hyperparameter optimization. The results indicate that the proposed technique facilitated reliable convergence and optimization of the deep learning model, fully reflecting static and dynamic characteristics. Analysis of the relationship between the proposed and existing loss functions showed that only meaningful information should be filtered. Additionally, the seed value analysis confirmed a slight increase in deviation; however, the proposed technique was more robust to such changes, exhibiting smaller standard deviations than existing methods.
In future studies, the energy functions of the loss function will be defined to visualize their effects on model training. Additionally, visualization methods should be employed to identify optimal points by illustrating possible relationships within the model. The learning direction should also be optimized to filter and output only meaningful information, visualizing the complexity between information derived from loss and optimization areas.
No potential conflict of interest relevant to this article was reported.
This study was conducted with support from Hansung University.
Proposed method: (a) hybrid loss function case 1 and (b) hybrid loss function case 2.
Visualization of used loss function: (a) no loss function, (b) L1 and L2 loss functions, (c) nonlinear exponential loss function, (d) nonlinear exponential loss function and L1/L2 loss functions, (e) exponential moving average loss function, and (f) exponential moving average loss function and nonlinear loss function.
Visualization of experiment method configuration.
Visualization of the analysis of the proposed method using contour maps of 6 different loss functions with balanced and unbalanced input data: (a) existing experimental equation, (b) L1 and L2 loss function addition formulas, (c) nonlinear exponential loss function, (d) nonlinear exponential loss function and linear combination L1 & L2 loss functions, (e) exponential moving average loss function, and (f) exponential moving average loss function and nonlinear exponential loss function.
Visualization of the impact of 6 different loss functions on the frequency distribution of generated values with unbalanced and balanced input data: (a) existing experimental equation, (b) L1 and L2 loss function addition formulas, (c) nonlinear exponential loss function, (d) nonlinear exponential loss function and linear combination L1 & L2 loss functions, (e) exponential moving average loss function, and (f) exponential moving average loss function and nonlinear exponential loss function
Visualization of scatter plots using LogisticGroupLasso model to analyze the model and sparsity values of 6 different loss functions: (a) existing experimental equation, (b) L1 and L2 loss function addition formulas, (c) nonlinear exponential loss function, (d) nonlinear exponential loss function and linear combination L1 & L2 loss functions, (e) exponential moving average loss function, and (f) exponential moving average loss function and nonlinear exponential loss function.
Visualization of 3D surface for 9 equations: (a)
Visualization of heatmap for 9 equations: (a)
Visualization of comparison of the loss function methods: (a) loss values for different training epochs and (b) IOU values for different training epochs.
Table 1. The index definition of hybrid loss function.

| Name | Index A |
|---|---|
| None | Experiment 1 |
| Combination L1 and L2 loss function with convex coefficient | Experiment 2 |
| Combination L1 and L2 loss function with linear coefficient | Experiment 3 |
| Nonlinear exponential loss function with convex coefficient | Experiment 4 |
| Nonlinear exponential loss function with linear coefficient | Experiment 5 |
| Exponential hybrid loss function with convex coefficient | Experiment 6 |
| Exponential hybrid loss function with linear coefficient | Experiment 7 |
| Exponential moving average loss function with convex coefficient adopted no filter | Experiment 8 |
| Exponential moving average loss function with convex coefficient adopted ReLU | Experiment 9 |
| Exponential moving average loss function with convex coefficient adopted eLU | Experiment 10 |
| Exponential moving average loss function with convex coefficient adopted Bi-ReLU | Experiment 11 |
| Exponential moving average loss function with convex coefficient adopted Bi-eLU | Experiment 12 |
| Exponential moving average loss function with linear coefficient adopted no filter | Experiment 13 |
| Exponential moving average loss function with linear coefficient adopted ReLU | Experiment 14 |
| Exponential moving average loss function with linear coefficient adopted eLU | Experiment 15 |
| Exponential moving average loss function with linear coefficient adopted Bi-ReLU | Experiment 16 |
| Exponential moving average loss function with linear coefficient adopted Bi-eLU | Experiment 17 |
| Exponential hybrid v2 loss function with convex coefficient adopted no filter | Experiment 18 |
| Exponential hybrid v2 loss function with convex coefficient adopted ReLU | Experiment 19 |
| Exponential hybrid v2 loss function with convex coefficient adopted eLU | Experiment 20 |
| Exponential hybrid v2 loss function with convex coefficient adopted Bi-ReLU | Experiment 21 |
| Exponential hybrid v2 loss function with convex coefficient adopted Bi-eLU | Experiment 22 |
| Exponential hybrid v2 loss function with linear coefficient adopted no filter | Experiment 23 |
| Exponential hybrid v2 loss function with linear coefficient adopted ReLU | Experiment 24 |
| Exponential hybrid v2 loss function with linear coefficient adopted eLU | Experiment 25 |
| Exponential hybrid v2 loss function with linear coefficient adopted Bi-ReLU | Experiment 26 |
| Exponential hybrid v2 loss function with linear coefficient adopted Bi-eLU | Experiment 27 |
Table 2. Experiment result of loss function using U-Net on ATR dataset with Seed 250.

| Loss function | DSC | F1-score | IOU | Loss | Precision | Recall |
|---|---|---|---|---|---|---|
| Experiment 1 | 0.724 | 0.724 | 0.768 | 1.072 | 0.724 | 0.724 |
| Experiment 2 | 0.723 | 0.723 | 0.767 | 1.525 | 0.723 | 0.723 |
| Experiment 3 | 0.724 | 0.724 | 0.768 | 1.398 | 0.724 | 0.724 |
| Experiment 4 | 0.724 | 0.724 | 0.768 | 1.364 | 0.724 | 0.724 |
| Experiment 5 | 0.724 | 0.724 | 0.767 | 1.287 | 0.724 | 0.724 |
| Experiment 6 | 0.723 | 0.723 | 0.767 | 1.222 | 0.723 | 0.723 |
| Experiment 7 | 0.724 | 0.724 | 0.768 | 1.278 | 0.724 | 0.724 |
| Experiment 8 | 0.724 | 0.724 | 0.768 | 1.255 | 0.724 | 0.724 |
| Experiment 9 | 0.724 | 0.724 | 0.768 | 1.228 | 0.724 | 0.724 |
| Experiment 10 | 0.724 | 0.724 | 0.768 | 1.237 | 0.724 | 0.724 |
| Experiment 11 | 0.724 | 0.724 | 0.768 | 1.195 | 0.724 | 0.724 |
| Experiment 12 | 0.724 | 0.724 | 0.768 | 1.164 | 0.724 | 0.724 |
| Experiment 13 | 0.724 | 0.724 | 0.768 | 1.136 | 0.724 | 0.724 |
| Experiment 14 | 0.724 | 0.724 | 0.768 | 1.112 | 0.724 | 0.724 |
| Experiment 15 | 0.724 | 0.724 | 0.768 | 1.115 | 0.724 | 0.724 |
| Experiment 16 | 0.724 | 0.724 | 0.768 | 1.095 | 0.724 | 0.724 |
| Experiment 17 | 0.724 | 0.724 | 0.768 | 1.08 | 0.724 | 0.724 |
| Experiment 18 | 0.725 | 0.724 | 0.768 | 1.075 | 0.724 | 0.724 |
| Experiment 19 | 0.725 | 0.725 | 0.768 | 1.09 | 0.725 | 0.725 |
| Experiment 20 | 0.725 | 0.725 | 0.768 | 1.077 | 0.725 | 0.725 |
| Experiment 21 | 0.725 | 0.725 | 0.768 | 1.069 | 0.725 | 0.725 |
| Experiment 22 | 0.725 | 0.725 | 0.768 | 1.07 | 0.725 | 0.725 |
| Experiment 23 | 0.725 | 0.725 | 0.768 | 1.077 | 0.725 | 0.725 |
| Experiment 24 | 0.725 | 0.725 | 0.768 | 1.07 | 0.725 | 0.725 |
| Experiment 25 | 0.725 | 0.725 | 0.768 | 1.052 | 0.725 | 0.725 |
| Experiment 26 | 0.725 | 0.725 | 0.768 | 1.044 | 0.725 | 0.725 |
| Experiment 27 | 0.725 | 0.725 | 0.768 | 1.042 | 0.725 | 0.725 |
Algorithm 1. Hybrid loss function case 1.
Algorithm 2. Hybrid loss function case 2.