Int. J. Fuzzy Log. Intell. Syst. 2017; 17(1): 10-16
Published online March 31, 2017
https://doi.org/10.5391/IJFIS.2017.17.1.10
© The Korean Institute of Intelligent Systems
Minyoung Kim
Department of Electronics & IT Media Engineering, Seoul National University of Science & Technology, Seoul, Korea
Correspondence to: Minyoung Kim (mikim@seoultech.ac.kr)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted noncommercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
The kernel function plays a central role in modern pattern classification for its ability to capture the inherent affinity structure of the underlying data manifold. While the kernel function can be chosen by human experts with domain knowledge, it is often more principled and promising to learn it directly from data. This idea of kernel learning has been studied considerably in machine learning and pattern recognition. However, most kernel learning algorithms assume fully supervised setups requiring expensive class label annotation for the training data. In this paper we consider kernel learning in the semi-supervised setup where only a fraction of the data points need to be labeled. We propose two approaches: the first extends the idea of label propagation along the data similarity graph, in which we simultaneously learn the kernel and impute the labels of the unlabeled data. The second aims to minimize the dual loss of the support vector machine (SVM) classifier with respect to the kernel parameters and the missing labels. We provide reasonable and effective approximate solution methods for these optimization problems. Both approaches exploit labeled and unlabeled data in kernel learning, and we empirically demonstrate their effectiveness on several benchmark datasets with partially labeled learning setups.
Keywords: Kernel learning, Semi-supervised learning, Pattern classification, Optimization
Pattern classification is a central topic in machine learning and pattern recognition, where the goal is to learn an accurate class prediction function from a finite amount of labeled training examples. The prediction function should not only explain the training data faithfully, but also generalize to unseen data points. Kernel machines are widely regarded as among the most successful pattern recognition approaches for their ability to incorporate potentially nonlinear affinity structures in data through kernel functions. The support vector machine (SVM) [1] and Gaussian processes [2] are the two most popular algorithms/frameworks built on kernel functions.
How to choose the kernel function is crucial to success in classification. If human experts or domain knowledge are available, one can exploit them to derive a good kernel function. However, resorting to human effort is costly, and for certain applications the domain knowledge is simply unavailable. Kernel learning is a principled solution in such situations: it learns the kernel function directly from data. Kernel learning has received significant attention in the machine learning and pattern recognition communities, and a line of important previous work has shown its effectiveness in theory and practice [3–11].
However, most kernel learning methods aim to minimize the label prediction loss of a classifier built on the kernel function. In other words, most kernel learning algorithms assume supervised learning setups in which the data points are fully annotated with class labels. Labeling a large number of data points requires expensive human effort and becomes prohibitive as the data size grows drastically. In this paper we deal with kernel learning in partially labeled data setups where only a small portion of the data points need to be labeled, and the remaining unlabeled data are also exploited to learn a more accurate kernel. That is, we deal with kernel learning under semi-supervised learning scenarios.
Specifically, we propose two approaches to tackle the problem. The first extends the idea of manifold regularization [12], which imputes the missing labels via label propagation along the data similarity manifold formed by the underlying kernel function. Whereas the kernel is assumed given and fixed in that previous work, we extend it to a kernel learning formulation. The second method aims to minimize the dual optimum of the SVM classifier, posing a min-max optimization that effectively finds the kernel yielding the smallest SVM hinge loss. In both formulations, we simultaneously optimize the kernel parameters and the missing labels as optimization variables. Although the resulting optimization problems are difficult to solve exactly, we introduce efficient approximate solution methods that find viable solutions.
The rest of the paper is organized as follows. After briefly reviewing some background on kernel machines and kernel learning, we describe our kernel learning approaches in semi-supervised data setups in Section 3. Empirical evaluations on some benchmark datasets in Section 4 demonstrate the effectiveness of the proposed algorithms, significantly outperforming existing methods that exploit the labeled data only.
We begin with a brief review of kernel machines for pattern classification problems in machine learning. In the standard binary fully-supervised learning setup, one is given labeled training data points $\{(\mathbf{x}_i, y_i)\}_{i=1}^{n}$, where $\mathbf{x}_i \in \mathbb{R}^d$ is an input vector and $y_i \in \{+1, -1\}$ is its class label.
A typical approach is to consider a linear function family in a feature space,

$$f(\mathbf{x}) = \mathbf{w}^\top \phi(\mathbf{x}) + b, \tag{1}$$

where $\phi(\cdot)$ is a (possibly nonlinear) feature mapping of the input. In (1), the parameters $\mathbf{w}$ and $b$ are learned by minimizing a regularized classification loss, for instance the hinge loss $\max(0, 1 - y f(\mathbf{x}))$ together with an $\ell_2$ penalty on $\mathbf{w}$ as in the SVM [1].
It can be shown (e.g., via the representer theorem [14]) that the optimal classifier admits the kernel expansion $f(\mathbf{x}) = \sum_{i=1}^{n} \alpha_i k(\mathbf{x}, \mathbf{x}_i) + b$, where $k(\mathbf{x}, \mathbf{x}') = \phi(\mathbf{x})^\top \phi(\mathbf{x}')$ is the kernel function and the coefficients $\alpha_i$ are determined from the training data. Hence the classifier never requires the feature mapping $\phi(\cdot)$ explicitly; it only needs kernel evaluations between data points.
The kernel function, as the inner product in the feature space, naturally captures similarity between input data points. In kernel machines one can represent arbitrary nonlinear affinity structures in the data manifold through the kernel functions, without explicitly learning a complex input feature mapping. This is the major benefit of kernel machines, including the SVM and Gaussian processes. For computational simplicity, one typically opts for certain parametric kernel forms. The radial basis function (RBF) kernel, defined as $k(\mathbf{x}, \mathbf{x}') = \exp\!\big(-\|\mathbf{x} - \mathbf{x}'\|^2 / (2\sigma^2)\big)$ with bandwidth parameter $\sigma$, is one of the most widely used choices.
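To make the RBF kernel concrete, the short NumPy sketch below computes the full kernel matrix for a toy dataset; the bandwidth value and the random data are arbitrary illustrative choices, not settings from this paper.

```python
import numpy as np

def rbf_kernel_matrix(X, sigma=1.0):
    """K[i, j] = exp(-||x_i - x_j||^2 / (2 * sigma^2)) for all pairs of rows of X."""
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * sigma ** 2))

X = np.random.randn(5, 3)            # 5 toy points in 3 dimensions
K = rbf_kernel_matrix(X, sigma=2.0)
print(K.shape, np.allclose(K, K.T))  # (5, 5) True: a valid symmetric kernel matrix
```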
Whereas the kernel function, or equivalently the data affinity structure, can be specified by human experts, doing so is expensive, and oftentimes the domain information is not available at all. A more reasonable treatment is to form a parametric family of kernel functions and learn the kernel parameters from data. This idea has been studied considerably in machine learning and is referred to as kernel learning [3–11]. Note that the popular model selection approach in kernel machines, namely selecting kernel hyperparameters by cross validation, can be seen as a form of kernel learning, but the latter typically involves more sophisticated approaches to kernel function learning. We briefly review two of the most successful kernel learning methods. Formally, we have a parametric family of kernel functions, and we denote a kernel in the family by $k_\theta(\cdot, \cdot)$ with kernel parameters $\theta$.
The first approach is the so-called kernel target alignment (KTA) [3]. KTA aims to maximize the alignment score between the kernel matrix (the similarity matrix on the input vectors) and the label affinity matrix (the similarity matrix on the labels). Here the alignment score between two matrices is defined as the cosine angle between their vector representations. More formally, denoting the $(i,j)$-th entry of the kernel matrix $K$ by $K_{ij} = k_\theta(\mathbf{x}_i, \mathbf{x}_j)$ and the label affinity matrix by $\mathbf{y}\mathbf{y}^\top$ with $\mathbf{y} = [y_1, \ldots, y_n]^\top$, the alignment score is

$$A(K, \mathbf{y}\mathbf{y}^\top) = \frac{\langle K, \mathbf{y}\mathbf{y}^\top \rangle_F}{\sqrt{\langle K, K \rangle_F \, \langle \mathbf{y}\mathbf{y}^\top, \mathbf{y}\mathbf{y}^\top \rangle_F}}. \tag{2}$$
Here $\langle \cdot, \cdot \rangle_F$ denotes the Frobenius inner product between matrices, i.e., $\langle A, B \rangle_F = \sum_{i,j} A_{ij} B_{ij}$. KTA-based kernel learning maximizes the alignment score (2) with respect to the kernel parameters $\theta$.
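The alignment score in (2) is easy to compute directly from a kernel matrix and a label vector. The following minimal NumPy sketch is for illustration only; the toy labels and the ideal kernel are assumptions made here to verify that a perfectly label-aligned kernel attains the maximum score of 1.

```python
import numpy as np

def kernel_target_alignment(K, y):
    """<K, yy^T>_F / (||K||_F * ||yy^T||_F); higher means better agreement with the labels."""
    Y = np.outer(y, y)                          # label affinity matrix
    return np.sum(K * Y) / (np.linalg.norm(K) * np.linalg.norm(Y))

y = np.array([1.0, 1.0, -1.0, -1.0])
K_ideal = np.outer(y, y)                        # a kernel that perfectly matches the labels
print(kernel_target_alignment(K_ideal, y))      # 1.0, the maximum possible alignment
```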
The second approach directly minimizes the loss of the kernel classifier built on the underlying kernel. As described above, the optimal classifier has the form of a weighted kernel sum, and its regularized classification loss can be expressed as

$$J(\theta) = \min_{\boldsymbol{\alpha}, b} \ \sum_{i=1}^{n} \ell\Big(y_i, \sum_{j=1}^{n} \alpha_j k_\theta(\mathbf{x}_i, \mathbf{x}_j) + b\Big) + \lambda\, \boldsymbol{\alpha}^\top K_\theta \boldsymbol{\alpha}. \tag{3}$$

In (3), $\ell(\cdot, \cdot)$ is a classification loss such as the hinge loss, $K_\theta$ is the kernel matrix on the training inputs, and $\lambda > 0$ controls the trade-off between data fit and model complexity. Kernel learning in this approach amounts to minimizing the optimal loss $J(\theta)$ with respect to the kernel parameters $\theta$; for the SVM hinge loss this is equivalent to minimizing the SVM dual optimum over the kernel, which can be cast as a semi-definite program for certain kernel families [4].
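As a simple illustration of loss-based kernel selection (not the exact procedure of [4]), the sketch below evaluates the regularized SVM objective in (3) with the hinge loss for several candidate RBF bandwidths on synthetic data, using scikit-learn's precomputed-kernel SVM, and keeps the bandwidth with the smallest objective; the data, the candidate bandwidths, and the function names are illustrative assumptions.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel

def svm_primal_objective(K, y, C=1.0):
    """Fit an SVM on a precomputed kernel and return 0.5*||f||^2 + C * sum_i hinge(y_i, f(x_i))."""
    svm = SVC(C=C, kernel="precomputed").fit(K, y)
    f = svm.decision_function(K)
    sv, dc = svm.support_, svm.dual_coef_.ravel()        # dc_i = y_i * alpha_i on support vectors
    norm_sq = dc @ K[np.ix_(sv, sv)] @ dc                # ||f||^2 in the RKHS (bias excluded)
    return 0.5 * norm_sq + C * np.sum(np.maximum(0.0, 1.0 - y * f))

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))
y = np.where(X[:, 0] + 0.1 * rng.normal(size=60) > 0, 1.0, -1.0)

sigmas = [0.5, 1.0, 2.0, 4.0]                            # candidate RBF bandwidths (illustrative)
losses = [svm_primal_objective(rbf_kernel(X, gamma=1.0 / (2 * s ** 2)), y) for s in sigmas]
print("selected bandwidth:", sigmas[int(np.argmin(losses))])
```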
While the previous kernel learning approaches have shown great success in various application areas, their main drawback is that they assume fully-supervised setups. Accurately and manually labeling data points is costly, which hinders the use of a large number of training data. In this section we tackle the kernel learning problem in a semi-supervised setup where only a small fraction of the training data points are labeled. Formally, the training data is partitioned into a labeled set $\{(\mathbf{x}_i, y_i)\}_{i=1}^{n_l}$ and an unlabeled set $\{\mathbf{x}_j\}_{j=n_l+1}^{n}$, where the number of labeled points $n_l$ is typically much smaller than the total number of data points $n$.
While the kernel function can be chosen from a specific functional form like the RBF kernel, it is more flexible to combine different types of kernel functions. We select and fix a set of base kernels $k_1, \ldots, k_M$, and define the kernel family as their convex hull, namely

$$k_{\boldsymbol{\mu}}(\mathbf{x}, \mathbf{x}') = \sum_{m=1}^{M} \mu_m k_m(\mathbf{x}, \mathbf{x}'), \quad \mu_m \ge 0, \quad \sum_{m=1}^{M} \mu_m = 1. \tag{4}$$

Often we denote the $(i,j)$-th entry of the combined kernel matrix by $K_{ij} = k_{\boldsymbol{\mu}}(\mathbf{x}_i, \mathbf{x}_j)$, so that $K = \sum_{m=1}^{M} \mu_m K_m$ with $K_m$ the $m$-th base kernel matrix. The kernel parameters to be learned are thus the mixing weights $\boldsymbol{\mu} = [\mu_1, \ldots, \mu_M]^\top$.
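Forming the combined kernel matrix in (4) takes only a few lines; in the sketch below the base kernels (two RBF kernels and a linear kernel) and the mixing weights are illustrative assumptions.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel, linear_kernel

def combined_kernel(base_Ks, mu):
    """K = sum_m mu_m * K_m with mu_m >= 0 and sum_m mu_m = 1 (a valid kernel matrix)."""
    mu = np.asarray(mu, dtype=float)
    assert np.all(mu >= 0.0) and np.isclose(mu.sum(), 1.0), "weights must lie on the simplex"
    return sum(m * Km for m, Km in zip(mu, base_Ks))

X = np.random.randn(10, 4)
base_Ks = [rbf_kernel(X, gamma=g) for g in (0.1, 1.0)] + [linear_kernel(X)]
K = combined_kernel(base_Ks, mu=[0.5, 0.3, 0.2])   # remains symmetric positive semi-definite
```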
In what follows we propose two semi-supervised kernel learning algorithms. In both approaches we formulate optimization problems over the kernel parameters $\boldsymbol{\mu}$ as well as the unknown labels $\mathbf{y}_U = [y_{n_l+1}, \ldots, y_n]^\top$ of the unlabeled data points, which are treated as additional optimization variables.
The kernel function determines the affinity (manifold) structure of the data cloud, and one can let the labels of the labeled data points propagate to the unlabeled points along this manifold. This idea of label propagation has been studied in machine learning, referred to as manifold regularization [12, 15]. Specifically, these methods minimize an objective function that penalizes the label mismatch between two data points weighted by their kernel value (i.e., similarity), forcing data points that are closer to each other to attain similar labels. With the ground-truth label constraints for the labeled data points, this has the effect of propagating labels from the labeled to the unlabeled set along the affinity manifold.
We extend the idea of manifold regularization to simultaneous kernel learning and label imputation. Note that manifold regularization is a purely semi-supervised label imputation method that assumes the data kernel (affinity structure) is fixed and given. Our kernel learning algorithm based on label propagation can be formulated as follows:

$$\min_{\boldsymbol{\mu}, \mathbf{y}_U} \ \frac{1}{2} \sum_{i,j=1}^{n} K_{ij}\,(y_i - y_j)^2 = \mathbf{y}^\top \mathcal{L}\, \mathbf{y}, \quad \text{s.t.} \ \ y_i \ \text{fixed to its given label for} \ i \le n_l, \ \ y_i \in \{+1, -1\} \ \text{for} \ i > n_l, \ \ \boldsymbol{\mu} \ \text{as in (4)}. \tag{5}$$
The equality in the objective straightforwardly follows by introducing the graph Laplacian $\mathcal{L} = D - K$, where $D$ is the diagonal degree matrix with $D_{ii} = \sum_{j} K_{ij}$ and $\mathbf{y} = [y_1, \ldots, y_n]^\top$ collects both the given and the imputed labels.
As we have both the kernel weights $\boldsymbol{\mu}$ and the discrete unknown labels $\mathbf{y}_U$ as optimization variables, (5) is a difficult mixed integer program. We therefore resort to an approximate alternating solution method that iterates between imputing the unlabeled labels with the current kernel fixed (label propagation) and updating the kernel weights with the imputed labels fixed. We refer to this algorithm as SSKL-LP (semi-supervised kernel learning by label propagation).
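As a rough illustration of how such an alternating scheme could be organized (this is a plausible sketch, not necessarily the exact SSKL-LP procedure), the code below builds the graph Laplacian from the current combined kernel, imputes the unlabeled labels by a relaxed harmonic label-propagation solve, and then re-weights the base kernels; the small ridge term and the vertex update of the weights are simplifying assumptions.

```python
import numpy as np

def graph_laplacian(K):
    """L = D - K, with D the diagonal degree matrix D_ii = sum_j K_ij."""
    return np.diag(K.sum(axis=1)) - K

def propagate_labels(L, y_l, lab, unl):
    """Relaxed harmonic solution y_u = -L_uu^{-1} L_ul y_l, thresholded to {+1, -1}."""
    L_uu = L[np.ix_(unl, unl)] + 1e-6 * np.eye(len(unl))   # small ridge for numerical stability
    y_u = -np.linalg.solve(L_uu, L[np.ix_(unl, lab)] @ y_l)
    return np.where(y_u >= 0.0, 1.0, -1.0)

def sskl_lp_sketch(base_Ks, y_l, lab, unl, iters=5):
    M, n = len(base_Ks), len(lab) + len(unl)
    mu = np.full(M, 1.0 / M)                               # start from the uniform mixture
    y = np.zeros(n)
    y[lab] = y_l
    for _ in range(iters):
        K = sum(m * Km for m, Km in zip(mu, base_Ks))
        y[unl] = propagate_labels(graph_laplacian(K), y_l, lab, unl)
        # y^T L(mu) y is linear in mu, so its simplex-constrained minimizer sits at a vertex;
        # here we simply keep the single best base kernel at each iteration.
        scores = np.array([y @ graph_laplacian(Km) @ y for Km in base_Ks])
        mu = np.eye(M)[int(np.argmin(scores))]
    return mu, y[unl]
```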
As a second approach, we perform kernel learning with the objective of minimizing the hinge loss of the SVM classifier built on the kernel. The main formulation is motivated by the SVM dual optimum minimization in [4]; however, unlike their supervised treatment, we deal with semi-supervised data by simultaneously optimizing the labels of the unlabeled data. Specifically, our SVM dual loss based semi-supervised kernel learning can be formulated as:

$$\min_{\boldsymbol{\mu}, \mathbf{y}_U} \ \max_{\boldsymbol{\alpha}} \ \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{n} \alpha_i \alpha_j y_i y_j K_{ij}, \quad \text{s.t.} \ \ 0 \le \alpha_i \le C, \ \ \sum_{i=1}^{n} \alpha_i y_i = 0, \tag{6}$$

where $\boldsymbol{\alpha}$ is the vector of SVM dual variables, $C$ is the SVM trade-off hyperparameter, $K = \sum_{m} \mu_m K_m$ is the combined kernel matrix from (4), and the unlabeled entries of $\mathbf{y}$ (i.e., $\mathbf{y}_U$) are binary optimization variables.
In this formulation, presuming that the labels are fully known (i.e., $\mathbf{y}_U$ is given), the inner maximization is exactly the SVM dual problem, whose optimal value equals the regularized hinge loss of the optimal SVM classifier by strong duality. Minimizing this optimum over the kernel weights then recovers the supervised kernel learning of [4].
Unlike the semi-definite programming transformation in [4] with fully labeled data, having the unknown discrete labels $\mathbf{y}_U$ as additional variables turns (6) into a difficult mixed integer min-max problem that no longer admits such a convex reformulation.
In (6), for a fixed label assignment the inner maximization can be solved by any standard SVM solver, and the resulting dual optimum is a convex function of the kernel weights $\boldsymbol{\mu}$ (a pointwise maximum of functions linear in $\boldsymbol{\mu}$), so the weights can be updated over the simplex with the labels held fixed. The objective in (6) is then decreased approximately in an alternating fashion: with $\boldsymbol{\mu}$ fixed, the missing labels $\mathbf{y}_U$ are imputed from the current SVM solution; with the labels fixed, $\boldsymbol{\mu}$ is updated as above. We refer to this algorithm as SSKL-SDM (semi-supervised kernel learning by SVM dual minimization).
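The sketch below conveys the general flavor of such an alternating approximation for (6) (again a plausible sketch, not the exact SSKL-SDM procedure): scikit-learn's SVM solver handles the inner maximization for fixed labels, the unlabeled labels are re-imputed from the current decision function, and the kernel weights are updated greedily over the base kernels; the random label initialization and the greedy weight update are simplifying assumptions.

```python
import numpy as np
from sklearn.svm import SVC

def svm_dual_optimum(K, y, C=1.0):
    """Optimal SVM dual value sum_i a_i - 0.5 * sum_ij a_i a_j y_i y_j K_ij, plus the fitted model."""
    svm = SVC(C=C, kernel="precomputed").fit(K, y)
    sv, dc = svm.support_, svm.dual_coef_.ravel()          # dc_i = y_i * alpha_i on support vectors
    return np.abs(dc).sum() - 0.5 * dc @ K[np.ix_(sv, sv)] @ dc, svm

def sskl_sdm_sketch(base_Ks, y_l, lab, unl, C=1.0, iters=5):
    M, n = len(base_Ks), len(lab) + len(unl)
    mu = np.full(M, 1.0 / M)
    y = np.zeros(n)
    y[lab] = y_l
    y[unl] = np.random.choice([-1.0, 1.0], size=len(unl))  # random initial imputation
    for _ in range(iters):
        K = sum(m * Km for m, Km in zip(mu, base_Ks))
        _, svm = svm_dual_optimum(K, y, C)
        # impute the missing labels from the current decision function
        y[unl] = np.where(svm.decision_function(K)[unl] >= 0.0, 1.0, -1.0)
        # keep the base kernel whose dual optimum (hinge loss) is smallest for the current labels
        scores = [svm_dual_optimum(Km, y, C)[0] for Km in base_Ks]
        mu = np.eye(M)[int(np.argmin(scores))]
    return mu, y[unl]
```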
In this section we empirically demonstrate the effectiveness of the proposed semi-supervised kernel learning algorithms on several benchmark pattern classification datasets.
We first describe the kernel function hypothesis used throughout the experiments. As mentioned in Section 3, we consider a kernel function family formed as the convex hull of some fixed base kernels. Specifically, each base kernel takes the form of a weighted combination of an RBF kernel and the linear kernel, namely

$$k_m(\mathbf{x}, \mathbf{x}') = \beta_m \exp\!\Big(-\frac{\|\mathbf{x} - \mathbf{x}'\|^2}{2\sigma_m^2}\Big) + (1 - \beta_m)\, \mathbf{x}^\top \mathbf{x}'. \tag{7}$$

In (7), $\sigma_m$ is the RBF bandwidth and $\beta_m \in [0, 1]$ balances the RBF and linear components of the $m$-th base kernel; the base kernels are obtained by varying these parameters over a range of data scales, and the mixing weights $\boldsymbol{\mu}$ over the base kernels are what the kernel learning algorithms optimize.
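A family of such base kernels can be instantiated as follows; the bandwidth grid, the RBF/linear balance, and the stand-in data below are illustrative placeholders rather than the settings used in the experiments.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel, linear_kernel

def base_kernel(X, sigma, beta=0.5):
    """beta * RBF(sigma) + (1 - beta) * linear, evaluated on all pairs of rows of X."""
    return beta * rbf_kernel(X, gamma=1.0 / (2.0 * sigma ** 2)) + (1.0 - beta) * linear_kernel(X)

X = np.random.randn(30, 6)                        # stand-in for a UCI dataset's inputs
sigmas = [0.25, 0.5, 1.0, 2.0, 4.0]               # hypothetical bandwidth grid
base_Ks = [base_kernel(X, s) for s in sigmas]     # the fixed base kernels of the family
```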
We test the kernel learning algorithms on four binary classification datasets from the UCI machine learning repository [17]: Sonar, Vote, Wpbc, and Liver. The statistics of the datasets are summarized in Table 1.
We list below the competing kernel learning approaches with their abbreviations.

- KTA: the supervised kernel target alignment approach [3] reviewed in Section 2.
- SLM: the supervised loss minimization approach that learns the kernel by minimizing the regularized classification loss of the kernel classifier, also reviewed in Section 2.
- SSKL-LP: the proposed semi-supervised kernel learning via label propagation.
- SSKL-SDM: the proposed semi-supervised kernel learning via SVM dual loss minimization.
To measure the performance of the kernel learning algorithms, once the kernel is learned, we train an SVM classifier on the labeled training data only, using the learned kernel, and evaluate the test error. We report the test errors averaged over 10 random trials in Table 2. As shown, the existing supervised kernel learning algorithms are inferior to ours because they ignore the large number of unlabeled data points. Both of our approaches perform comparably well to each other and consistently outperform the baselines on all datasets and labeled set proportions. This signifies the importance of exploiting the unlabeled data in conjunction with a few labeled samples in kernel learning. We attain this by label propagation along the kernel affinity manifold in SSKL-LP, and by simultaneous label imputation and kernel learning within the SVM dual loss minimization framework in SSKL-SDM.
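The evaluation protocol above can be sketched with scikit-learn's precomputed-kernel SVM as follows; the argument names (e.g., `K_train` for the learned kernel on training pairs and `K_test_train` for the test-by-train block) are hypothetical placeholders.

```python
import numpy as np
from sklearn.svm import SVC

def test_error(K_train, y_labeled, labeled_idx, K_test_train, y_test, C=1.0):
    """Fit an SVM on the labeled block of the learned kernel, then score on the test set."""
    svm = SVC(C=C, kernel="precomputed")
    svm.fit(K_train[np.ix_(labeled_idx, labeled_idx)], y_labeled)
    y_hat = svm.predict(K_test_train[:, labeled_idx])   # cross-kernel: test points vs labeled points
    return float(np.mean(y_hat != y_test))
```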
In this paper we have proposed novel semi-supervised kernel learning algorithms that exploit a few labeled examples in conjunction with a large amount of unlabeled data. One of our approaches seeks the most plausible imputation of the missing labels together with the kernel similarity structure along which the labels can be propagated at minimal cost. In our second approach, the SVM hinge loss is minimized simultaneously over the kernel function and the missing labels. These approaches are promising in that they exploit both labeled and unlabeled data in principled ways, and their effectiveness is empirically demonstrated on several benchmark datasets with partially labeled learning setups. While we have suggested approximate solution methods that handle the resulting difficult mixed integer programs and yield viable solutions, devising more efficient and robust solution algorithms remains future work.
This study was supported by Seoul National University of Science & Technology.
No potential conflict of interest relevant to this article was reported.
Table 1. Statistics of the UCI datasets.

| Dataset | Number of data points | Input dimension |
|---|---|---|
| Sonar | 208 | 60 |
| Vote | 320 | 16 |
| Wpbc | 180 | 33 |
| Liver | 345 | 6 |
Table 2. Test errors (%) on the UCI datasets with three different labeled training set proportions.

| Dataset | Method | 10% labeled | 30% labeled | 50% labeled |
|---|---|---|---|---|
| Sonar | KTA | 48.43 ± 4.42 | 43.95 ± 3.23 | 36.67 ± 3.94 |
| Sonar | SLM | 47.14 ± 3.83 | 43.10 ± 3.45 | 38.81 ± 3.82 |
| Sonar | SSKL-LP | 43.05 ± 3.79 | 40.24 ± 3.80 | 33.10 ± 3.05 |
| Sonar | SSKL-SDM | 42.14 ± 4.22 | 40.00 ± 3.81 | 30.48 ± 3.73 |
| Vote | KTA | 25.86 ± 3.99 | 22.66 ± 3.50 | 13.57 ± 4.55 |
| Vote | SLM | 23.66 ± 3.21 | 20.76 ± 4.62 | 13.74 ± 3.46 |
| Vote | SSKL-LP | 19.86 ± 4.62 | 16.57 ± 3.55 | 11.89 ± 2.78 |
| Vote | SSKL-SDM | 18.29 ± 3.90 | 15.29 ± 3.69 | 10.97 ± 3.53 |
| Wpbc | KTA | 35.43 ± 4.60 | 31.87 ± 3.97 | 29.05 ± 3.13 |
| Wpbc | SLM | 34.39 ± 3.94 | 31.45 ± 4.60 | 29.87 ± 4.79 |
| Wpbc | SSKL-LP | 31.82 ± 3.42 | 28.43 ± 3.57 | 26.06 ± 3.65 |
| Wpbc | SSKL-SDM | 31.39 ± 3.65 | 27.56 ± 3.13 | 25.82 ± 3.42 |
| Liver | KTA | 48.84 ± 4.38 | 46.80 ± 3.32 | 40.87 ± 3.59 |
| Liver | SLM | 47.25 ± 3.67 | 45.30 ± 3.90 | 40.75 ± 3.73 |
| Liver | SSKL-LP | 42.03 ± 3.72 | 40.29 ± 4.72 | 38.87 ± 3.92 |
| Liver | SSKL-SDM | 42.88 ± 3.32 | 41.30 ± 3.67 | 38.72 ± 3.59 |