Original Article

International Journal of Fuzzy Logic and Intelligent Systems 2022; 22(2): 169-182

Published online June 25, 2022

https://doi.org/10.5391/IJFIS.2022.22.2.169

© The Korean Institute of Intelligent Systems

Using Fuzzy Time Series Models to Estimate the Cost of Benefits of Egyptian Social Insurance System

Reham Raouf and Saad Elsaieed

Department of Insurance & Actuarial Sciences, Faculty of Commerce, Cairo University, Cairo, Egypt

Correspondence to:
Reham Raouf (rehamraouf@foc.cu.edu.eg)

Received: September 18, 2021; Revised: March 31, 2022; Accepted: April 22, 2022

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted noncommercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

In recent years, many forecasting methods based on fuzzy time series (FTS) models have been proposed in different fields. Egyptian social insurance systems (SISs) need support in estimating yearly total benefits (pensions), so that the actuaries responsible for the system can make sound decisions. Because total benefits have not previously been forecast with prediction methods, this paper applies the FTS models of Chen, Cheng, Yu, and Song to Egyptian social insurance benefits, uses Huarng's method to select an appropriate partition length, and builds the intervals on differenced data, since the series has been non-stationary and has increased significantly in recent years. Experiments were carried out with the four models using partitions of 5, 10, 50, and 100 and a Huarng-based partition of 465. The results show good performance in forecasting yearly benefits, especially for the Chen model with the Huarng-based partition of 465, which gives low errors on both the training and testing data.

Keywords: Fuzzy time series, Social insurance, Benefits, Pensions

Forecasting plays a significant part in everyday life and is critical in a variety of businesses, because projections of future events must be factored into decision-making. Social insurance is one of the most significant concerns that most nations face because of its sensitive relationship with the public budget. Forecasting insurance costs is therefore an important issue, and better forecasts help actuaries fulfil their responsibility of making strategic choices in the social insurance system. In this context, this study proposes a prediction of the total benefits (pensions), in millions, required by the Egyptian insurance system.

1.1 Social Insurance Systems

According to their conventional definition, pension schemes and social insurance systems (SISs) are one of the primary components of social protection, together with social assistance, social safety nets, employment strategies, and labor market interventions [1]. SISs are a mechanism that allows workers to secure their future, and they are partially or entirely financed by employee and employer contributions. Egypt has a broadly stratified SIS operating as a fully funded plan, in which employees make payments that are invested and subsequently paid back as pensions. The system has gradually evolved as a defined benefit plan. It has been governed by several laws, and the present legal framework, Law 148/2019, came into force on January 1, 2020. This law plays a significant role in changing the system and in overcoming the challenges caused by the complexity of the old regulatory and legal framework. In practice, the system faces serious challenges and is extremely fragmented, with several benefit packages available to different parts of the workforce, offering payments to employees and their families for old age, disability, survivorship, illness, work injury, and unemployment [2,3]. In addition, several factors have made Egypt's benefits unsustainable and inefficient. For example, the reserves of pension funds are invested at low, occasionally negative, real interest rates. The funds provide extremely generous minimum pensions and opportunities for early retirement. Members may also easily alter the size of their pensions, which are based solely on payments made in the few years leading up to retirement. Many workers agree with their employers to under-report their income throughout most of their working careers to reduce their contributions. As a result of these factors, Egypt's pension funds will soon spend more on pension payouts than they receive in contributions from members [4].
The total of Egypt's benefits (pensions) has therefore been increasing sharply in recent years, which makes it necessary to forecast total benefits for the next few years. Annual benefits are usually estimated with actuarial models that provide demographic and financial forecasts for pension systems. These estimates are often generated from models applied to occupational pension schemes that cover groups of employees, and they are based on demographic and economic factors. Consequently, actuarial methods combined with predictive analytics should significantly improve the understanding of predicted behavior or events and thereby support policies and judgments [5]. In addition, implementing a year-by-year simulation approach to estimate the future costs of benefits is one of the most significant tasks for actuaries [6]. Benefit prediction and analysis will therefore aid actuaries in their professional responsibility of making strategic choices in the social insurance system. To the best of our knowledge, our approach is the first to predict total benefits of Egyptian social insurance using fuzzy time series (FTS) models on a yearly dataset (pensions), and it yields accurate results. In this study, we use FTS models to forecast Egyptian social insurance benefits (pensions), instead of traditional assumption-based methods, to better forecast pension demand.

1.2 Fuzzy Time Series Overview

Fuzzy set theory was developed to address the ambiguity and uncertainty that characterize most real-world problems. Proposed by Zadeh [7] in 1965, it describes linguistic (fuzzy) information through mathematical modeling and is still used in a wide range of applications. Building on it, researchers proposed several models, the first of which was an FTS forecasting model. Song and Chissom [8] introduced the concepts of FTS, based on fuzzy set theory, to overcome the limitations of classical time series. However, this model incorporates the max-min composition method, which makes the computing process substantially more difficult [9]. Chen's work also includes a series of experiments on forecasting University of Alabama enrollment data. Chen [10] proposed a simpler calculation technique to address this flaw, with the benefit of reducing computing time and making the procedure easier to understand. However, this model lacks an appropriate weighting mechanism for fuzzy logical relationships (FLRs). The model has since been widely modified, with researchers attempting to increase prediction accuracy by modifying the weighting technique or changing the length of the linguistic intervals. Hwang et al. [11] proposed a forecasting approach based on an FTS. Chen and Hwang [12] proposed an FTS-based temperature forecasting algorithm. Chen [10] improved forecasting by using a high-order FTS model. An extended version of Chen's model was used in [1], and Huarng [13] predicted enrollments using heuristic principles, which reduced computations. Huang [14] calculated interval lengths using both average-based and distribution-based lengths. Fuzzy theory was further refined and adapted by Yu and colleagues [15,16] to address weighting and recurrence issues in typical FTS models, with applications including stock price prediction.
To improve forecasting accuracy, Yu [15] applied several weighting techniques to FTS models. A ratio-based interval length was then added to the FTS model by Huarng and Yu [17]. Cheng [18] used an FTS model combined with a trend-weighting method to forecast real stock price trading data and university enrollment.

This paper is organized as follows. In Section 2, we review related studies that implemented FTS in different domains with various datasets to extract model features. Section 3 presents the steps of the FTS models, and Section 4 describes the benefits dataset and the implementation of the proposed FTS models. Section 5 presents and discusses the results and the accuracy of the models. Finally, Section 6 summarizes and concludes the study.

Machine-learning models have contributed to SISs. The author in [19] proposed the use of three algorithms, namely the decision tree, naive Bayes, and CN2 rule induction, to classify people based on some of their social insurance features. The three algorithms obtained high classification accuracy. The author in [20] proposed a novel deep learning framework based on the recurrent neural network (RNN) architecture for forecasting an individual's payment status using a real dataset gathered by Taiwan's Ministry of Health and Welfare. The model was developed to abstract people's payment behavior effectively and to forecast their future payment behavior accurately over a long period. It outperforms state-of-the-art approaches, such as support vector machines (SVM) and hidden Markov models (HMM), even under varied settings. However, these studies predict individuals' personal payments, which is a different problem from our target.

Social insurance benefits (pensions) are still estimated using traditional assumptions. In this section, we focus on related studies that implemented FTS in different domains using various datasets. FTS models have been used in different industries and domains to predict enrollment, stock markets, tourism, shipping, and transportation. This motivated us to implement FTS to predict Egypt's social insurance benefits, that is, the total amounts of pensions, over a series of years.

Because the dataset in our proposal is a yearly series, it is suitable for prediction with FTS models. The FTS models implemented for prediction in earlier studies are as follows. The author in [21] proposed two models to predict tourism demand in Taiwan, using arrivals during the period 1989–2000 from Hong Kong, Germany, and the United States. The FTS was more appropriate for predicting tourism demand than the greedy algorithm, especially for arrivals from Hong Kong to Taiwan, with an error of 1.93. The author in [22] measured the prediction performance for tourist arrivals in Indonesia using an FTS compared with classical methods such as seasonal autoregressive integrated moving average (SARIMA), Box–Jenkins methods, time series regression, and Holt-Winters. The training data covered January 1989 to December 1996, and the testing data covered January 1997 to December 1997. The root mean square error (RMSE), mean square error (MSE), mean absolute deviation (MAD), and mean absolute percentage error (MAPE) were measured. After data transformation, Chen's FTS model outperformed the classical methods. Although FTS methods are simple, they outperformed all the classical statistical methods, which were used previously to predict social insurance; this prompted our proposal to use FTS methods for prediction. In [23], an FTS model was proposed to help power companies predict electrical load and make decisions. The study used yearly regional electric load data from Taiwan and compared the FTS with previously implemented linear and nonlinear regression models. The FTS achieved a smaller MAPE than the other implemented models, especially when appropriate intervals were obtained using an efficient length for the universe of discourse.
That study draws attention to interval length, which is very important to the experimental results in the current proposal. In [24], the author analyzed FTS models for prediction using data on three agro-products as the available time series. The FTS models of Chen, Huarng, and Singh were tested on production time-series data covering 22 years, from 1988 to 2010. In terms of MSE, the Chen model gave the best result for sugar production (63.45), whereas the Singh model gave the best results for Lahi and rice production (2605 and 913.62, respectively). This study suggests that the predictions of different FTS models differ according to the values in the dataset, which led us to experiment with several FTS models in our approach. The author in [25] enhanced FTS by reducing the prediction error and proposed a new prediction model based on the fuzzy transform (F-transform), which is built on a partition of the universe and uses fuzzy logical relationships for prediction. The study showed that implementing the F-transform can improve prediction accuracy, with applications to the enrollments of the University of Alabama and the number of patents awarded in Taiwan. It was the starting point for taking dataset transformation into consideration in our proposed model.

FTS has multiple models that have been implemented in different domains, and their performance varies according to the model, the values in the dataset, the partition method, the accuracy measures, and so on. This led us to run a variety of experiments with our proposed models and to select the best-performing predictions by comparing the results.

This paper applies four FTS models (Chen, Cheng, Yu, and Song) to forecast Egyptian social insurance benefits, constructs the interval length from differenced (transformed) data, uses Huarng's method for an appropriate partition length, and conducts experiments with partitions of 5, 10, 50, and 100. The definition of FTS proposed by Song and Chissom [8,9,26] is based on fuzzy sets [7]. Chen [10] developed an approach that is more efficient than Song and Chissom's method because it uses simple arithmetic operations rather than the difficult max-min composition operation. The steps of the Chen method are briefly reviewed below and depicted as a flowchart in Figure 1.

Step 1: Dataset

This step collects and preprocesses the historical dataset.

Step 2: Divide the universe of discourse into partitions

(1) Define the universe of discourse (U) from the available historical time series using the following formula:

$$U = [D_{\min} - D_1,\; D_{\max} + D_2], \tag{1}$$

where $D_{\min}$ and $D_{\max}$ are the minimum and maximum values of the historical data, respectively. $D_1$ and $D_2$ are positive numbers used to extend U on both sides, so that there is room for variation and future outcomes remain within U.

(2) Divide the universe of discourse U into partitions of equal interval length. To determine a suitable length, Huarng [28] proposed computing the interval length L with the help of the ranges in Table 1, as follows:

a) Compute the absolute first differences $|D_h - D_{h-1}|$ between consecutive values, and then calculate the average of these first differences.

b) Take half of the average as the length.

c) Using Table 1, identify the range in which this length falls and read off the corresponding base.

d) Round the length to the given base to obtain L. The number of intervals m is then calculated as follows:

$$m = \frac{(D_{\max} + D_2) - (D_{\min} - D_1)}{L}. \tag{2}$$

Then, U can be divided into m equal-length intervals $u_1, u_2, \ldots, u_m$, where $u_1 = [d_1, d_2]$, $u_2 = [d_2, d_3]$, ..., $u_m = [d_m, d_{m+1}]$.
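The average-based length of steps (a) to (d) and the interval count of Eq. (2) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, values that fall in the gaps between the ranges of Table 1 (for example 10.5) are assigned to the lower range, "rounding to the base" is read as rounding half up to the nearest multiple of the base, and m is rounded up to a whole number of intervals.

```python
import math

# Table 1 as (upper bound of range, base). Assigning values in the gaps
# between ranges to the lower range is an assumption of this sketch.
BASE_TABLE = [(1.0, 0.1), (10.0, 1.0), (100.0, 10.0), (1000.0, 100.0)]


def average_based_length(values):
    """Interval length L from steps (a)-(d); needs at least two values."""
    # (a) average of the absolute first differences
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    # (b) half of that average
    half_avg = (sum(diffs) / len(diffs)) / 2.0
    # (c) base of the range that contains half_avg (largest base as fallback)
    base = next((b for upper, b in BASE_TABLE if half_avg <= upper), 100.0)
    # (d) round half up to the nearest multiple of the base
    length = math.floor(half_avg / base + 0.5) * base
    return max(length, base)  # never return a zero length


def universe_and_count(values, d1, d2, length):
    """Universe U = [Dmin - D1, Dmax + D2] and the number of intervals m."""
    lo, hi = min(values) - d1, max(values) + d2
    m = math.ceil((hi - lo) / length)  # rounding up is an assumption
    return (lo, hi), m


toy = [10, 40, 30, 70]            # hypothetical series, not the paper's data
L = average_based_length(toy)     # half of the mean |difference| is about 13.3
U, m = universe_and_count(toy, d1=0, d2=0, length=L)
```

For the toy series the half-average falls in the 11–100 range of Table 1, so the base is 10 and the length is rounded to 10.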

Step 3: Fuzzy set

Define each fuzzy set $A_i$ as a collection of elements with a continuous grade of membership on the universe of discourse $U = \{u_1, u_2, \ldots, u_n\}$:

$$A_i = f_{A_i}(u_1)/u_1 + f_{A_i}(u_2)/u_2 + \cdots + f_{A_i}(u_n)/u_n, \tag{3}$$

where $f_{A_i}$ is the membership function of $A_i$ and $0 \le f_{A_i}(u_k) \le 1$ is the grade of membership of $u_k$ in $A_i$. The degree to which each value belongs to each $A_i$ ($i = 1, 2, \ldots, n$) is then determined, and the value at time t is fuzzified as the $A_i$ in which its membership degree is maximal. The fuzzy sets are defined by

$$\begin{aligned}
A_1 &= 1/u_1 + 0.5/u_2 + 0/u_3 + 0/u_4 + \cdots + 0/u_n,\\
A_2 &= 0.5/u_1 + 1/u_2 + 0.5/u_3 + 0/u_4 + \cdots + 0/u_n,\\
A_3 &= 0/u_1 + 0.5/u_2 + 1/u_3 + 0.5/u_4 + \cdots + 0/u_n,\\
&\;\;\vdots\\
A_n &= 0/u_1 + 0/u_2 + \cdots + 0.5/u_{n-1} + 1/u_n.
\end{aligned}$$
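The pattern in these definitions (grade 1 on the set's own interval, 0.5 on its immediate neighbours, 0 elsewhere) can be written as a small helper. The function name is hypothetical and the sketch only reproduces the membership rows; it is not tied to any particular library.

```python
def fuzzy_set_row(i, n):
    """Membership grades of A_i over u_1..u_n (i is 1-based)."""
    row = [0.0] * n
    row[i - 1] = 1.0          # full membership on the set's own interval
    if i > 1:
        row[i - 2] = 0.5      # left neighbour
    if i < n:
        row[i] = 0.5          # right neighbour
    return row


# All rows for a universe with five intervals.
rows = [fuzzy_set_row(i, 5) for i in range(1, 6)]
```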

Step 4: Fuzzification and fuzzy logical relationship

The fuzzified variations depend on the value of the membership degree. In this step, each data point is assigned a fuzzy set: the degree to which each historical value belongs to each $A_i$ is evaluated, and a value whose variation falls within $u_i$ is fuzzified as $A_i$. Fuzzy logical relationships are then established between the linguistic values $A_i(t-1)$ and $A_j(t)$ and are written as $A_i \to A_j$. Relationships with the same left-hand side are collected into fuzzy logical relationship groups. For example, the relationships $A_1 \to A_1$, $A_1 \to A_2$, and $A_1 \to A_3$ are grouped as $A_1 \to A_1, A_2, A_3$.
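As a rough illustration of this grouping, the sketch below turns a sequence of fuzzified labels into FLR groups. The label sequence is a made-up example and `build_flr_groups` is a hypothetical helper name; repeated relationships are kept only once, as in Chen-style groups.

```python
def build_flr_groups(labels):
    """Group relationships A_i(t-1) -> A_j(t) by their left-hand side."""
    groups = {}
    for lhs, rhs in zip(labels, labels[1:]):
        consequents = groups.setdefault(lhs, [])
        if rhs not in consequents:   # each relationship is counted once
            consequents.append(rhs)
    return groups


# Hypothetical fuzzified series (not taken from the paper's data).
labels = ["A1", "A1", "A2", "A1", "A3", "A3"]
groups = build_flr_groups(labels)
# A1 is followed by A1, A2 and A3, so its group is A1 -> A1, A2, A3.
```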

Step 5: Defuzzification and forecast value

The defuzzification rules for the fuzzified forecast variations from Step 4 are as follows:

  • (1) If the membership of the output has only one maximum, at $u_i$, the midpoint of the interval that corresponds to the maximum is taken as the forecast.

  • (2) If the membership of the output has two or more consecutive maxima, the midpoint of the corresponding conjunct intervals is taken as the forecast.

  • (3) If the memberships of the output are all 0, the forecast variation is 0, that is, no change is anticipated.

Step 6: Evaluate the performance of the results

The predicted value for the current year is obtained by adding the forecast change to the actual value of the previous year. The robustness of a model can then be measured using statistical tools.
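A minimal sketch of this forecasting step is given below. It assumes the averaging rule that the Defuzzification column of Table 3 appears to use: the forecast change is the mean of the midpoints of the fuzzy sets in the FLR group of the previous year's label, and the forecast is the previous actual value plus that change. The function name and the dictionary layout are hypothetical; the midpoints are the partition-5 values quoted in Section 4.4.

```python
def forecast_next(previous_value, previous_label, groups, centers):
    """Previous actual value plus the mean midpoint of the label's FLR group."""
    consequents = groups.get(previous_label, [])
    if not consequents:
        # No relationship known for this label: forecast no change, cf. rule (3).
        return previous_value
    change = sum(centers[label] for label in consequents) / len(consequents)
    return previous_value + change


# Midpoints of the five partition-5 fuzzy sets and two of the FLR groups.
centers = {"A0": -22.0, "A1": 909.9, "A2": 1841.8, "A3": 2773.7, "A4": 3705.6}
groups = {"A0": ["A0", "A1"], "A2": ["A2", "A3", "A4"]}

# 1976 benefits were 179 with label A0; the result is close to the 1977 entry
# of Table 3 (622.96), up to rounding of the midpoints.
f_1977 = forecast_next(179, "A0", groups, centers)
```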

4.1 Dataset Description

The dataset was collected from the National Fund for Social Security and the Ministry of Insurance and Social Affairs (MOISA). The data cover the 44 years from 1976 to 2019; the value for each year is the total benefits (government sector plus public and private sector) that the insurance system is required to pay. The data are divided into training and testing datasets to evaluate the performance of the proposed FTS models and to compare them with one another. Figure 2 shows plots of the total benefits (in millions), the training data, and the testing data. The dataset was divided into approximately 80% training data (1976 to 2009) and 20% testing data (2010 to 2019). Table 2 lists the total benefits (in millions) in detail.

4.2 Fitting Proposed Models with Partition

Given that the benefit values have increased steadily over the years, a difference transformation is the most suitable method for the present data, and it helps to make the data stationary [27]: the partitions are based on the year-to-year differences and not on the original values. Figure 3 shows the yearly differences over the 44 years, and Table 2 lists them in the Difference column. We then built the partition sets from these values.
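The difference transformation and its inverse can be illustrated with the first six values of the total benefits column of Table 2 (1976 to 1981). The helper names are hypothetical; the sketch only shows first-order differencing and how forecast differences map back to benefit levels.

```python
def difference(series):
    """First-order differences: value(t) - value(t-1)."""
    return [b - a for a, b in zip(series, series[1:])]


def undifference(first_value, diffs):
    """Rebuild the original series from its first value and its differences."""
    out = [first_value]
    for d in diffs:
        out.append(out[-1] + d)
    return out


benefits = [179, 185, 214, 264, 348, 328]   # Table 2, 1976-1981 (millions)
diffs = difference(benefits)                # matches the Difference column
restored = undifference(benefits[0], diffs)
```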

4.3 Models Training and Testing

The four models (Chen, Cheng, Song, and Yu) were applied to the insurance pension data with different partitions using the pyFTS library [28]. Figure 4 shows the training and testing predictions for all experimental models and partitions; its four panels correspond to the Chen, Cheng, Song, and Yu models, respectively.

4.4 Chen Model Implementation

Given the large number of models and partitions in the experiments, we explain the implementation with the Python FTS library only briefly. To avoid repetition, we do not list the implementation steps for every experiment; instead, we describe in detail one instance, the Chen model with partition 5. The results of all experiments with the different models and partitions are presented and analyzed in the next section.

Step 1: The historical time series is collected, and the universe of discourse (U) is defined on the yearly difference transformation, as explained in the dataset subsection.

Step 2: Grid partitioning of the transformed universe of discourse

In the partition step, we use the grid partitioning method proposed in [8–10,29,30] with a varying number of partitions to divide the universe of discourse (U) for the four models, where every partition has the same length.

The experiments used partitions of 5, 10, 50, and 100 and a Huarng-based partition of 465, in order to establish the best partition number for improving model accuracy [31–33]. Figure 5 shows the partition intervals of the training dataset for partitions 5, 10, 50, 100, and 465 (Huarng), respectively. For example, with partition 5, the triangular grid divides the universe of discourse as follows:

A0: trimf([-953.9, -22.0, 909.9]),
A1: trimf([-22.0, 909.9, 1841.8]),
A2: trimf([909.9, 1841.8, 2773.7]),
A3: trimf([1841.8, 2773.7, 3705.6]),
A4: trimf([2773.7, 3705.6, 4637.6])
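The five triangular sets above can be approximately reproduced with a plain-Python grid partition, without relying on any library. The sketch assumes that the universe is [-22.0, 4637.6], as the listed parameters imply, and that each value is fuzzified to the set with the highest membership grade; `grid_triangular_sets`, `trimf`, and `fuzzify` are hypothetical helper names, not pyFTS functions.

```python
def grid_triangular_sets(lo, hi, n):
    """n overlapping triangles (left, peak, right) with peaks spaced (hi-lo)/n apart."""
    step = (hi - lo) / n
    return [(lo + (k - 1) * step, lo + k * step, lo + (k + 1) * step)
            for k in range(n)]


def trimf(x, a, b, c):
    """Triangular membership with feet at a and c and peak at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def fuzzify(x, sets):
    """Index of the set with the highest membership grade (0 if all are zero)."""
    grades = [trimf(x, *s) for s in sets]
    return max(range(len(sets)), key=grades.__getitem__)


sets = grid_triangular_sets(-22.0, 4637.6, 5)
label_1991 = fuzzify(549, sets)    # difference for 1991 in Table 2
label_2006 = fuzzify(3728, sets)   # difference for 2006 in Table 2
```

With these assumptions the computed parameters agree with the listed ones to within 0.1, and the differences 549 and 3728 map to A1 and A4, as in Table 3.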

Step 3: Define the fuzzy set membership and fuzzify the variations based on the membership

In this step, each benefit value is assigned to the fuzzy set to which it belongs. Table 3 lists the fuzzy set of each benefit in the Fuzzyfied column.

Step 4: Chen fuzzy logical relationship rules

The Chen model establishes fuzzy logical relationship (FLR) rule groups. For partition 5, the fuzzy logical relationships are as follows: [A0 → A0, A0 → A1, A1 → A1, A1 → A2, A2 → A2, A2 → A3, A3 → A2, A3 → A4, A4 → A2, A2 → A4].

The derived FLRs are then divided into groups that share the same current (left-hand side) state. Consequently, five FLR groups were obtained, as shown in Figure 6. Table 3 lists the FLR group of each benefit in the FLR Group column. The groups are as follows:

A0 → A0, A1
A1 → A1, A2
A2 → A2, A3, A4
A3 → A2, A4
A4 → A2

Step 5: Forecast value. See Table 3.

Step 6: Evaluate the performance results, which are explained in detail in the next section.

5.1 Measure Tools Performance and Evaluation

The predicted benefits of the four FTS models were evaluated using four statistical measures: RMSE, MAPE, symmetric MAPE (SMAPE), and Theil's U statistic. These measures were chosen as the model error assessment metrics because they assess the degree of change and the accuracy of the data while assessing the model's prediction quality. They are defined as the RMSE in Eq. (4), MAPE in Eq. (5), SMAPE in Eq. (6), and Theil's U in Eq. (7), where $y_n$ refers to the actual data, $f_n$ refers to the forecast data, and N refers to the total number of data points.

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{n=1}^{N} (y_n - f_n)^2}{N}}, \tag{4}$$

$$\mathrm{MAPE} = \frac{1}{N} \sum_{n=1}^{N} \left| \frac{y_n - f_n}{y_n} \right| \times 100, \tag{5}$$

$$\mathrm{SMAPE} = \frac{1}{N} \sum_{n=1}^{N} \frac{|y_n - f_n|}{(y_n + f_n)/2}, \tag{6}$$

$$\text{Theil's } U = \frac{\sqrt{\sum_{n=1}^{N} (y_n - f_n)^2}}{\sqrt{\sum_{n=1}^{N} y_n^2} + \sqrt{\sum_{n=1}^{N} f_n^2}}. \tag{7}$$
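The four measures can be sketched directly from their usual definitions. This is a plain-Python illustration rather than the library code used in the experiments; in particular, the SMAPE variant (no factor of 100) and the Theil's U variant (a ratio of square roots) are assumptions about the intended formulas, and the input numbers are made up.

```python
import math


def rmse(y, f):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, f)) / len(y))


def mape(y, f):
    return 100.0 * sum(abs((a - b) / a) for a, b in zip(y, f)) / len(y)


def smape(y, f):
    # Assumed variant: mean of |y - f| divided by the average of y and f.
    return sum(abs(a - b) / ((a + b) / 2.0) for a, b in zip(y, f)) / len(y)


def theil_u(y, f):
    # Assumed variant: root of squared errors over the sum of the roots of
    # the squared actual values and the squared forecasts.
    num = math.sqrt(sum((a - b) ** 2 for a, b in zip(y, f)))
    den = math.sqrt(sum(a ** 2 for a in y)) + math.sqrt(sum(b ** 2 for b in f))
    return num / den


actual = [100.0, 200.0]      # hypothetical values, not the paper's data
forecast = [110.0, 190.0]
scores = (rmse(actual, forecast), mape(actual, forecast),
          smape(actual, forecast), theil_u(actual, forecast))
```

A perfect forecast gives 0 for all four measures, and this variant of Theil's U stays between 0 and 1, which is in line with the small U values reported for the best models.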

5.2 Models Results and Accuracy

On the training dataset, accuracy was measured using the four measures to show the error of each model for each experimental partition. Figure 7 shows the training accuracy of the four models in terms of RMSE, MAPE, SMAPE, and Theil's coefficient, respectively. The four measures rank the models in the same way. The three models Chen, Cheng, and Yu start with similar errors at partition 5, and their errors decrease until they reach their lowest values at partition 465. Conversely, the Song model has low errors at partition 5, and its errors increase as the number of partitions increases. The three models Chen, Cheng, and Yu are stable with low errors from partition 50 to partition 100, and at partition 465 (Huarng-based) they reach the same lowest error on all four measures. Table 4 shows the training accuracy: the RMSE is in the range of 17.89 to 18.99, the MAPE is in the range of 0.36 to 0.37, the SMAPE is 0.18, and Theil's U is 0. The three models Chen, Cheng, and Yu had identical accuracies. In contrast, for the Song model at partition 465 (Huarng-based), the RMSE is 1540.18, the MAPE is 122.32, the SMAPE is 23.8, and Theil's U is 0.06. The Song model thus had extreme errors compared with the other three models. Even at partition 5, the Song model had slightly larger errors than the Chen, Cheng, and Yu models. Overall, the Chen, Cheng, and Yu models give the best predictions on the training dataset, with small errors, particularly at partition 465.

Testing accuracy is an important factor because it represents the ability of a model to forecast unseen data. On the testing dataset, accuracy was measured using the four measures to show the error of each model for each experimental partition. Figure 8 shows the testing accuracy of the models in terms of RMSE, MAPE, SMAPE, and Theil's coefficient, respectively. As with the training accuracy, the four measures rank the models in the same way. Overall, the four models have similar error values at partition 5, but the errors of the three models Chen, Cheng, and Yu decrease from partition 10 onward and remain stable at low values up to partition 465. The Song model remains stable with high errors, even at partition 465. Table 5 shows the testing accuracy, where the accuracies of the four models at partition 5 are very close together: the RMSE is in the range of 11306 to 11314, the MAPE is in the range of 10.95 to 11.98, the SMAPE is in the range of 5.89 to 6.03, and Theil's coefficient is in the range of 0.059 to 0.069. Beyond partition 5, the three models Chen, Cheng, and Yu have identical error values. The Song model had the highest error for every partition, even where the differences were slight. The best result was at partition 465, where the three models had identical low errors on the four measures RMSE, MAPE, SMAPE, and Theil's U of 11411.13, 8.87, 4.70, and 0.059, respectively. The corresponding values for the Song model were 12927.04, 10.62, 5.68, and 0.067, respectively.

This paper proposed predicting yearly Egyptian total benefits (pensions) using the FTS models of Chen, Cheng, Song, and Yu. The interval length was constructed on differenced data, because the data have been non-stationary and have increased significantly in recent years, and Huarng's method was used to obtain an appropriate partition length. In addition, experiments with partitions of 5, 10, 50, and 100 were carried out to further validate the models. Because of the large number of experiments, and to avoid repetition, only the Chen model with partition 5 was explained in detail. The performance of the models was assessed by predicting the yearly benefits and applying four statistical criteria. According to the results, the proposed techniques significantly improve forecast performance and prediction accuracy. In particular, the Chen, Cheng, and Yu models performed best, especially in training accuracy at the Huarng-based partition of 465, and predicted the testing data with acceptable accuracy, whereas the Song model had the highest error in both training and testing. The proposed approach motivates the use of FTS to predict pensions, side by side with recent learning-based algorithms, in the SISs of other countries.

Fig. 1. FTS model steps.

Fig. 2. Plots of total benefits (in millions).

Fig. 3. Yearly transformation difference.

Fig. 4. Training and testing prediction.

Fig. 5. Partitions of interval lengths (training dataset): (a) partition 5, (b) partition 10, (c) partition 50, (d) partition 100, and (e) Huarng 465.

Fig. 6. Network FLR rules.

Fig. 7. Training accuracy of the four models: (a) RMSE, (b) MAPE, (c) SMAPE, and (d) Theil's coefficient.

Fig. 8. Testing accuracy of the four models: (a) RMSE, (b) MAPE, (c) SMAPE, and (d) Theil's coefficient.


Table 1. Range of interval.

| Range | Base |
|---|---|
| 0.1–1.0 | 0.1 |
| 1.1–10 | 1 |
| 11–100 | 10 |
| 101–1000 | 100 |

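Table 2. Total benefits dataset.

| Year | Government sector | Public & private sector | Total benefits in millions | Difference | Year | Government sector | Public & private sector | Total benefits in millions | Difference |
|---|---|---|---|---|---|---|---|---|---|
| 1976 | 121 | 57 | 179 | 0 | 1977 | 145 | 40 | 185 | 6 |
| 1978 | 162 | 51 | 214 | 29 | 1979 | 195 | 68 | 264 | 50 |
| 1980 | 243 | 105 | 348 | 84 | 1981 | 207 | 121 | 328 | −20 |
| 1982 | 401 | 351 | 753 | 425 | 1983 | 436 | 424 | 861 | 108 |
| 1984 | 557 | 485 | 1042 | 181 | 1985 | 641 | 553 | 1194 | 152 |
| 1986 | 779 | 631 | 1410 | 216 | 1987 | 841 | 688 | 1529 | 119 |
| 1988 | 1013 | 823 | 1837 | 308 | 1989 | 1275 | 969 | 2244 | 407 |
| 1990 | 1417 | 1133 | 2551 | 307 | 1991 | 1714 | 1386 | 3100 | 549 |
| 1992 | 2037 | 1705 | 3743 | 643 | 1993 | 2353 | 2084 | 4438 | 695 |
| 1994 | 3019 | 2562 | 5582 | 1144 | 1995 | 3587 | 3000 | 6587 | 1005 |
| 1996 | 4107 | 3469 | 7577 | 990 | 1997 | 4810 | 4090 | 8901 | 1324 |
| 1998 | 5252 | 5003 | 10255 | 1354 | 1999 | 6034 | 5845 | 11880 | 1625 |
| 2000 | 6689 | 6797 | 13487 | 1607 | 2001 | 7742 | 7563 | 15305 | 1818 |
| 2002 | 9590 | 8314 | 17904 | 2599 | 2003 | 10791 | 8996 | 19787 | 1818 |
| 2004 | 12240 | 9756 | 21996 | 1818 | 2005 | 13988 | 10590 | 24578 | 2599 |
| 2006 | 15441 | 12865 | 28306 | 3728 | 2007 | 16867 | 13398 | 30265 | 1959 |
| 2008 | 19311 | 15170 | 34481 | 4216 | 2009 | 19488 | 18139 | 37627 | 3146 |
| 2010 | 22660 | 18456 | 41116 | 3489 | 2011 | 28724 | 22124 | 50848 | 9732 |
| 2012 | 35568 | 28164 | 63732 | 12884 | 2013 | 33640 | 35777 | 69417 | 5685 |
| 2014 | 40558 | 43175 | 83733 | 14316 | 2015 | 48554 | 51816 | 100370 | 16637 |
| 2016 | 55495 | 59169 | 114664 | 14294 | 2017 | 62300 | 67600 | 129900 | 15236 |
| 2018 | 72800 | 77600 | 150400 | 20500 | 2019 | 85700 | 92800 | 178500 | 28100 |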

Table 3. Dataset training fuzzified and FLR group rules.

| Year | Benefits | Difference | Fuzzyfied | FLR Group | Defuzzification | Forecasting value |
|---|---|---|---|---|---|---|
| 1976 | 179 | 0 | A0 | A0, A1 | - | 0 |
| 1977 | 185 | 6 | A0 | A0, A1 | [(−22.0) + (909.9)]/2 + 179 | 622.96 |
| 1978 | 214 | 29 | A0 | A0, A1 | [(−22.0) + (909.9)]/2 + 185 | 628.96 |
| 1979 | 264 | 50 | A0 | A0, A1 | [(−22.0) + (909.9)]/2 + 214 | 657.96 |
| 1980 | 348 | 84 | A0 | A0, A1 | [(−22.0) + (909.9)]/2 + 264 | 707.96 |
| 1981 | 328 | −20 | A0 | A0, A1 | [(−22.0) + (909.9)]/2 + 348 | 791.96 |
| 1982 | 753 | 425 | A0 | A0, A1 | [(−22.0) + (909.9)]/2 + 328 | 771.96 |
| 1983 | 861 | 108 | A0 | A0, A1 | [(−22.0) + (909.9)]/2 + 753 | 1196.96 |
| 1984 | 1042 | 181 | A0 | A0, A1 | [(−22.0) + (909.9)]/2 + 861 | 1304.96 |
| 1985 | 1194 | 152 | A0 | A0, A1 | [(−22.0) + (909.9)]/2 + 1042 | 1485.96 |
| 1986 | 1410 | 216 | A0 | A0, A1 | [(−22.0) + (909.9)]/2 + 1194 | 1637.96 |
| 1987 | 1529 | 119 | A0 | A0, A1 | [(−22.0) + (909.9)]/2 + 1410 | 1853.96 |
| 1988 | 1837 | 308 | A0 | A0, A1 | [(−22.0) + (909.9)]/2 + 1529 | 1972.96 |
| 1989 | 2244 | 407 | A0 | A0, A1 | [(−22.0) + (909.9)]/2 + 1837 | 2280.96 |
| 1990 | 2551 | 307 | A0 | A0, A1 | [(−22.0) + (909.9)]/2 + 2244 | 2687.96 |
| 1991 | 3100 | 549 | A1 | A1, A2 | [(−22.0) + (909.9)]/2 + 2551 | 2994.96 |
| 1992 | 3743 | 643 | A1 | A1, A2 | [(909.9) + (1841.8)]/2 + 3100 | 4475.88 |
| 1993 | 4438 | 695 | A1 | A1, A2 | [(909.9) + (1841.8)]/2 + 3743 | 5118.88 |
| 1994 | 5582 | 1144 | A1 | A1, A2 | [(909.9) + (1841.8)]/2 + 4438 | 5813.88 |
| 1995 | 6587 | 1005 | A1 | A1, A2 | [(909.9) + (1841.8)]/2 + 5582 | 6957.88 |
| 1996 | 7577 | 990 | A1 | A1, A2 | [(909.9) + (1841.8)]/2 + 6587 | 7962.88 |
| 1997 | 8901 | 1324 | A1 | A1, A2 | [(909.9) + (1841.8)]/2 + 7577 | 8952.88 |
| 1998 | 10255 | 1354 | A1 | A1, A2 | [(909.9) + (1841.8)]/2 + 8901 | 10276.88 |
| 1999 | 11880 | 1625 | A2 | A2, A3, A4 | [(909.9) + (1841.8)]/2 + 10255 | 11630.88 |
| 2000 | 13487 | 1607 | A2 | A2, A3, A4 | [(1841.8) + (2773.7) + (3705.6)]/3 + 11880 | 14653.76 |
| 2001 | 15305 | 1818 | A2 | A2, A3, A4 | [(1841.8) + (2773.7) + (3705.6)]/3 + 13487 | 16260.76 |
| 2002 | 17904 | 2599 | A3 | A2, A4 | [(1841.8) + (2773.7) + (3705.6)]/3 + 15305 | 18078.76 |
| 2003 | 19787 | 1818 | A2 | A2, A3, A4 | [(1841.8) + (3705.6)]/2 + 17904 | 20677.76 |
| 2004 | 21996 | 1818 | A2 | A2, A3, A4 | [(1841.8) + (2773.7) + (3705.6)]/3 + 19787 | 22560.76 |
| 2005 | 24578 | 2599 | A3 | A2, A4 | [(1841.8) + (2773.7) + (3705.6)]/3 + 21996 | 24769.76 |
| 2006 | 28306 | 3728 | A4 | A2 | [(1841.8) + (3705.6)]/2 + 24578 | 27351.76 |
| 2007 | 30265 | 1959 | A2 | A2, A3, A4 | 1841.8 + 28306 | 30147.84 |
| 2008 | 34481 | 4216 | A4 | A2 | [(1841.8) + (2773.7) + (3705.6)]/3 + 30265 | 33038.76 |
| 2009 | 37627 | 3146 | A4 | A2 | 1841.8 + 34481 | 36322.84 |
| 2010 | 41116 | 3489 | A4 | A2 | 1841.8 + 37627 | 40400.76 |
| 2011 | 50848 | 9732 | A4 | A2 | 1841.8 + 41116 | 42957.84 |
| 2012 | 63732 | 12884 | A4 | A2 | 1841.8 + 50848 | 52689.84 |
| 2013 | 69417 | 5685 | A4 | A2 | 1841.8 + 63732 | 65573.84 |
| 2014 | 83733 | 14316 | A4 | A2 | 1841.8 + 69417 | 71258.84 |
| 2015 | 100370 | 16637 | A4 | A2 | 1841.8 + 83733 | 85574.84 |
| 2016 | 114664 | 14294 | A4 | A2 | 1841.8 + 100370 | 102211.84 |
| 2017 | 129900 | 15236 | A4 | A2 | 1841.8 + 114664 | 116505.84 |
| 2018 | 150400 | 20500 | A4 | A2 | 1841.8 + 129900 | 131741.84 |
| 2019 | 178500 | 28100 | A4 | A2 | 1841.8 + 150400 | 152241.84 |
|  |  |  |  |  | 1841.8 + 178500 | 180341.84 |

Table 4. Training accuracy.


Table 5. Testing accuracy.


  1. Holzmann, R, Sherburne-Benz, L, and Tesliuc, E (2003). Social Risk Management: The World Bank’s Approach to Social Protection in a Globalising World. Washington DC: The World Bank
  2. Selwaness, I (2012). Rethinking social insurance in Egypt: an empirical study. Giza, Egypt: The Economic Research Forum, Working Paper No. 717
  3. Kassem, N (2021). Roles, rules, and controls: an analytical review of the governance of social protection in Egypt. Master’s thesis. Egypt: The American University in Cairo
  4. Loewe, M, and Westemeier, L (2018). Social insurance reforms in Egypt: needed, belated, flopped. [Online]. Available: https://dx.doi.org/10.2139/ssrn.3409190
  5. Hassani, H, Unger, S, and Beneki, C (2020). Big data and actuarial science. Big Data and Cognitive Computing. 4. article no. 40
  6. Senousy, Y, Shehab, A, Hanna, WK, Riad, AM, El-bakry, HA, and Elkhamisy, N (2020). A smart social insurance big data analytics framework based on machine learning algorithms. Journal of Theoretical and Applied Information Technology. 98, 232-244.
  7. Zadeh, LA (1965). Fuzzy sets. Information and Control. 8, 338-353. https://doi.org/10.1016/S0019-9958(65)90241-X
  8. Song, Q, and Chissom, BS (1993). Forecasting enrollments with fuzzy time series—Part I. Fuzzy Sets and Systems. 54, 1-9. https://doi.org/10.1016/0165-0114(93)90355-L
  9. Song, Q, and Chissom, BS (1994). Forecasting enrollments with fuzzy time series—Part II. Fuzzy Sets and Systems. 62, 1-8. https://doi.org/10.1016/0165-0114(94)90067-1
  10. Chen, SM (1996). Forecasting enrollments based on fuzzy time series. Fuzzy Sets and Systems. 81, 311-319. https://doi.org/10.1016/0165-0114(95)00220-0
  11. Hwang, JR, Chen, SM, and Lee, CH (1998). Handling forecasting problems using fuzzy time series. Fuzzy Sets and Systems. 100, 217-228. https://doi.org/10.1016/S0165-0114(97)00121-8
  12. Chen, SM, and Hwang, JR (2000). Temperature prediction using fuzzy time series. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics). 30, 263-275. https://doi.org/10.1109/3477.836375
  13. Huarng, K (2001). Heuristic models of fuzzy time series for forecasting. Fuzzy Sets and Systems. 123, 369-386. https://doi.org/10.1016/S0165-0114(00)00093-2
  14. Huarng, K (2001). Effective lengths of intervals to improve forecasting in fuzzy time series. Fuzzy Sets and Systems. 123, 387-394. https://doi.org/10.1016/S0165-0114(00)00057-9
  15. Yu, HK (2005). A refined fuzzy time-series model for forecasting. Physica A: Statistical Mechanics and its Applications. 346, 657-681. https://doi.org/10.1016/j.physa.2004.11.006
  16. Huarng, K, and Yu, HK (2005). A type 2 fuzzy time series model for stock index forecasting. Physica A: Statistical Mechanics and its Applications. 353, 445-462. https://doi.org/10.1016/j.physa.2004.11.070
  17. Huarng, K, and Yu, TH (2006). Ratio-based lengths of intervals to improve fuzzy time series forecasting. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics). 36, 328-340. https://doi.org/10.1109/TSMCB.2005.857093
  18. Cheng, CH, Chen, TL, and Chiang, CH (2006). Trend-weighted fuzzy time-series model for TAIEX forecasting. Neural Information Processing. Heidelberg, Germany: Springer, pp. 469-477 https://doi.org/10.1007/11893295_52
  19. Senousy, Y, Hanna, WK, Shehab, A, Riad, AM, El-Bakry, HM, and Elkhamisy, N (2019). Egyptian social insurance big data mining using supervised learning algorithms. Rev d’Intelligence Artif. 33, 349-357. https://doi.org/10.18280/ria.330504
  20. Ying, JJC, Huang, PY, Chang, CK, and Yang, DL (2017). A preliminary study on deep learning for predicting social insurance payment behavior. Proceedings of 2017 IEEE International Conference on Big Data (Big Data), pp. 1866-1875. https://doi.org/10.1109/BigData.2017.8258131
  21. Wang, CH (2004). Predicting tourism demand using fuzzy time series and hybrid grey theory. Tourism Management. 25, 367-374. https://doi.org/10.1016/S0261-5177(03)00132-8
  22. Muhammad, HL, Maria, EN, Hossain, JS, Nur, H, and Nur, A (2012). Fuzzy time series: an application to tourism demand forecasting. American Journal of Applied Sciences. 9, 132-140.
  23. Ismail, Z, Efendi, R, and Deris, MM (2015). Application of fuzzy time series approach in electric load forecasting. New Mathematics and Natural Computation. 11, 229-248. https://doi.org/10.1142/S1793005715500076
  24. Pandey, P, Kumar, S, and Srivastava, S (2013). A critical evaluation of computational methods of forecasting based on fuzzy time series. International Journal of Decision Support System Technology. 5, 24-39. https://doi.org/10.4018/jdsst.2013010102
  25. Lee, WJ, Jung, HY, Yoon, JH, and Choi, SH (2017). A novel forecasting method based on F-transform and fuzzy time series. International Journal of Fuzzy Systems. 19, 1793-1802. https://doi.org/10.1007/s40815-017-0354-6
  26. Song, Q, and Chissom, BS (1993). Fuzzy time series and its models. Fuzzy Sets and Systems. 54, 269-277. https://doi.org/10.1016/0165-0114(93)90372-O
  27. de Lima Silva, PC, Sadaei, HJ, Ballini, R, and Guimaraes, FG (2019). Probabilistic forecasting with fuzzy time series. IEEE Transactions on Fuzzy Systems. 28, 1771-1784. https://doi.org/10.1109/TFUZZ.2019.2922152
  28. GitHub (2022). pyFTS: fuzzy time series for Python. [Online]. Available: https://github.com/PYFTS/pyFTS
  29. Chang, PT (1997). Fuzzy seasonality forecasting. Fuzzy Sets and Systems. 90, 1-10. https://doi.org/10.1016/S0165-0114(96)00138-8
  30. Song, Q (1999). Seasonal forecasting in fuzzy time series. Fuzzy Sets and Systems. 107, 235-236. https://doi.org/10.1016/S0165-0114(98)00266-8
  31. Alyousifi, Y, Masseran, N, and Ibrahim, K (2018). Modeling the stochastic dependence of air pollution index data. Stochastic Environmental Research and Risk Assessment. 32, 1603-1611. https://doi.org/10.1007/s00477-017-1443-7
  32. Li, F (2010). Air quality prediction in Yinchuan by using neural networks. Advances in Swarm Intelligence. Heidelberg, Germany: Springer, pp. 548-557 https://doi.org/10.1007/978-3-642-13498-2_71
  33. (2003). Fuzzy environmental decision-making: applications to air pollution. Atmospheric Environment. 37, 1865-1877. https://doi.org/10.1016/S1352-2310(03)00028-1

Reham Abd Elraouf is a Ph.D. candidate in the Department of Insurance and Actuarial Sciences, Faculty of Commerce, Cairo University, Egypt. She was born in Cairo, Egypt, in 1987. She received her B.Sc. and M.Sc. in Insurance in 2008 and 2015, respectively. She works at the Faculty of Commerce, Cairo University, as an assistant lecturer. Her research interests are insurance, social insurance, statistical models, and fuzzy logic.

E-mail: rehamraouf@foc.cu.edu.eg

Saad Elsaieed is a professor in the Department of Insurance & Actuarial Sciences, Faculty of Commerce, Cairo University, Egypt. He received his B.Sc., M.Sc., and Ph.D. degrees in Insurance from Cairo University. He was also the Vice Dean for Environmental Affairs and Community Development. Prof. Saad Elsaieed has authored more than 50 papers published in conferences and journals. His major research interests include insurance, social insurance, and statistics.

E-mail: saad.elsaieed@hotmail.com

Abstract

In recent years, many methods based on successful fuzzy time series (FTS) models have been proposed to forecast data in different fields. Egyptian social insurance systems (SISs) need support to define and estimate yearly total benefits (pensions), which helps the actuaries responsible for the system make sound decisions. Given that the total benefits have not previously been forecasted by prediction methods, this paper applies the FTS models of Chen, Cheng, Yu, and Song to forecast Egyptian social insurance benefits, uses the Huarng method to select appropriate partition lengths, and constructs the intervals on differenced (transformed) data, because the data have not been stationary in recent years and have increased significantly. The approach is evaluated in experiments with the four models using 5, 10, 50, and 100 partitions and a Huarng-based partition of 465. The results show a marked improvement in yearly benefit forecasting, especially for the Chen model with the Huarng 465 partition, which achieves high prediction accuracy with low error on both the training and testing data.

Keywords: Fuzzy time series, Social insurance, Benefits, Pensions

1. Introduction

Forecasting plays a significant part in everyday life and is critical in many businesses, because projections of future events must be factored into decision-making. Social insurance is one of the most significant concerns that most nations confront because of its sensitive relationship with the public budget. Forecasting social insurance is therefore an important issue, and better forecasts help actuaries, who are responsible for making strategic choices in the social insurance system. In this context, this study forecasts the total benefits (pensions), in millions, required of the Egyptian social insurance system.

1.1 Social Insurance Systems

According to their conventional definition, pension schemes and social insurance systems (SISs) are one of the three primary components of social protection, together with social assistance, social safety nets, employment strategies, and labor market interventions [1]. SISs are a mechanism that allows workers to secure their future, and they are partially or entirely financed by employee and employer contributions. Egypt has a broadly stratified SIS operating as a fully funded plan, in which employees make payments that are invested and subsequently paid back as pensions. The system has gradually evolved into a defined benefit plan. It is governed by several laws, and the present legal framework, Law 148/2019, was formally established on January 1, 2020. This law plays a significant role in changing the system and overcoming the challenges caused by the complexity of the old regulatory and legal framework. In practice, the system faces serious challenges and is extremely fragmented, with several benefit packages available to different parts of the workforce and payment types for employees and their families categorized by old age, disability, survivors, illness, work injury, and unemployment [2,3]. In addition, several factors have made Egypt's benefits unsustainable and inefficient. For example, the reserves of the pension funds are invested at low, occasionally negative, real interest rates. The funds provide extremely generous minimum pensions and opportunities for early retirement. Members may also easily alter the size of their pensions, which are based solely on the payments made in the few years leading up to retirement. Many workers agree with their employers to under-report their income throughout most of their working careers to reduce their payments. As a result, Egypt's pension funds will soon spend more on pension payouts than they receive in contributions from members [4].
Egypt's total benefits (pensions) have therefore increased sharply in recent years, which makes it necessary to forecast the total benefits for the next few years. Annual benefits are usually estimated with actuarial models that provide demographic and financial forecasts for pension systems. These forecasts are often generated from models of occupational pension schemes that cover groups of employees and are based on demographic and economic factors. Consequently, actuarial methods combined with predictive analytics must significantly improve their understanding of predicted behavior or events to support policies and judgments [5]. In addition, implementing a year-by-year simulation to estimate the future costs of benefits is one of the most significant tasks for actuaries [6]. Benefit prediction and analysis will therefore aid actuaries in their professional responsibility of making strategic choices in the social insurance system. To the best of our knowledge, our approach is the first to predict the total benefits (pensions) of Egyptian social insurance from a yearly dataset using fuzzy time series (FTS) models, and it yields satisfactory results. In this study, we use FTS models, rather than traditional assumption-based methods, to forecast Egyptian social insurance benefits (pensions) and thus improve the forecasting of pension demand.

1.2 Fuzzy Time Series Overview

Fuzzy set theory was developed to address the ambiguity and uncertainty that characterize most real-world problems. Proposed by Zadeh [7] in 1965, it expresses linguistic (fuzzy) information through mathematical modeling and is still used extensively in a wide range of applications. Subsequently, a number of researchers proposed models based on it, the first of which was an FTS forecasting model. Song and Chissom [8] built on the concepts of fuzzy set theory to overcome the limitations of classical time series. However, their model relies on the max-min composition method, which makes the computation substantially more difficult [9]. This line of work includes a series of studies on forecasting the University of Alabama enrollment data. Chen [10] proposed a simpler calculation technique to address this flaw, which reduces computing time and makes the procedure easier to understand. However, this model lacks an appropriate weighting mechanism for fuzzy logical relationships (FLRs). The model has since been widely modified, with researchers attempting to increase prediction accuracy by changing the weighting technique or the length of the linguistic intervals. Hwang et al. [11] proposed a forecasting approach based on FTS. Chen and Hwang [12] proposed an FTS-based temperature forecasting algorithm. Chen [10] improved forecasting by using a high-order FTS model. An extended version of Chen's model was used in [1], and Huarng [13] predicted enrollments using heuristic principles, which reduced computations. Huarng [14] determined interval lengths using both distribution-based and average-based methods. Yu and a colleague [15,16] further refined FTS models to address weighting and recurrence issues, with applications including stock price prediction.
To improve forecasting accuracy, Yu [15] applied several weighting techniques to FTS models. Huarng and Yu [17] then introduced ratio-based interval lengths into FTS models. Cheng et al. [18] combined an FTS model with a trend-weighting method to forecast real stock trading data and university enrollment.

This paper is organized as follows. Section 2 reviews related studies that applied FTS in different domains with various datasets. Section 3 presents the steps of the FTS models, and Section 4 describes the benefits dataset and the implementation of the proposed FTS models. Section 5 presents the results and discusses the accuracy of the models. Finally, Section 6 summarizes and concludes the study.

2. Related Work

Machine-learning models have contributed to SIS research. The authors of [19] used three algorithms, namely the decision tree, naive Bayes, and CN2 rule induction, to classify people based on some of their social insurance features. All three algorithms achieved high classification accuracy. The authors of [20] proposed a deep learning framework based on the recurrent neural network (RNN) architecture for forecasting an individual's payment status, using a real dataset gathered by Taiwan's Ministry of Health and Welfare. The model was developed to abstract people's payment behavior and accurately forecast their future payment behavior over a long period. It outperforms state-of-the-art approaches, such as support vector machines (SVM) and hidden Markov models (HMM), even under varied settings. However, these studies predict individuals' personal payments, which is a different problem from ours.

Forecasts of social insurance benefits (pensions) still rely on traditional assumptions. In this section, we focus on related studies that applied FTS in different domains using various datasets. FTS models have been used in different industries and domains to predict enrollment, stock markets, tourism, shipping, and transportation. This motivated us to apply FTS to predict Egypt's total pension amounts over a series of years.

The dataset in our study is a yearly time series whose key values are to be predicted with FTS models. The FTS models used for prediction in previous studies are as follows. The author of [21] proposed two models to predict tourism demand in Taiwan, using arrivals from Hong Kong, Germany, and the United States during 1989–2000. FTS was more appropriate than the grey theory model for predicting tourism demand, especially for arrivals from Hong Kong to Taiwan, with an error of 1.93. The authors of [22] measured the prediction performance of FTS for tourist arrivals in Indonesia against classical methods such as the seasonal autoregressive integrated moving average (SARIMA), Box–Jenkins methods, time series regression, and Holt-Winters. The training data covered January 1989 to December 1996, and the testing data covered January 1997 to December 1997. The root mean square error (RMSE), mean square error (MSE), mean absolute deviation (MAD), and mean absolute percentage error (MAPE) were measured. After data transformation, Chen's FTS was more accurate than the classical methods. Although FTS methods are simple, they outperformed all the classical statistical methods, which were used previously to predict social insurance; this prompted our proposal to use FTS methods for prediction. In [23], an FTS model was proposed to help power companies predict electrical load and make decisions. The study used the yearly regional electric load in Taiwan and compared FTS with previously implemented linear and nonlinear regression models. FTS achieved a smaller MAPE than the other models, especially when appropriate intervals were obtained from an efficient partition of the universe of discourse.
That study draws attention to the interval length, which is very important to the experimental results of the current proposal. In [24], FTS models were analyzed for prediction using time-series data of three agro-products. The FTS models of Chen, Huarng, and Singh were critically tested on production time-series data covering 22 years, from 1988 to 2010. In terms of MSE, the Chen model gave the best result for sugar production (63.45), whereas the Singh model gave the best results for Lahi and rice production (2605 and 913.62, respectively). This study suggests that the predictions of different FTS models vary with the values of the dataset, which led us to experiment with several FTS models in our approach. The authors of [25] reduced the prediction error of FTS by proposing a prediction model based on the fuzzy transform (F-transform), which is built on a partition of the universe of discourse and uses fuzzy logical relationships for prediction. The study showed that applying the F-transform improves prediction accuracy for the enrollments of the University of Alabama and the number of patents awarded in Taiwan; this was a starting point for taking dataset transformation into consideration in our proposed model.

FTS has been implemented with multiple models in different domains, and its performance varies with the model, the dataset values, the partition method, the accuracy measures, and so on. This led us to run a variety of experiments with our proposed models and to select the best-performing predictions by observing the results.

3. Fuzzy Time Series Model Steps

This paper applies the FTS models of Chen, Cheng, Yu, and Song to forecast Egyptian social insurance benefits, constructs the intervals on differenced (transformed) data, uses the Huarng method to select appropriate partition lengths, and runs experiments with 5, 10, 50, and 100 partitions. The definition of FTS used here was proposed by Song and Chissom [8,9,26] and is based on fuzzy sets [7]. Chen [10] developed an approach that is more efficient than Song and Chissom's method because it uses simple arithmetic operations rather than the complicated max-min composition. The steps of the Chen method are briefly reviewed below and depicted as a flowchart in Figure 1.

Step 1: Dataset

This step collects and preprocesses the historical dataset.

Step 2: Divide the universe of discourse into partitions

(1) Define the universe of discourse (U) from the available historical time series using the following formula:

$$U = [D_{\min} - D_1,\; D_{\max} + D_2],$$

where D_min and D_max are the minimum and maximum values of the historical data, respectively. D_1 and D_2 are positive numbers that extend U on both sides in order to preserve a margin of variation and ensure that future outcomes are contained within U.

(2) Divide the universe of discourse U into multiple partitions of equal interval length. To determine a suitable length, Huarng [14] proposed computing the interval length L according to the ranges in Table 1 as follows:

a) Compute all the absolute first differences between the values D_{h−1} and D_h, and then calculate the average of these first differences.

b) Take half of the average as the length.

c) Using Table 1, identify the range in which this length falls and determine the corresponding base.

d) Round the length according to the base to obtain the required L. The number of intervals m is then calculated as follows:

$$m = \frac{(D_{\max} + D_2) - (D_{\min} - D_1)}{L}.$$

Then, U can be divided into m intervals of equal length, U = {u1, u2, ..., um}, where u1 = [d1, d2], u2 = [d2, d3], ..., um = [dm, dm+1].
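The length selection in steps a) to d) and the interval count can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation. The base table in `BASE_TABLE` is assumed from the commonly cited average-based table (Table 1 of the article is not reproduced here). Rounding to the nearest multiple of the base is also an assumption, and the function names, the short example series, and the margins D1 = 5 and D2 = 27 are hypothetical choices for illustration.

```python
import math

# (upper bound of the range, base). Assumed from the commonly cited
# average-based table; the article's Table 1 is not shown in this section.
BASE_TABLE = [(1.0, 0.1), (10.0, 1.0), (100.0, 10.0), (1000.0, 100.0)]


def base_for(length):
    """Return the rounding base for a candidate interval length."""
    for upper, base in BASE_TABLE:
        if length <= upper:
            return base
    raise ValueError("length is outside the assumed base table")


def average_based_length(values):
    """Steps a) to d): half the mean absolute first difference, rounded to its base."""
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    half_average = (sum(diffs) / len(diffs)) / 2
    base = base_for(half_average)
    # Rounding to the nearest multiple of the base is an assumption.
    return round(half_average / base) * base


def number_of_intervals(values, d1, d2, length):
    """m = ((Dmax + D2) - (Dmin - D1)) / L, rounded up so the intervals cover U."""
    lower = min(values) - d1
    upper = max(values) + d2
    return math.ceil((upper - lower) / length)


# A few early benefit values (in millions), used only as an illustration.
series = [185, 214, 264, 348, 328, 753]
length = average_based_length(series)
m = number_of_intervals(series, d1=5, d2=27, length=length)  # D1, D2 picked by hand
print(length, m)
```

The same functions can be applied to the differenced series instead of the raw values, which is how the partitions are built later in the paper.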

Step 3: Fuzzy set

Define each fuzzy set A_i as a collection of elements with a continuous grade of membership on the universe of discourse U = {u1, u2, ..., un}, where n is the number of intervals:

$$A_i = f_{A_i}(u_1)/u_1 + f_{A_i}(u_2)/u_2 + \cdots + f_{A_i}(u_n)/u_n,$$

where f_{A_i} is the membership function of the fuzzy set A_i, and f_{A_i}(u_k) is the grade of membership of u_k in A_i, with 0 ≤ f_{A_i}(u_k) ≤ 1. After the degree to which each value belongs to each A_i (i = 1, 2, ..., n) is determined, the observation at time t is fuzzified as the A_i in which its membership degree is maximal. The fuzzy sets are defined by

$$\begin{aligned}
A_1 &= 1/u_1 + 0.5/u_2 + 0/u_3 + 0/u_4 + \cdots + 0/u_n,\\
A_2 &= 0.5/u_1 + 1/u_2 + 0.5/u_3 + 0/u_4 + \cdots + 0/u_n,\\
A_3 &= 0/u_1 + 0.5/u_2 + 1/u_3 + 0.5/u_4 + \cdots + 0/u_n,\\
&\;\;\vdots\\
A_n &= 0/u_1 + 0/u_2 + \cdots + 0/u_{n-2} + 0.5/u_{n-1} + 1/u_n.
\end{aligned}$$

Step 4: Fuzzification and fuzzy logical relationship

Fuzzification depends on the membership degrees. In this step, each data point is assigned a fuzzy set: a historical value is fuzzified as Ai if its maximum membership degree falls within ui. The fuzzy logical relationships are then established between the linguistic values Ai(t − 1) and Aj(t) and are written as Ai → Aj. Relationships with the same left-hand side are merged into a group; for example, A1 → A1, A1 → A2, and A1 → A3 are grouped as A1 → A1, A2, A3.

Step 5: Defuzzification and forecast value

The defuzzification principles for the fuzzified forecasted variants in Step 4 are as follows:

  • (1) If the membership of the output has only one maximum, at ui, use the midpoint of that interval as the predicted value.

  • (2) If the membership of the output has two or more consecutive maxima, use the midpoint of the corresponding conjoined intervals as the predicted value.

  • (3) If the membership of the output is 0 everywhere, no change is anticipated, and the predicted change is 0.

Step 6: Evaluate the performance of the results

Adding the actual value of the previous year to the forecasted change yields the predicted value for the current year. The robustness of a model can then be measured using statistical tools.

4. Proposed FTS Model Implementation

4.1 Dataset Description

The dataset was collected from the National Fund for Social Security and the Ministry of Insurance and Social Affairs (MOISA). The data cover 44 years, from 1976 to 2019; each yearly value is the total benefits (government, public, and private sectors) that the insurance system is required to cover. The data are divided into training and testing datasets to evaluate performance and compare the models in order to validate the proposed FTS approach. Figure 2 plots the total benefits (in millions) for the training and testing data. The dataset was divided into approximately 80% training data (1976 to 2009) and 20% testing data (2010 to 2019). Table 2 lists the total benefits (in millions) in detail.

4.2 Fitting Proposed Models with Partition

Because the benefit values increase steadily over the years, a difference transformation is the most suitable choice for the present data: it makes the data stationary [27], and the partitions are then built on the differences rather than on the original values. Figure 3 shows the yearly differences over the 44 years, and Table 2 lists them in its difference column. We then built the partition sets based on these values.
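As a small illustration of the difference transformation, the sketch below computes first differences and restores the original scale by adding each difference back to the preceding value. The function names are hypothetical, and the six benefit values are only a short excerpt used for the example.

```python
def difference(values):
    """First-order differences: d[t] = y[t] - y[t-1]."""
    return [b - a for a, b in zip(values, values[1:])]


def restore(first_value, diffs):
    """Invert the transformation by cumulatively adding the differences."""
    restored = [first_value]
    for d in diffs:
        restored.append(restored[-1] + d)
    return restored


# Early yearly benefits (in millions), used only as an illustration.
benefits = [185, 214, 264, 348, 328, 753]
diffs = difference(benefits)
print(diffs)
print(restore(benefits[0], diffs) == benefits)
```

The partitions are then defined over the range of `diffs` rather than over the range of `benefits`, which keeps the universe of discourse bounded even though the benefits themselves keep growing.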

4.3 Model Training and Testing

The four models (Chen, Cheng, Song, and Yu) were applied to the insurance pension data with different partitions using the pyFTS library [28]. Figure 4 shows the training and testing predictions for all experimental models and partitions; its four panels correspond to the Chen, Cheng, Song, and Yu models, respectively.

4.4 Chen Model Implementation

Given the number of models and partitions tested, we explain the implementation with the Python FTS library for one case only. To avoid repetition, the steps are shown in detail for the Chen model with partition 5; the results of all experiments with the different models and partitions are presented and analyzed in the next section.

Step 1: The historical time series is collected, and the universe of discourse (U) is defined on the yearly differences (the difference transformation), as explained in the dataset subsection.

Step 2: Partition the universe of discourse of the transformed data with a grid

In the partition step, we use the grid partitioning method proposed in [8–10,29,30] with a varying number of partitions to divide the universe of discourse (U) for the four models, where all partitions have the same length.

The experiments used 5, 10, 50, and 100 partitions and 465 partitions (based on Huarng) to establish the best partition number for improving model accuracy [31–33]. Figure 5 shows the partition intervals of the training dataset for 5, 10, 50, 100, and 465 (Huarng) partitions, respectively. For example, with 5 partitions, the triangular grid divides the universe of discourse as follows:

A0: trimf([-953.9, -22.0, 909.9])
A1: trimf([-22.0, 909.9, 1841.8])
A2: trimf([909.9, 1841.8, 2773.7])
A3: trimf([1841.8, 2773.7, 3705.6])
A4: trimf([2773.7, 3705.6, 4637.6])

Step 3: Define the fuzzy set membership and fuzzify the variations based on the membership

In this step, each yearly difference in benefits is assigned to the fuzzy set in which its membership is highest. Table 3 lists the fuzzified value of each benefit in its fuzzified column.
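The fuzzification step can be illustrated with a plain-Python triangular membership function and the five partition-5 sets reported above. This is a sketch under stated assumptions, not the pyFTS implementation. The names `trimf`, `PARTITIONS`, and `fuzzify` are hypothetical, and clipping out-of-range differences to the outermost peaks is an assumption made so that very large differences still map to A4.

```python
def trimf(x, a, b, c):
    """Triangular membership with feet at a and c and peak at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Partition-5 triangular sets as reported in the text: (left, peak, right).
PARTITIONS = {
    "A0": (-953.9, -22.0, 909.9),
    "A1": (-22.0, 909.9, 1841.8),
    "A2": (909.9, 1841.8, 2773.7),
    "A3": (1841.8, 2773.7, 3705.6),
    "A4": (2773.7, 3705.6, 4637.6),
}


def fuzzify(difference):
    """Return the label of the set with the highest membership."""
    lowest_peak = PARTITIONS["A0"][1]
    highest_peak = PARTITIONS["A4"][1]
    # Assumption: values outside the peaks are clipped to the nearest peak.
    x = min(max(difference, lowest_peak), highest_peak)
    return max(PARTITIONS, key=lambda name: trimf(x, *PARTITIONS[name]))


# Yearly differences taken from the forecast table (1978, 1991, 1999, 2002, 2006).
for d in (29, 549, 1625, 2599, 3728):
    print(d, fuzzify(d))
```

For these five differences, the labels agree with the fuzzified column of the forecast table (A0, A1, A2, A3, and A4, respectively).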

Step 4: Chen fuzzy logical relationship rules

The Chen model establishes fuzzy logical relationship (FLR) rules. For partition 5, the fuzzy logical relationships are as follows: [‘A0 → A0’, ‘A0 → A1’, ‘A1 → A1’, ‘A1 → A2’, ‘A2 → A2’, ‘A2 → A3’, ‘A3 → A2’, ‘A3 → A4’, ‘A4 → A2’, ‘A2 → A4’].

The derived FLRs are then grouped by their left-hand side (the current state). Consequently, five FLR groups were obtained, as shown in Figure 6. Table 2 lists each benefit with its FLR group in the FLR group column. The groups are as follows:

FLR groups:
A0 → A0, A1
A1 → A1, A2
A4 → A2
A2 → A2, A3, A4
A3 → A2, A4
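The grouping of FLRs by their left-hand side can be sketched as follows. The list `FLRS` reproduces the ten relationships listed for partition 5. The function name `group_flrs` and the tuple representation are hypothetical choices for illustration.

```python
def group_flrs(flrs):
    """Group (left, right) fuzzy logical relationships by their left-hand side."""
    groups = {}
    for left, right in flrs:
        consequents = groups.setdefault(left, [])
        if right not in consequents:  # each consequent is kept once (Chen's rule)
            consequents.append(right)
    return {left: sorted(rights) for left, rights in groups.items()}


# The ten partition-5 relationships listed in the text, as (left, right) pairs.
FLRS = [
    ("A0", "A0"), ("A0", "A1"), ("A1", "A1"), ("A1", "A2"), ("A2", "A2"),
    ("A2", "A3"), ("A3", "A2"), ("A3", "A4"), ("A4", "A2"), ("A2", "A4"),
]

flr_groups = group_flrs(FLRS)
for left, rights in sorted(flr_groups.items()):
    print(left, "->", ", ".join(rights))
```

Applied to the listed relationships, the function yields the same five groups as reported above.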

Step 5: Forecast value. See Table 3.
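The forecast calculation shown in Table 3 can be reproduced with a short sketch: the predicted change is the average of the midpoints of the sets in the FLR group of the previous year's state, and the forecast is the previous year's benefit plus that change. The midpoints and groups below are the partition-5 values reported in the text. The function name `forecast_next` is hypothetical, and the results agree with the table only to within a few hundredths, presumably because the published partition bounds are rounded.

```python
# Peaks (midpoints) of the partition-5 sets and the FLR groups from the text.
MIDPOINTS = {"A0": -22.0, "A1": 909.9, "A2": 1841.8, "A3": 2773.7, "A4": 3705.6}
FLR_GROUPS = {
    "A0": ["A0", "A1"],
    "A1": ["A1", "A2"],
    "A2": ["A2", "A3", "A4"],
    "A3": ["A2", "A4"],
    "A4": ["A2"],
}


def forecast_next(previous_benefit, previous_state):
    """Previous benefit plus the mean midpoint of the state's FLR group."""
    consequents = FLR_GROUPS[previous_state]
    change = sum(MIDPOINTS[name] for name in consequents) / len(consequents)
    return previous_benefit + change


# Forecast for 1979 from the 1978 benefit (214) and its state A0.
print(forecast_next(214, "A0"))
# Forecast for 1992 from the 1991 benefit (3100) and its state A1.
print(forecast_next(3100, "A1"))
```

Because the forecast is built on differences, the error of each one-step prediction does not accumulate: every forecast starts again from the actual value of the previous year.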

Step 6: Evaluate the performance results, which are explained in detail in the next section.

5. Results and Discussion

5.1 Performance Measures and Evaluation

The benefits predicted by the four FTS models were evaluated using four statistical measures: RMSE, MAPE, symmetric MAPE (SMAPE), and Theil's U statistic. These measures were chosen as the error assessment metrics because they capture the degree of deviation and the accuracy of the forecasts when assessing a model's prediction quality. The measures are defined as the RMSE in Eq. (4), MAPE in Eq. (5), SMAPE in Eq. (6), and Theil's U in Eq. (7), where y_n refers to the actual data, f_n refers to the forecast data, and N refers to the total number of observations.

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{n=1}^{N} (y_n - f_n)^2}{N}}, \tag{4}$$

$$\mathrm{MAPE} = \frac{1}{N}\sum_{n=1}^{N} \left|\frac{y_n - f_n}{y_n}\right| \times 100, \tag{5}$$

$$\mathrm{SMAPE} = \frac{1}{N}\sum_{n=1}^{N} \frac{|y_n - f_n|}{(y_n + f_n)/2}, \tag{6}$$

$$\text{Theil's } U = \frac{\sqrt{\sum_{n=1}^{N} (y_n - f_n)^2}}{\sqrt{\sum_{n=1}^{N} y_n^2} + \sqrt{\sum_{n=1}^{N} f_n^2}}. \tag{7}$$
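The four measures in Eqs. (4) to (7) can be written directly in Python. This is a minimal sketch that follows the equations as stated in this section. It is not the pyFTS implementation, whose scaling conventions may differ, and the function names and the three-point example series are hypothetical.

```python
import math


def rmse(actual, forecast):
    """Root mean square error, Eq. (4)."""
    n = len(actual)
    return math.sqrt(sum((y - f) ** 2 for y, f in zip(actual, forecast)) / n)


def mape(actual, forecast):
    """Mean absolute percentage error in percent, Eq. (5)."""
    n = len(actual)
    return 100 * sum(abs((y - f) / y) for y, f in zip(actual, forecast)) / n


def smape(actual, forecast):
    """Symmetric MAPE, Eq. (6), without percentage scaling."""
    n = len(actual)
    return sum(abs(y - f) / ((y + f) / 2) for y, f in zip(actual, forecast)) / n


def theil_u(actual, forecast):
    """Theil's U, Eq. (7): 0 for a perfect forecast, bounded above by 1."""
    error = math.sqrt(sum((y - f) ** 2 for y, f in zip(actual, forecast)))
    scale = math.sqrt(sum(y ** 2 for y in actual)) + math.sqrt(sum(f ** 2 for f in forecast))
    return error / scale


# Illustrative values only, not taken from the benefits dataset.
actual = [100.0, 200.0, 300.0]
forecast = [110.0, 190.0, 300.0]
print(rmse(actual, forecast), mape(actual, forecast))
print(smape(actual, forecast), theil_u(actual, forecast))
```

RMSE is expressed in the units of the data (millions), whereas MAPE, SMAPE, and Theil's U are scale-free, which makes them easier to compare between the training and testing periods.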

5.2 Models Results and Accuracy

In the training dataset, accuracy was measured using four tools to show the error value in each model with each experimental partition used. Figure 7 shows the training accuracies of the models containing the four models representing the RMSE, MAPE, SMAPE, and Thiel’s coefficient, respectively. The four tools had the same accuracy rates for the models. It should be noted that the three models Chen, Cheng, and Yu start with approximate errors in partition 5 and start decreasing to have the lowest error in partition 465. Conversely, the Song model had low errors in partition 5, and the errors increased when the partitioning was increased. The three models, Chen, Cheng, and Yu, feature stability and have low errors starting at partition 50 to partition 100, and in partition 465 (based Huarng) get the same lowest error among the four tools. Table 4 shows the training accuracy, RMSE error in the range of 17.89 to 18.99, error in MAPE in the range of 0.36 to 0.37, error in SMAPE is 0.18, and error in Thiel’s U is 0. The three models, Chen, Cheng, and Yu, had identical accuracies. In contrast, in the Song model in partition 465 (based Huarng), the error in RMSE is 1540.18, the error in MAPE is 122.32, the error in SMAPE is 23.8, and the error in Theil’s U is 0.06. The Song model had extreme errors compared with the three models. Even in partition 5, the Song model had slightly more errors than the three models Chen, Cheng, and Yu. Overall, the Chen, Cheng, and Yu models have an optimal prediction in the training dataset with slight errors, particularly with partition 465.

Testing accuracy is an important factor because it represents the power of a model to forecast the data. In the testing dataset, accuracy was measured with the same four measures to show the error of each model at each experimental partition. Figure 8 shows the testing accuracy of the models in terms of RMSE, MAPE, SMAPE, and Theil's coefficient, respectively. As with the training accuracy, the four measures show the same pattern across the models. Overall, the four models have similar error values at partition 5, but the errors of the Chen, Cheng, and Yu models decreased from partition 10 onward and remained stable at low values up to partition 465. The Song model remained stable with high errors, even at partition 465. Table 5 shows the testing accuracy, where the four models are very close together at partition 5: RMSE is in the range 11306 to 11314, MAPE in the range 10.95 to 11.98, SMAPE in the range 5.89 to 6.03, and Theil's coefficient in the range 0.059 to 0.069. After partition 5, the Chen, Cheng, and Yu models have exactly the same error estimates. Meanwhile, the Song model had the highest error at every partition, even if only by a small margin. The best result was at partition 465, where the three models had identical, low errors of 11411.13, 8.87, 4.70, and 0.059 for RMSE, MAPE, SMAPE, and Theil's U, respectively. The corresponding values for the Song model were 12927.04, 10.62, 5.68, and 0.067.

6. Conclusion

This paper proposed predicting the yearly total benefits (pensions) of the Egyptian social insurance system using the Chen, Cheng, Song, and Yu FTS models. The interval length was constructed using the difference transformation method, given that the data have not been stationary in recent years and have increased significantly, and the Huarng method was used to select an appropriate partition length. In addition, experiments with interval length partitions of 5, 10, 50, and 100 were run to further validate the models. Because there were many experiments, only the Chen model with partition 5 was explained in detail to avoid repetition. The performance of the models was assessed by predicting the yearly benefits and evaluating the predictions with four statistical criteria. According to the results, the proposed techniques significantly enhance forecast performance and prediction accuracy. In particular, the Chen, Cheng, and Yu models performed best, especially in training accuracy at the Huarng-based partition 465, and predicted the testing data with acceptable accuracy, whereas the Song model had the highest error in both training and testing. These results motivate applying FTS to predict pensions, side by side with recent learning-based algorithms, in the SISs of other countries.

Figure 1. FTS model steps.

The International Journal of Fuzzy Logic and Intelligent Systems 2022; 22: 169-182. https://doi.org/10.5391/IJFIS.2022.22.2.169

Figure 2. Plots of total benefits (in millions).

Figure 3. Yearly transformation difference.

Figure 4. Training and testing prediction.

Figure 5. Partitions of interval lengths (training dataset): (a) partition 5, (b) partition 10, (c) partition 50, (d) partition 100, and (e) Huarng 465.

Figure 6. Network FLR rules.

Figure 7. Training accuracy of the four models: (a) RMSE, (b) MAPE, (c) SMAPE, and (d) Theil's coefficient.

Figure 8. Testing accuracy of the four models: (a) RMSE, (b) MAPE, (c) SMAPE, and (d) Theil's coefficient.

Table 1. Range of interval.

Range | Base
0.1–1.0 | 0.1
1.1–10 | 1
11–100 | 10
101–1000 | 100
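The range-to-base mapping in Table 1 is the one used by average-based interval length selection. The sketch below is a minimal illustration under stated assumptions: the function names are hypothetical, the candidate length is taken as half the average absolute first difference, and it is rounded to the nearest multiple of the base. The input series is made up and is not the benefits data.

```python
def base_for(length):
    """Return the base for a candidate length, following the ranges in Table 1."""
    if length < 0.1:
        raise ValueError("length below the smallest range in Table 1")
    if length <= 1.0:
        return 0.1
    if length <= 10:
        return 1
    if length <= 100:
        return 10
    if length <= 1000:
        return 100
    raise ValueError("length above the largest range in Table 1")


def average_based_length(series):
    """Half the mean absolute first difference, rounded to a multiple of the base.

    Rounding to the nearest multiple is an assumption of this sketch.
    """
    diffs = [abs(b - a) for a, b in zip(series, series[1:])]
    half_avg = sum(diffs) / len(diffs) / 2
    base = base_for(half_avg)
    return round(half_avg / base) * base


# Made-up series: absolute differences are 20, 30, 40, 10, 40 (mean 28, half 14).
toy_series = [30, 50, 80, 120, 110, 70]
print(average_based_length(toy_series))
```

For the toy series the candidate length 14 falls in the 11–100 range, so the base is 10 and the selected interval length is 10.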

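Table 2. Total benefits dataset.

Year | Government sector | Public & private sector | Total benefits in millions | Difference | Year | Government sector | Public & private sector | Total benefits in millions | Difference
1976 | 121 | 57 | 179 | 0 | 1977 | 145 | 40 | 185 | 6
1978 | 162 | 51 | 214 | 29 | 1979 | 195 | 68 | 264 | 50
1980 | 243 | 105 | 348 | 84 | 1981 | 207 | 121 | 328 | −20
1982 | 401 | 351 | 753 | 425 | 1983 | 436 | 424 | 861 | 108
1984 | 557 | 485 | 1042 | 181 | 1985 | 641 | 553 | 1194 | 152
1986 | 779 | 631 | 1410 | 216 | 1987 | 841 | 688 | 1529 | 119
1988 | 1013 | 823 | 1837 | 308 | 1989 | 1275 | 969 | 2244 | 407
1990 | 1417 | 1133 | 2551 | 307 | 1991 | 1714 | 1386 | 3100 | 549
1992 | 2037 | 1705 | 3743 | 643 | 1993 | 2353 | 2084 | 4438 | 695
1994 | 3019 | 2562 | 5582 | 1144 | 1995 | 3587 | 3000 | 6587 | 1005
1996 | 4107 | 3469 | 7577 | 990 | 1997 | 4810 | 4090 | 8901 | 1324
1998 | 5252 | 5003 | 10255 | 1354 | 1999 | 6034 | 5845 | 11880 | 1625
2000 | 6689 | 6797 | 13487 | 1607 | 2001 | 7742 | 7563 | 15305 | 1818
2002 | 9590 | 8314 | 17904 | 2599 | 2003 | 10791 | 8996 | 19787 | 1818
2004 | 12240 | 9756 | 21996 | 1818 | 2005 | 13988 | 10590 | 24578 | 2599
2006 | 15441 | 12865 | 28306 | 3728 | 2007 | 16867 | 13398 | 30265 | 1959
2008 | 19311 | 15170 | 34481 | 4216 | 2009 | 19488 | 18139 | 37627 | 3146
2010 | 22660 | 18456 | 41116 | 3489 | 2011 | 28724 | 22124 | 50848 | 9732
2012 | 35568 | 28164 | 63732 | 12884 | 2013 | 33640 | 35777 | 69417 | 5685
2014 | 40558 | 43175 | 83733 | 14316 | 2015 | 48554 | 51816 | 100370 | 16637
2016 | 55495 | 59169 | 114664 | 14294 | 2017 | 62300 | 67600 | 129900 | 15236
2018 | 72800 | 77600 | 150400 | 20500 | 2019 | 85700 | 92800 | 178500 | 28100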
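The Difference column is the first difference of the yearly totals, which is the transformation applied before fuzzification. A minimal sketch, using the 1976 to 1981 totals from Table 2; the helper names are assumptions made for illustration.

```python
def first_differences(values):
    """Return the year-over-year changes of a series."""
    return [b - a for a, b in zip(values, values[1:])]


def undo_differences(first_value, diffs):
    """Rebuild the original series from its first value and its differences."""
    restored = [first_value]
    for d in diffs:
        restored.append(restored[-1] + d)
    return restored


# Total benefits in millions for 1976-1981, taken from Table 2.
benefits = [179, 185, 214, 264, 348, 328]

diffs = first_differences(benefits)
print(diffs)  # [6, 29, 50, 84, -20]
print(undo_differences(benefits[0], diffs) == benefits)  # True
```

Because the transformation is invertible, a forecast made on the differenced series can be mapped back to the benefits scale by adding it to the previous year's total.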

Table 3. Fuzzified training dataset and FLR group rules.

Year | Benefits | Difference | Fuzzified | FLR group | Defuzzification | Forecasting value
1976 | 179 | 0 | A0 | A0, A1 | —— | 0
1977 | 185 | 6 | A0 | A0, A1 | [(−22.0) + (909.9)]/2 + 179 | 622.96
1978 | 214 | 29 | A0 | A0, A1 | [(−22.0) + (909.9)]/2 + 185 | 628.96
1979 | 264 | 50 | A0 | A0, A1 | [(−22.0) + (909.9)]/2 + 214 | 657.96
1980 | 348 | 84 | A0 | A0, A1 | [(−22.0) + (909.9)]/2 + 264 | 707.96
1981 | 328 | −20 | A0 | A0, A1 | [(−22.0) + (909.9)]/2 + 348 | 791.96
1982 | 753 | 425 | A0 | A0, A1 | [(−22.0) + (909.9)]/2 + 328 | 771.96
1983 | 861 | 108 | A0 | A0, A1 | [(−22.0) + (909.9)]/2 + 753 | 1196.96
1984 | 1042 | 181 | A0 | A0, A1 | [(−22.0) + (909.9)]/2 + 861 | 1304.96
1985 | 1194 | 152 | A0 | A0, A1 | [(−22.0) + (909.9)]/2 + 1042 | 1485.96
1986 | 1410 | 216 | A0 | A0, A1 | [(−22.0) + (909.9)]/2 + 1194 | 1637.96
1987 | 1529 | 119 | A0 | A0, A1 | [(−22.0) + (909.9)]/2 + 1410 | 1853.96
1988 | 1837 | 308 | A0 | A0, A1 | [(−22.0) + (909.9)]/2 + 1529 | 1972.96
1989 | 2244 | 407 | A0 | A0, A1 | [(−22.0) + (909.9)]/2 + 1837 | 2280.96
1990 | 2551 | 307 | A0 | A0, A1 | [(−22.0) + (909.9)]/2 + 2244 | 2687.96
1991 | 3100 | 549 | A1 | A1, A2 | [(−22.0) + (909.9)]/2 + 2551 | 2994.96
1992 | 3743 | 643 | A1 | A1, A2 | [(909.9) + (1841.8)]/2 + 3100 | 4475.88
1993 | 4438 | 695 | A1 | A1, A2 | [(909.9) + (1841.8)]/2 + 3743 | 5118.88
1994 | 5582 | 1144 | A1 | A1, A2 | [(909.9) + (1841.8)]/2 + 4438 | 5813.88
1995 | 6587 | 1005 | A1 | A1, A2 | [(909.9) + (1841.8)]/2 + 5582 | 6957.88
1996 | 7577 | 990 | A1 | A1, A2 | [(909.9) + (1841.8)]/2 + 6587 | 7962.88
1997 | 8901 | 1324 | A1 | A1, A2 | [(909.9) + (1841.8)]/2 + 7577 | 8952.88
1998 | 10255 | 1354 | A1 | A1, A2 | [(909.9) + (1841.8)]/2 + 8901 | 10276.88
1999 | 11880 | 1625 | A2 | A2, A3, A4 | [(909.9) + (1841.8)]/2 + 10255 | 11630.88
2000 | 13487 | 1607 | A2 | A2, A3, A4 | [(1841.8) + (2773.7) + (3705.6)]/3 + 11880 | 14653.76
2001 | 15305 | 1818 | A2 | A2, A3, A4 | [(1841.8) + (2773.7) + (3705.6)]/3 + 13487 | 16260.76
2002 | 17904 | 2599 | A3 | A2, A4 | [(1841.8) + (2773.7) + (3705.6)]/3 + 15305 | 18078.76
2001 | 15305 | 1818 | A2 | A2, A3, A4 | [(1841.8) + (3705.6)]/2 + 17904 | 20677.76
2001 | 15305 | 1818 | A2 | A2, A3, A4 | [(1841.8) + (2773.7) + (3705.6)]/3 + 15305 | 22560.76
2002 | 17904 | 2599 | A3 | A2, A4 | [(1841.8) + (2773.7) + (3705.6)]/3 + 15305 | 24769.76
2006 | 28306 | 3728 | A4 | A2 | [(1841.8) + (3705.6)]/2 + 17904 | 27351.76
2007 | 30265 | 1959 | A2 | A2, A3, A4 | 1841.8 + 28306 | 30147.84
2008 | 34481 | 4216 | A4 | A2 | [(1841.8) + (2773.7) + (3705.6)]/3 + 30265 | 33038.76
2009 | 37627 | 3146 | A4 | A2 | 1841.8 + 34481 | 36322.84
2010 | 41116 | 3489 | A4 | A2 | 1841.8 + 37627 | 40400.76
2011 | 50848 | 9732 | A4 | A2 | 1841.8 + 41116 | 42957.84
2012 | 63732 | 12884 | A4 | A2 | 1841.8 + 50848 | 52689.84
2013 | 69417 | 5685 | A4 | A2 | 1841.8 + 63732 | 65573.84
2014 | 83733 | 14316 | A4 | A2 | 1841.8 + 69417 | 71258.84
2015 | 100370 | 16637 | A4 | A2 | 1841.8 + 83733 | 85574.84
2016 | 114664 | 14294 | A4 | A2 | 1841.8 + 100370 | 102211.84
2017 | 129900 | 15236 | A4 | A2 | 1841.8 + 114664 | 116505.84
2018 | 150400 | 20500 | A4 | A2 | 1841.8 + 129900 | 131741.84
2019 | 178500 | 28100 | A4 | A2 | 1841.8 + 150400 | 152241.84
 |  |  |  |  | 1841.8 + 178500 | 180341.84
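The Defuzzification column of Table 3 follows one pattern: average the values attached to the fuzzy sets in the FLR group, then add the previous year's benefits. The sketch below reproduces that arithmetic. The mapping from A0 to A4 to the five values is inferred from the table entries, and the function name is an assumption for illustration rather than the authors' implementation.

```python
# Values attached to each fuzzy set, as they appear in the Defuzzification
# column of Table 3 (inferred from the table, partition 5).
INTERVAL_VALUES = {
    "A0": -22.0,
    "A1": 909.9,
    "A2": 1841.8,
    "A3": 2773.7,
    "A4": 3705.6,
}


def defuzzify(flr_group, previous_benefits):
    """Mean of the group's interval values plus the previous benefits value."""
    values = [INTERVAL_VALUES[label] for label in flr_group]
    return sum(values) / len(values) + previous_benefits


# Three rows of Table 3: groups {A0, A1}, {A2, A3, A4}, and {A2}.
print(round(defuzzify(["A0", "A1"], 179), 2))
print(round(defuzzify(["A2", "A3", "A4"], 11880), 2))
print(round(defuzzify(["A2"], 28306), 2))
```

The results agree with the table's forecasting values to within a few hundredths (for example about 622.95 against 622.96); the small gap is possibly due to the table rounding the interval values to one decimal place.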

Table 4. Training accuracy.


Table 5. Testing accuracy.


References

  1. Holzmann, R, Sherburne-Benz, L, and Tesliuc, E (2003). Social Risk Management: The World Bank’s Approach to Social Protection in a Globalising World. Washington DC: The World Bank
  2. Selwaness, I (2012). Rethinking social insurance in Egypt: an empirical study. Giza, Egypt: The Economic Research Forum, Working Paper No. 717
  3. Kassem, N (2021). Roles, rules, and controls: an analytical review of the governance of social protection in Egypt. Master’s thesis. Egypt: The American University in Cairo
  4. Loewe, M, and Westemeier, L (2018). Social insurance reforms in Egypt: needed, belated, flopped. [Online]. Available: https://dx.doi.org/10.2139/ssrn.3409190
  5. Hassani, H, Unger, S, and Beneki, C (2020). Big data and actuarial science. Big Data and Cognitive Computing. 4. article no. 40
  6. Senousy, Y, Shehab, A, Hanna, WK, Riad, AM, El-bakry, HA, and Elkhamisy, N (2020). A smart social insurance big data analytics framework based on machine learning algorithms. Journal of Theoretical and Applied Information Technology. 98, 232-244.
  7. Zadeh, LA (1965). Fuzzy sets. Information and Control. 8, 338-353. https://doi.org/10.1016/S0019-9958(65)90241-X
  8. Song, Q, and Chissom, BS (1993). Forecasting enrollments with fuzzy time series—Part I. Fuzzy Sets and Systems. 54, 1-9. https://doi.org/10.1016/0165-0114(93)90355-L
  9. Song, Q, and Chissom, BS (1994). Forecasting enrollments with fuzzy time series—Part II. Fuzzy Sets and Systems. 62, 1-8. https://doi.org/10.1016/0165-0114(94)90067-1
  10. Chen, SM (1996). Forecasting enrollments based on fuzzy time series. Fuzzy Sets and Systems. 81, 311-319. https://doi.org/10.1016/0165-0114(95)00220-0
  11. Hwang, JR, Chen, SM, and Lee, CH (1998). Handling forecasting problems using fuzzy time series. Fuzzy Sets and Systems. 100, 217-228. https://doi.org/10.1016/S0165-0114(97)00121-8
  12. Chen, SM, and Hwang, JR (2000). Temperature prediction using fuzzy time series. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics). 30, 263-275. https://doi.org/10.1109/3477.836375
  13. Huarng, K (2001). Heuristic models of fuzzy time series for forecasting. Fuzzy Sets and Systems. 123, 369-386. https://doi.org/10.1016/S0165-0114(00)00093-2
  14. Huarng, K (2001). Effective lengths of intervals to improve forecasting in fuzzy time series. Fuzzy Sets and Systems. 123, 387-394. https://doi.org/10.1016/S0165-0114(00)00057-9
  15. Yu, HK (2005). A refined fuzzy time-series model for forecasting. Physica A: Statistical Mechanics and its Applications. 346, 657-681. https://doi.org/10.1016/j.physa.2004.11.006
  16. Huarng, K, and Yu, HK (2005). A type 2 fuzzy time series model for stock index forecasting. Physica A: Statistical Mechanics and its Applications. 353, 445-462. https://doi.org/10.1016/j.physa.2004.11.070
  17. Huarng, K, and Yu, TH (2006). Ratio-based lengths of intervals to improve fuzzy time series forecasting. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics). 36, 328-340. https://doi.org/10.1109/TSMCB.2005.857093
  18. Cheng, CH, Chen, TL, and Chiang, CH (2006). Trend-weighted fuzzy time-series model for TAIEX forecasting. Neural Information Processing. Heidelberg, Germany: Springer, pp. 469-477 https://doi.org/10.1007/11893295_52
  19. Senousy, Y, Hanna, WK, Shehab, A, Riad, AM, El-Bakry, HM, and Elkhamisy, N (2019). Egyptian social insurance big data mining using supervised learning algorithms. Rev d’Intelligence Artif. 33, 349-357. https://doi.org/10.18280/ria.330504
  20. Ying, JJC, Huang, PY, Chang, CK, and Yang, DL (2017). A preliminary study on deep learning for predicting social insurance payment behavior. Proceedings of 2017 IEEE International Conference on Big Data (Big Data), pp. 1866-1875. https://doi.org/10.1109/BigData.2017.8258131
  21. Wang, CH (2004). Predicting tourism demand using fuzzy time series and hybrid grey theory. Tourism Management. 25, 367-374. https://doi.org/10.1016/S0261-5177(03)00132-8
  22. Muhammad, HL, Maria, EN, Hossain, JS, Nur, H, and Nur, A (2012). Fuzzy time series: an application to tourism demand forecasting. American Journal of Applied Sciences. 9, 132-140.
  23. Ismail, Z, Efendi, R, and Deris, MM (2015). Application of fuzzy time series approach in electric load forecasting. New Mathematics and Natural Computation. 11, 229-248. https://doi.org/10.1142/S1793005715500076
  24. Pandey, P, Kumar, S, and Srivastava, S (2013). A critical evaluation of computational methods of forecasting based on fuzzy time series. International Journal of Decision Support System Technology. 5, 24-39. https://doi.org/10.4018/jdsst.2013010102
  25. Lee, WJ, Jung, HY, Yoon, JH, and Choi, SH (2017). A novel forecasting method based on F-transform and fuzzy time series. International Journal of Fuzzy Systems. 19, 1793-1802. https://doi.org/10.1007/s40815-017-0354-6
  26. Song, Q, and Chissom, BS (1993). Fuzzy time series and its models. Fuzzy Sets and Systems. 54, 269-277. https://doi.org/10.1016/0165-0114(93)90372-O
  27. de Lima Silva, PC, Sadaei, HJ, Ballini, R, and Guimaraes, FG (2019). Probabilistic forecasting with fuzzy time series. IEEE Transactions on Fuzzy Systems. 28, 1771-1784. https://doi.org/10.1109/TFUZZ.2019.2922152
  28. GitHub. (2022) . pyFTS: fuzzy time series for python. [Online]. Available: https://github.com/PYFTS/pyFTS
  29. Chang, PT (1997). Fuzzy seasonality forecasting. Fuzzy Sets and Systems. 90, 1-10. https://doi.org/10.1016/S0165-0114(96)00138-8
  30. Song, Q (1999). Seasonal forecasting in fuzzy time series. Fuzzy Sets and Systems. 107, 235-236. https://doi.org/10.1016/S0165-0114(98)00266-8
  31. Alyousifi, Y, Masseran, N, and Ibrahim, K (2018). Modeling the stochastic dependence of air pollution index data. Stochastic Environmental Research and Risk Assessment. 32, 1603-1611. https://doi.org/10.1007/s00477-017-1443-7
  32. Li, F (2010). Air quality prediction in Yinchuan by using neural networks. Advances in Swarm Intelligence. Heidelberg, Germany: Springer, pp. 548-557 https://doi.org/10.1007/978-3-642-13498-2_71
  33. (2003). Fuzzy environmental decision-making: applications to air pollution. Atmospheric Environment. 37, 1865-1877. https://doi.org/10.1016/S1352-2310(03)00028-1