

Data Stream Classification Algorithms for Workload Orchestration in Vehicular Edge Computing: A Comparative Evaluation
International Journal of Fuzzy Logic and Intelligent Systems 2021;21(2):101-122
Published online June 25, 2021
© 2021 Korean Institute of Intelligent Systems.

Mutaz Al-Tarawneh

Computer Engineering Department, Mutah University, Karak, Jordan
Correspondence to: Mutaz Al-Tarawneh (
Received February 3, 2021; Revised March 28, 2021; Accepted April 26, 2021.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License ( which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
This paper reports on the use of online data stream classification algorithms to support workload orchestration in vehicular edge computing environments. These algorithms can be used to predict whether the available computational nodes will successfully handle computational tasks generated by vehicular applications. Several online data stream classification algorithms were evaluated on synthetic datasets generated from simulated vehicular edge computing environments. In addition, a multi-criteria decision analysis technique was used to rank the algorithms based on their performance metrics. The evaluation results demonstrate that the considered algorithms can handle online classification with various trade-offs and dominance relations among their performance metrics. Moreover, the multi-criteria decision analysis technique can efficiently rank the algorithms and identify those most appropriate for augmenting workload orchestration. Furthermore, the results show that the leveraging bagging algorithm, with an extremely fast decision tree base estimator, maintains strong online classification performance and a consistently competitive ranking among its counterparts across all datasets. Hence, it can be considered a promising choice to reinforce workload orchestration in vehicular edge computing environments.
Keywords: Online classification, Data stream, Performance, Ranking
1. Introduction

Together with the rapid development of artificial intelligence and data mining techniques, Internet of Things (IoT) technology has transformed traditional transportation systems into intelligent transportation systems (ITS) [1]. In the ITS paradigm, vehicles are equipped with various processing, sensing, detection, and communication devices, making them more intelligent and connected. Consequently, a plethora of intelligent applications, such as autonomous driving, smart navigation, and infotainment, have been developed [2–4]. However, these applications are computationally intensive, with processing requirements that exceed the capacity of in-vehicle hardware resources. This gap between application requirements and in-vehicle computing power can hinder the development of ITS-centered applications [5]. To remedy this, cloud computing has been employed within vehicular networks, allowing applications to use the powerful hardware resources of remote cloud data centers [6]. However, offloading to cloud data centers may incur long transmission delays that violate the extremely low latency requirements of some vehicular applications. To overcome this issue, vehicular edge computing (VEC), which combines vehicular networks with multi-access edge computing (MEC), has been proposed [7]. In the VEC paradigm, edge servers are deployed at the network edge, providing adequate computing power to handle the increasing processing requirements of in-vehicle applications.

Hence, VEC is considered a promising concept that provides a baseline for various applications to enhance quality of service (QoS) through computation offloading [8]. It can improve the user's quality of experience by offloading computational tasks to edge servers, which are expected to handle the processing demand of vehicular applications, and can thus serve delay-sensitive vehicular applications [9,10]. The envisioned VEC scenarios form a dynamically changing process characterized by an evolving stream of offloading requests generated by a diverse set of applications with varying processing demands and delay requirements, an instantaneously changing network state, and fluctuating utilization and capacity of the hardware resources of both edge and cloud servers [11]. In such a dynamic and heterogeneous environment, workload orchestration plays a vital role: it handles the incoming offloading requests generated by different applications and decides the computational server to which each request should be offloaded, considering the application characteristics, network status, and utilization levels of both edge and cloud hardware resources. As the number of vehicles and in-vehicle applications increases, the demand for the networking and hardware resources of the VEC system also increases. This increased demand ultimately leads to greater competition for resources, complicating the decision-making involved in workload orchestration [12]. Consequently, predicting the status of hardware resources and their ability to successfully handle offloaded requests is a challenging yet essential task. In this regard, machine learning (ML) algorithms can provide efficient tools for making predictions in dynamically changing environments [13]. 
The ultimate goal of the prediction process is to help the workload orchestrator determine whether offloading an incoming request to a particular server will eventually succeed or fail. In other words, the prediction process associated with workload orchestration is a binary classification problem in which a class label, either success or failure, is assigned to each incoming offloading request. A success indicates that a particular computational server is predicted to successfully handle the offloaded request and deliver the execution results back to the request source, whereas a failure indicates that the offloading process is expected to fail because of vehicle mobility, network congestion, or unavailability of processing resources. In this context, the work in [13] is among the first efforts to propose an ML-based workload orchestrator for VEC systems. The authors used several classification algorithms and tested their ability to aid the workload orchestration process. However, they used these algorithms in a batch-learning scheme, in which an ML model is built under the assumption that the entire dataset (i.e., all offloading requests) is available in memory. Batch learning schemes suffer from several limitations. First, the training phase may take a long time and require significant processing and memory resources. Second, the performance of the trained model is sensitive to the size of the training dataset. Third, the training data are assumed to be static; once a model is trained, it cannot acquire knowledge from new samples. Consequently, when the statistical properties of the model's input change (i.e., concept drift), a new model must be constructed [14]. 
Because the offloading requests in VEC environments form a continuous stream of data, online data stream classification algorithms are viable alternatives to offline (i.e., batch) classification algorithms for VEC-based workload orchestration, for several reasons. First, online data stream classification algorithms are designed to handle unbounded data streams and to learn incrementally from incoming data; they are continuously updated while making predictions whenever required [15]. Second, they can be used in real-time applications that classical (i.e., batch) learning algorithms cannot handle [16]. Third, they can detect concept drift, allowing the trained models to continuously adapt and evolve with dynamically changing data streams [17,18]. Therefore, online data stream classification is a suitable tool for workload orchestration in VEC systems, where the environmental state and application behavior may change over time in unpredictable ways. Many algorithms for data stream classification have been proposed [19–21]. Although some of these algorithms have been studied in multiple fields [16,22–25], their performance and the trade-offs between their performance metrics and memory costs have not been evaluated for VEC environments. Hence, this work seeks to identify, through a rigorous comparative evaluation, which online data stream classification algorithms may be suitable for VEC environments. In addition, it analyzes the trade-offs among key performance metrics, such as accuracy, precision, recall, kappa statistic, and memory requirements [26]. Because the evaluation of online data stream classification algorithms typically involves multiple, often conflicting criteria, it can be modeled as a multi-criteria decision analysis (MCDA) problem [27]. 
MCDA is a formal decision-making process that involves identifying the alternatives, selecting the evaluation indicators (i.e., criteria), and aggregating the performance scores obtained under the considered criteria [28,29]. In this context, composite indicators (CIs) are a family of MCDA methods that can be used to assign scores to various alternatives and rank them accordingly [30–32]. Hence, this work employs a CI-based approach to rank the online data stream classification algorithms based on their performance metrics. The experiments were carried out using several online data stream classification algorithms (Section 2) on synthetically generated datasets. All tests were performed using the scikit-multiflow evaluation platform [33]. The main contributions of this study are as follows.

  1. Performing a rigorous evaluation of several online data stream classification algorithms on synthetic datasets generated from a simulated VEC environment.

  2. Applying an MCDA technique to rank the algorithms based on their observed performance trade-offs and dominance relations.

  3. Identifying the most suitable algorithm or set of algorithms to augment workload orchestration in VEC environments.

The remainder of this paper is organized as follows. Section 2 briefly explains the online data stream classification algorithms considered in this study. Section 3 describes the research tools and methodology. Section 4 presents the evaluation results and trade-off analysis. Section 5 summarizes and concludes the paper.

2. Data Stream Classification

Traditionally, ML algorithms have been used to construct models from static datasets. However, there is a growing need for mechanisms capable of handling vast volumes of data that arrive as streams. In this setting, new data samples can arrive at any time, and storing all of them is impractical. On the one hand, learning from continuous data streams requires the ML model to be created and continuously updated throughout the stream. On the other hand, concept drift, in which the statistical properties of the incoming data change over time, must be addressed [34,35]. In addition, in VEC environments, the ML model must be updated instantaneously, which requires algorithms that achieve adequate accuracy with limited processing power and memory space [26]. In this work, several data stream classification algorithms with distinct model construction mechanisms, complexity, and memory requirements are considered. The algorithms are summarized as follows.

2.1 Bayes Learning Algorithms

In this category, the naive Bayes (NB) algorithm was used [20]. The NB algorithm performs Bayesian prediction under the assumption that all inputs, that is, the features of the input data samples, are conditionally independent given the class. It is a simple classification algorithm with low processing requirements. Given n possible classes, the trained NB classifier assigns each incoming data sample to the class with the highest posterior probability.
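
The incremental nature of NB can be illustrated with a minimal sketch. It assumes numeric features (e.g., VM utilization or network delay) with Gaussian class-conditional densities and keeps per-class running means and variances updated with Welford's method. The class name `OnlineGaussianNB` and its methods are hypothetical; this is not the scikit-multiflow implementation.

```python
import math
from collections import defaultdict


class OnlineGaussianNB:
    """Hypothetical incremental Gaussian naive Bayes for numeric feature vectors."""

    def __init__(self):
        self.class_counts = defaultdict(int)
        # stats[label][j] holds [count, mean, M2] of feature j for that class (Welford).
        self.stats = {}

    def partial_fit(self, x, y):
        self.class_counts[y] += 1
        feats = self.stats.setdefault(y, [[0, 0.0, 0.0] for _ in x])
        for j, v in enumerate(x):
            s = feats[j]
            s[0] += 1
            delta = v - s[1]
            s[1] += delta / s[0]
            s[2] += delta * (v - s[1])

    @staticmethod
    def _log_gaussian(v, n, mean, m2):
        # Population variance; fall back to 1.0 before two samples are seen,
        # and floor it so a constant feature does not divide by zero.
        var = m2 / n if n > 1 else 1.0
        var = max(var, 1e-9)
        return -0.5 * math.log(2 * math.pi * var) - (v - mean) ** 2 / (2 * var)

    def predict(self, x):
        total = sum(self.class_counts.values())
        if total == 0:
            return None
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # log prior + sum of per-feature log likelihoods (independence assumption)
            score = math.log(count / total)
            for v, (n, mean, m2) in zip(x, self.stats[label]):
                score += self._log_gaussian(v, n, mean, m2)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


nb = OnlineGaussianNB()
for x, y in [([0.1, 10.0], "success"), ([0.2, 12.0], "success"),
             ([0.9, 80.0], "failure"), ([0.8, 90.0], "failure")]:
    nb.partial_fit(x, y)
print(nb.predict([0.15, 11.0]))
```

Each update touches only one class's statistics, so memory grows with the number of classes and features rather than with the length of the stream.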

2.2 Lazy Learning Algorithms

The most commonly used lazy data stream classification algorithms are the k-nearest neighbors (kNN) classifier [36], the kNN classifier with the adaptive windowing (ADWIN) change detector (kNN-ADWIN), and self-adjusting memory coupled with the kNN classifier (SAM-kNN) [37,38]. In the data stream setting, the kNN algorithm keeps a window with a fixed number of recently observed training samples. When a new input sample arrives, the algorithm searches the stored samples for the closest neighbors according to a particular distance measure and assigns the class label of the incoming sample accordingly. The kNN-ADWIN classifier improves on the regular kNN classifier through its resistance to concept drift: it employs the ADWIN change detector to decide which previous samples to keep and which to discard (i.e., forget), which in turn regulates the window size. SAM-kNN is a further refinement of the other two algorithms, in which a SAM model constructs an ensemble of models targeting either current or previous concepts. These dedicated models are applied according to the requirements of the current situation (i.e., concept). To accomplish this, the SAM model maintains both a short-term memory (STM) and a long-term memory (LTM). Whereas the STM is built for the current concept, the LTM stores information about previous concepts. A cleaning process then controls the size of the STM while maintaining consistency between the LTM and STM.
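
A minimal sketch of the fixed-window kNN described above, assuming numeric feature vectors and Euclidean distance. `WindowKNN` and its parameters (`k`, `max_window`) are hypothetical names. The ADWIN-based variant would replace the fixed `maxlen` with a window whose size is adjusted by a change detector, which is not shown here.

```python
import math
from collections import Counter, deque


class WindowKNN:
    """Hypothetical kNN over a fixed-size window of the most recent labelled samples."""

    def __init__(self, k=3, max_window=1000):
        self.k = k
        # deque with maxlen drops the oldest sample automatically (fixed-size window)
        self.window = deque(maxlen=max_window)

    def partial_fit(self, x, y):
        self.window.append((tuple(x), y))

    def predict(self, x):
        if not self.window:
            return None
        # Euclidean distance to every stored sample; keep the k closest.
        nearest = sorted(self.window, key=lambda s: math.dist(s[0], x))[: self.k]
        votes = Counter(label for _, label in nearest)
        return votes.most_common(1)[0][0]


knn = WindowKNN(k=1, max_window=3)
for x, y in [([0.0], "a"), ([1.0], "b"), ([2.0], "c"), ([3.0], "d")]:
    knn.partial_fit(x, y)
print(knn.predict([0.1]))  # "a" has already been forgotten
```

Prediction cost grows linearly with the window size, which is why the window length is the main memory and latency knob for this family.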

2.3 Tree-Based Learning Algorithms

Tree-based algorithms are widely used in data stream classification applications. In this study, three main tree-based algorithms are used: the Hoeffding tree (HT) [39], the Hoeffding adaptive tree (HAT) [40], and the extremely fast decision tree (EFDT) [41]. The HT algorithm is an incremental, anytime decision tree induction algorithm that can learn from massive online data streams, assuming that the distribution generating the incoming samples is static and does not change over time. It relies on the fact that a small set of input samples is often sufficient to select an optimal splitting attribute. This idea is mathematically supported by the Hoeffding bound, which quantifies the number of input samples required to estimate a statistic within a predefined precision. A theoretically attractive feature of the HT algorithm, not shared by other online decision tree algorithms, is its strong performance guarantees: relying on the Hoeffding bound, the output of an HT learner is asymptotically nearly identical to that of a batch learner using infinitely many input samples. The HAT algorithm uses the ADWIN change detector to monitor the performance of different tree branches; branches whose performance decreases are replaced with new branches if the new branches are more accurate. The EFDT algorithm also builds a tree incrementally. It selects and deploys a split as soon as it is confident that the split is useful, and subsequently revisits that decision, replacing the split if a more useful one becomes evident. The EFDT learns quickly from static distributions and ultimately learns the asymptotic batch tree if the distribution generating the input samples is stationary.
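
The Hoeffding bound used by HT can be made concrete. For a random variable with range R observed n times, with probability 1 − δ the true mean lies within ε = sqrt(R² ln(1/δ) / (2n)) of the sample mean; HT splits a leaf when the observed gain difference between the two best attributes exceeds ε. The sketch below assumes information gain on a two-class problem (R = log2 2 = 1) and adds a tie-breaking threshold `tau`, as in the original VFDT formulation; the function names and default values are illustrative assumptions.

```python
import math


def hoeffding_bound(value_range, delta, n):
    """Epsilon such that, with probability 1 - delta, the true mean of a variable
    with range `value_range` is within epsilon of its mean over n observations."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain, second_gain, n, value_range=1.0, delta=1e-7, tau=0.05):
    """Split when the best attribute is confidently better than the runner-up,
    or when the two are so close (eps < tau) that waiting longer is pointless."""
    eps = hoeffding_bound(value_range, delta, n)
    return (best_gain - second_gain > eps) or (eps < tau)


# epsilon shrinks as more samples reach the leaf
for n in (100, 1000, 10000):
    print(n, round(hoeffding_bound(1.0, 1e-7, n), 4))
```

Because ε shrinks as 1/sqrt(n), a leaf needs roughly four times as many samples to halve the gain difference it can confidently resolve.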

2.4 Rule-Based Learning Algorithms

The very fast decision rule (VFDR) algorithm is an online data stream classification algorithm [42]. Its learning process is similar to that of the HT algorithm, but it uses a collection of rules instead of a tree. The goal of VFDR is to create a collection of rules that constitutes a highly interpretable classifier. Each rule is represented as a combination of conditions on attribute values, together with a structure that maintains sufficient statistics. These statistics are used to determine the class that the rule predicts for incoming data samples.
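
To illustrate the rule representation, the sketch below models a rule as a conjunction of threshold conditions plus a class-count table as its sufficient statistics, and predicts with an ordered rule set in which the first covering rule decides. The class names, the condition format, and the default-rule fallback are assumptions for illustration; rule expansion, which in VFDR follows an HT-like procedure, is omitted.

```python
import operator
from collections import Counter

OPS = {"<=": operator.le, ">": operator.gt}


class Rule:
    """Hypothetical rule: conjunction of (feature_index, op, threshold) conditions."""

    def __init__(self, conditions):
        self.conditions = conditions
        self.class_counts = Counter()  # sufficient statistics of covered samples

    def covers(self, x):
        return all(OPS[op](x[j], t) for j, op, t in self.conditions)


class OrderedRuleSet:
    def __init__(self, rules):
        self.rules = rules
        self.default_counts = Counter()  # used when no rule covers a sample

    def _first_covering(self, x):
        return next((r for r in self.rules if r.covers(x)), None)

    def partial_fit(self, x, y):
        rule = self._first_covering(x)
        counts = rule.class_counts if rule else self.default_counts
        counts[y] += 1

    def predict(self, x):
        rule = self._first_covering(x)
        counts = rule.class_counts if rule and rule.class_counts else self.default_counts
        return counts.most_common(1)[0][0] if counts else None


# Hypothetical rule: "feature 0 (e.g., server utilization) > 0.8"
rs = OrderedRuleSet([Rule([(0, ">", 0.8)])])
for x, y in [([0.9, 5.0], "failure"), ([0.95, 7.0], "failure"), ([0.3, 4.0], "success")]:
    rs.partial_fit(x, y)
print(rs.predict([0.85, 6.0]), rs.predict([0.2, 6.0]))
```

Each rule can be read on its own (a condition and the class it votes for), which is the interpretability advantage over a tree.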

2.5 Ensemble Learning Algorithms

In this study, several ensemble learning methods are evaluated: Oza bagging (OB) [43], Oza bagging with the ADWIN change detector (OB-ADWIN) [43], leveraging bagging (LB) [44], online synthetic minority oversampling technique bagging (SMOTE-B) [45], online under-over bagging (UO-B) [45], adaptive random forest (ARF) [46], and the dynamic weighted majority classifier (DWMC) [47]. OB is an online ensemble learning algorithm that adapts traditional batch bagging to incremental learning. In traditional batch bagging, M classifiers are trained on M distinct datasets, each created by drawing N samples with replacement from an N-sized training dataset. In the online setting, there is no training dataset but rather a stream of input samples, so drawing samples with replacement cannot be performed trivially. Online OB mimics this training phase by presenting each arriving sample to each base estimator k times, where k is drawn from a binomial distribution. Because the input stream can be assumed to be infinite, and the binomial distribution tends to a Poisson(λ = 1) distribution as the number of samples tends to infinity, [43] found the process adopted by online OB to be a good approximation of drawing with replacement. OB-ADWIN improves on OB by adding the ADWIN change detector. The LB algorithm is also based on OB, using a Poisson distribution to simulate re-sampling. LB attempts to obtain better classification results by modifying the parameter of the Poisson distribution that arises from the binomial distribution under an infinite input stream: it sets λ to six instead of one. 
The larger value of λ increases the diversity of the input space by assigning a wider range of weights to the input samples. To further improve on OB, LB uses output detection codes, in which each class label is encoded as an n-bit binary code and each bit is associated with one of the n classifiers. When a new input sample is processed, each classifier is trained on its associated bit, which gives LB some error-correction capability.
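
The Poisson-based re-sampling shared by OB and LB can be sketched as follows: each base learner receives each arriving sample with weight k ~ Poisson(λ), so λ = 1 approximates Oza bagging and λ = 6 mimics the heavier re-weighting used by LB. This is a minimal sketch under stated assumptions: the majority-class base learner is a hypothetical placeholder for a real incremental learner (such as a Hoeffding tree), the Poisson sampler uses Knuth's method because Python's `random` module has no Poisson generator, and LB's ADWIN change detection and output codes are omitted.

```python
import math
import random
from collections import Counter


def poisson(lam, rng):
    """Knuth's method: k ~ Poisson(lam); adequate for small lam such as 1 or 6."""
    threshold, k, p = math.exp(-lam), 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


class MajorityClass:
    """Hypothetical placeholder base learner that accepts sample weights."""

    def __init__(self):
        self.counts = Counter()

    def partial_fit(self, x, y, weight=1):
        self.counts[y] += weight  # weight k is equivalent to seeing the sample k times

    def predict(self, x):
        return self.counts.most_common(1)[0][0] if self.counts else None


class OnlineBagging:
    def __init__(self, make_learner, n_estimators=10, lam=1.0, seed=0):
        self.learners = [make_learner() for _ in range(n_estimators)]
        self.lam = lam  # 1.0 ~ Oza bagging, 6.0 ~ leveraging bagging
        self.rng = random.Random(seed)

    def partial_fit(self, x, y):
        for learner in self.learners:
            k = poisson(self.lam, self.rng)  # how many times this learner "sees" the sample
            if k > 0:
                learner.partial_fit(x, y, weight=k)

    def predict(self, x):
        votes = Counter(p for p in (m.predict(x) for m in self.learners) if p is not None)
        return votes.most_common(1)[0][0] if votes else None


ob = OnlineBagging(MajorityClass, n_estimators=5, lam=6.0)
for _ in range(10):
    ob.partial_fit([0.0], "success")
for _ in range(2):
    ob.partial_fit([0.0], "failure")
print(ob.predict([0.0]))
```

With λ = 1, about 37% of the learners skip any given sample (P(k = 0) = e^−1), whereas with λ = 6 almost every learner sees every sample, but with widely varying weights.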

The ARF algorithm adapts the traditional batch random forest algorithm to online data streams. In ARF, multiple decision trees are grown, and each incoming sample is classified through weighted voting, in which trees with better performance (in terms of either accuracy or the kappa statistic) receive larger weights, that is, a higher priority in the classification decision. The DWMC algorithm relies on four mechanisms to handle concept drift: it trains the online base learners of the ensemble, assigns weights to the base learners depending on their performance, removes learners based on their performance, and adds new learners based on the global performance of the whole ensemble.
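
The four DWMC mechanisms can be sketched following the dynamic weighted majority scheme of [47]: an expert's weight is multiplied by β when it errs, weights are normalized so the largest is 1, experts whose weight falls below θ are removed, and a new expert is added whenever the weighted global prediction is wrong. This minimal sketch updates on every sample (an update period of 1), uses a hypothetical majority-class placeholder as the base learner, and picks β = 0.5 and θ = 0.01 as illustrative defaults; the scikit-multiflow implementation may differ in these details.

```python
from collections import Counter, defaultdict


class MajorityClass:
    """Hypothetical placeholder base learner."""

    def __init__(self):
        self.counts = Counter()

    def partial_fit(self, x, y):
        self.counts[y] += 1

    def predict(self, x):
        return self.counts.most_common(1)[0][0] if self.counts else None


class DynamicWeightedMajority:
    def __init__(self, make_learner, beta=0.5, theta=0.01):
        self.make_learner, self.beta, self.theta = make_learner, beta, theta
        self.experts = [[make_learner(), 1.0]]  # [learner, weight] pairs

    def _weighted_vote(self, x, y=None):
        scores = defaultdict(float)
        for expert in self.experts:
            pred = expert[0].predict(x)
            if y is not None and pred != y:
                expert[1] *= self.beta  # (2) down-weight an expert that errs
            if pred is not None:
                scores[pred] += expert[1]
        return max(scores, key=scores.get) if scores else None

    def predict(self, x):
        return self._weighted_vote(x)

    def partial_fit(self, x, y):
        global_pred = self._weighted_vote(x, y)
        top = max(w for _, w in self.experts)
        for expert in self.experts:
            expert[1] /= top  # normalize so the best expert has weight 1
        # (3) remove experts whose weight has decayed below theta
        self.experts = [e for e in self.experts if e[1] >= self.theta]
        if global_pred != y:
            self.experts.append([self.make_learner(), 1.0])  # (4) add a new expert
        for learner, _ in self.experts:
            learner.partial_fit(x, y)  # (1) train every expert on the sample


dwm = DynamicWeightedMajority(MajorityClass)
for _ in range(20):
    dwm.partial_fit([0.0], "a")
for _ in range(20):  # abrupt concept drift: the label flips
    dwm.partial_fit([0.0], "b")
print(dwm.predict([0.0]))
```

After the drift, experts trained mostly on the old concept keep erring, so their weights decay geometrically until they are pruned, while freshly added experts take over the vote.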

In the UO-B and SMOTE-B algorithms, the basic idea is to combine existing online ensemble learning algorithms with batch-mode techniques for cost-sensitive bagging. Such a combination is useful for classifying imbalanced data streams, in which the vast majority of incoming samples belong to the same class. UO-B and SMOTE-B adapt the UnderOverBagging and SMOTEBagging ensemble methods [48] to online data stream settings. Given an imbalanced dataset with N samples, where N+ samples come from the minority class S+ and N− samples come from the majority class S−, batch bagging-based learning algorithms rely on under-sampling the majority class (underbagging), over-sampling the minority class (overbagging), or UnderOverBagging, a uniform combination of underbagging and overbagging. Furthermore, the re-sampling rate (a) varies over the bagging iterations, boosting the diversity among the base learners. In SMOTEBagging, the majority class is sampled with replacement at a rate of 100% (i.e., N− majority samples are generated), while C·N+ minority samples are generated for each base learner, for some C > 1, of which a% are generated by re-sampling and the rest by the SMOTE technique [49]. In the online data stream setting, where no dataset is available, sampling is simulated by presenting each incoming sample to the base model k times, where k is drawn from a Poisson(λ) distribution. The parameter λ is chosen so that the online bagging scheme mimics the behavior of its batch counterpart.
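
The idea of steering re-sampling through λ can be sketched as below. The rule used here is a hypothetical illustration of the over-sampling side only, not necessarily the one in [45]: running class counts are tracked, majority-class samples use λ = 1, and minority-class samples use λ = a_m × (majority count / minority count), where the per-learner rate a_m varies across base learners to mimic the varying re-sampling rate described above. SMOTE's synthetic-sample generation is not shown.

```python
import math
import random
from collections import Counter


def poisson(lam, rng):
    """Knuth's method: k ~ Poisson(lam)."""
    threshold, k, p = math.exp(-lam), 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


class ImbalanceAwareResampler:
    """Hypothetical per-learner Poisson rates that over-sample the minority class."""

    def __init__(self, n_estimators=5):
        self.counts = Counter()
        # assumed schedule: rate a_m grows across learners as 1/M, 2/M, ..., 1
        self.rates = [(m + 1) / n_estimators for m in range(n_estimators)]

    def update(self, y):
        self.counts[y] += 1

    def lam(self, y, m):
        majority = max(self.counts.values())
        if self.counts[y] == majority:
            return 1.0  # majority-class samples keep plain online bagging
        # minority samples are presented more often, never less than once on average
        return max(1.0, self.rates[m] * majority / self.counts[y])

    def draws(self, y, rng):
        """Number of times each base learner should be trained on this sample."""
        self.update(y)
        return [poisson(self.lam(y, m), rng) for m in range(len(self.rates))]


r = ImbalanceAwareResampler(n_estimators=5)
for _ in range(90):
    r.update("success")
for _ in range(10):
    r.update("failure")
print(r.lam("success", 0), r.lam("failure", 4))
```

Because the rate differs per learner, different learners see differently re-balanced versions of the same stream, which is the source of ensemble diversity the batch methods obtain by varying a.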

3. Research Tools and Methodology

3.1 VEC System Model

Figure 1 shows the system model assumed in this work. This model is based on [13,50], where a model and a simulation environment for a VEC system were proposed. In this model, in-vehicle applications periodically generate offloading requests for computational tasks. The workload orchestrator is assumed to offload the current request (i.e., task) to one of the following: (A) the road side unit (RSU) via the wireless local area network (WLAN) that currently covers the offloading vehicle, (B) the cloud server through the RSU using the wide area network (WAN), or (C) the cloud server via the cellular network (CN).

It is assumed that there are three main in-vehicle application types. These applications and their tasks are characterized by the parameters shown in Table 1. The usage percentage is the percentage of vehicles running each application; the task input file size is the mean size of the input data required by the offloaded task; the task output file size is the mean size of the output generated by executing the offloaded task; virtual machine (VM) utilization indicates the percentage of processing power consumed (i.e., utilized) by the offloaded task; and the number of instructions per task represents the expected processing demand of the offloaded task. As shown in Table 1, the application characteristics were chosen to diversify task offloading frequency, processing demand, and network bandwidth requirements. Such a diverse set of applications drives the simulated VEC environment through various states that cover a wide range of server loading and network bandwidth utilization levels.

3.2 Dataset Construction

To carry out a comparative evaluation of different data stream classification algorithms, synthetic datasets were generated using the simulation tool of [13]. The datasets were created by simulating a VEC system with 2,400 vehicles and the application characteristics shown in Table 1. In addition, the simulation was accompanied by a hypothetical workload orchestrator that offloads incoming requests according to Table 2. Each entry in Table 2 represents the probability that the orchestrator makes a particular decision for an incoming offloading request. As shown, these probabilities are application-dependent; for instance, the probability of offloading tasks generated by App-1 to the VEC server is 0.60, compared with 0.30 and 0.23 for App-2 and App-3, respectively. These probabilities were set to mimic a relatively realistic situation consistent with the application characteristics in Table 1. For instance, App-2 and App-3 are more computationally intensive than App-1, as they have a greater number of instructions per task, and their average input/output file sizes are larger than those of App-1. Hence, they were assigned higher probabilities of being offloaded to the resource-rich cloud server via the high-bandwidth WAN connection. Overall, the hypothetical orchestration behavior in Table 2 was observed to generate synthetic datasets with an adequate number of success and failure instances, providing an appropriate setting for evaluating the performance and discrimination ability of different online data stream classification algorithms. For each offloading request, a record was kept of the VEC environmental state at the time the offloading decision was made, together with the offloading outcome. The environmental state variables depend on the decision made by the workload orchestrator, as shown in Table 3. 
The environmental state variables contain information about the application characteristics of the offloaded task, the average utilization of the VMs instantiated on the VEC server, the upload and download delays of the network connections used for task offloading, and the number of tasks recently offloaded to the selected server.

The offloading outcome is set to success if the task associated with the offloaded request is offloaded successfully and its result is delivered to the request source, and to failure if the execution result is not delivered successfully to the requesting vehicle. The offloading process may fail for several reasons, such as high resource utilization of the selected server, unavailability of network bandwidth, or vehicle mobility. The records obtained during the simulation were used to create three datasets, each corresponding to one of the three possible offloading decisions: the Edge, CloudRSU, and CloudCN datasets. Each dataset stores, in chronological order, the environmental state variables associated with each offloading request along with the offloading outcome. Table 4 lists sample entries from the constructed datasets.
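
The split of the simulation records into the Edge, CloudRSU, and CloudCN datasets can be sketched as follows. The record layout (a timestamp, the decision, a state vector, and the outcome) and the example values are hypothetical; the actual state variables for each decision are those listed in Table 3. The outcome is encoded as 1 for success and 0 for failure and placed in the last column, a common layout for CSV-based stream readers.

```python
DECISIONS = ("Edge", "CloudRSU", "CloudCN")


def split_by_decision(records):
    """Group records by offloading decision while preserving chronological order.
    Each output row is the state vector followed by the label (1 = success, 0 = failure)."""
    datasets = {d: [] for d in DECISIONS}
    for rec in sorted(records, key=lambda r: r["time"]):  # stable sort keeps ties in order
        label = 1 if rec["outcome"] == "success" else 0
        datasets[rec["decision"]].append(list(rec["state"]) + [label])
    return datasets


# Hypothetical records: state = [server utilization, upload delay in s]
records = [
    {"time": 2.0, "decision": "Edge", "state": [0.42, 0.010], "outcome": "failure"},
    {"time": 1.0, "decision": "Edge", "state": [0.35, 0.008], "outcome": "success"},
    {"time": 1.5, "decision": "CloudCN", "state": [0.10, 0.120], "outcome": "success"},
]
data = split_by_decision(records)
print(data["Edge"])  # [[0.35, 0.008, 1], [0.42, 0.01, 0]]
```

Sorting by time before splitting matters because prequential evaluation replays each dataset in its original order.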

3.3 Evaluation Methodology

This section explains the main steps implemented to evaluate the performance of the considered data stream classification algorithms on the obtained datasets. Figure 2 depicts the evaluation procedure followed by the scikit-multiflow evaluation platform [33].

This evaluation procedure was performed for every data stream classification algorithm on each of the obtained datasets. As shown in Figure 2, the dataset is first loaded as a stream and then passed to the instantiated classification algorithm for online testing, incremental learning, and evaluation. In the evaluation platform, the instances in the stream keep the chronological order captured from the simulated VEC environment. In addition, each instance contains a copy of the dynamic state of the VEC environment (i.e., application characteristics, designated server utilization, and network status) at the time the orchestration decision was made. As a result, each tested algorithm is exposed to state variables that capture the state of the simulated VEC environment when making a classification decision. In this study, the classification algorithms were evaluated using the prequential evaluation method, also known as the interleaved test-then-train method. Prequential evaluation is designed specifically for online data streams: each incoming data sample (i.e., instance) serves two purposes, and instances are analyzed sequentially in order of arrival and become inaccessible immediately afterward. Each incoming sample is first used to test the classification algorithm (i.e., to make a prediction) and then used to train it. The considered performance metrics are updated incrementally after each observed instance, allowing continuous tracking of each algorithm's performance and of its adaptability to unseen instances. Hence, the classification algorithm is always tested, and its performance metrics are updated, on input samples that it has not yet seen. 
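
The test-then-train loop can be sketched in a few lines. The sketch assumes a model exposing hypothetical single-sample `predict(x)` and `partial_fit(x, y)` methods (scikit-multiflow's own estimators work on arrays of samples, and its `EvaluatePrequential` class runs this loop internally) and tracks only prequential accuracy; a majority-class placeholder stands in for a real classifier.

```python
from collections import Counter


class MajorityClass:
    """Hypothetical placeholder classifier."""

    def __init__(self):
        self.counts = Counter()

    def predict(self, x):
        return self.counts.most_common(1)[0][0] if self.counts else None

    def partial_fit(self, x, y):
        self.counts[y] += 1


def prequential_accuracy(stream, model):
    """Interleaved test-then-train: every sample is predicted before it is learned."""
    correct, seen, curve = 0, 0, []
    for x, y in stream:
        y_pred = model.predict(x)     # 1) test on a sample the model has not seen
        correct += int(y_pred == y)
        seen += 1
        curve.append(correct / seen)  # metric updated after every instance
        model.partial_fit(x, y)       # 2) then train on the same sample
    return curve


stream = [([0.3], "success"), ([0.4], "success"), ([0.9], "failure"), ([0.2], "success")]
curve = prequential_accuracy(stream, MajorityClass())
print(curve)
```

Swapping the order of the two steps would let the model train on a sample before being tested on it, inflating the reported accuracy.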
The performance of the classification algorithms is quantified using widely used performance measures: accuracy, precision, recall, F-score, kappa statistic, and the memory size of the ML model obtained online. These measures are defined as follows:

  1. Classification accuracy: the percentage of correctly classified input samples.

    $$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1}$$

    where TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives, respectively. TP is the number of positive (i.e., success) instances correctly classified as positive, and TN is the number of negative (i.e., failure) instances correctly classified as negative. FP is the number of negative instances misclassified as positive, and FN is the number of positive instances misclassified as negative.

  2. Precision: the proportion of positive class predictions that actually belong to the positive class.

    $$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{2}$$

  3. Recall: the proportion of positive instances in the observed stream that are correctly predicted as positive.

    $$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{3}$$

  4. F-score: the harmonic mean of precision and recall.
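
    $$F = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{4}$$
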
  5. Kappa statistic (κ): a robust classification accuracy measure that takes into account the probability of agreement by chance, indicating the improvement over a majority class classifier, which predicts that all incoming samples belong to the majority class [51]. It plays a crucial role in evaluating classification accuracy, especially for imbalanced data streams.

    $$\kappa = \frac{p_0 - p_c}{1 - p_c} \tag{5}$$

    where p0 is the prequential accuracy of the classifier and pc is the probability that a chance classifier makes a correct prediction [52]. κ = 1 indicates that the classification algorithm is always correct, whereas κ = 0 indicates performance no better than chance.

  6. Model size: the memory space occupied by the classification model obtained throughout the prequential evaluation process.
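
The metrics above can be computed directly from the four confusion counts accumulated during the stream. The sketch assumes success is the positive class and, for the chance agreement pc, uses the usual Cohen's kappa estimate: the sum over classes of the actual class share times the predicted class share. The function name and the example counts are hypothetical.

```python
def stream_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, F-score, and Cohen's kappa from binary confusion counts.
    Positive = success; FP counts failures predicted as success, FN successes predicted as failure."""
    n = tp + tn + fp + fn
    accuracy = (tp + tn) / n  # p0 in the kappa formula
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # chance agreement: sum over classes of (actual share) * (predicted share)
    p_c = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / n ** 2
    kappa = (accuracy - p_c) / (1 - p_c) if p_c != 1 else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "f_score": f_score, "kappa": kappa}


m = stream_metrics(tp=40, tn=30, fp=10, fn=20)
print(m)
```

For these hypothetical counts, accuracy is 0.70 but pc = 0.5, so κ is only about 0.4, which illustrates why kappa is reported alongside accuracy.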

Typically, the online classification of data streams does not have a single optimal solution (i.e., algorithm) that optimizes all the involved performance metrics, but rather a plethora of possible solutions with noticeable trade-offs between different metrics. For such scenarios, optimality is captured through Pareto optimality: one algorithm dominates another if it is at least as good in every metric and strictly better in at least one. Trade-offs in the algorithm selection process arise between algorithms that are superior with respect to some metrics but inferior with respect to others. Algorithms that are not dominated by any other algorithm are called non-dominated and form the Pareto front of the algorithm performance optimization problem [53]. Hence, this work identifies, for each dataset, the set of non-dominated algorithms (SNDA) based on the obtained performance metrics. Having obtained the SNDA, a CI-based MCDA procedure is followed to assign scores and ranks to the algorithms. The main steps of the CI-based procedure are shown in Figure 3. First, the SNDA is used to construct a performance matrix (SNDA matrix).

The rows of the matrix represent the alternatives (i.e., algorithms), whereas the columns indicate the performance of these algorithms under each performance metric or indicator. Second, the performance metrics are assigned weights that can be considered trade-off coefficients; that is, they represent the decrease in one indicator (x) that can be compensated for by an increase in another indicator (y). Third, the performance matrix is normalized using one of eight normalization techniques: rank, percentile rank, standardized, min-max, logistic, categorical (−1,0,1), categorical (0.1,0.2,0.4,0.6,0.8,1), and target [29]. The normalization step puts all performance indicators on the same scale and assigns each alternative a score relative to its counterparts. Fourth, the normalized scores are combined using an aggregation function to generate an index for each alternative (i.e., algorithm). The aggregation functions include the additive, geometric, harmonic, minimum, and median functions [29]. Fifth, the indices of the different algorithms are used to rank them from 1 (best rank) to n (worst rank), where n is the number of elements in the SNDA. The outcome of the CI-based procedure thus weighs and ranks the SNDA and identifies which algorithms are best suited to the classification problem in question. In this work, the steps outlined in the flowchart are performed for each possible combination of normalization and aggregation methods. Hence, an algorithm may be assigned different ranks under different combinations. To combine the results obtained under different combinations of normalization and aggregation, a rank frequency metric (rfm) is defined as shown in Eq. 6.

$$rfm_{A-i} = \frac{n_{rank_i}}{n_{combinations}} \tag{6}$$
where $rfm_{A-i}$ is the rank frequency metric of algorithm A for the ith rank, that is, the proportion of normalization and aggregation combinations that rank algorithm A in the ith position; $n_{rank_i}$ is the number of times algorithm A is assigned the ith rank; and $n_{combinations}$ is the number of combinations of normalization and aggregation under which algorithm A is ranked.
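The CI-based procedure (normalization, aggregation, and ranking repeated over every combination, followed by Eq. 6) can be sketched in Python. The sketch implements only two of the eight normalization methods (min-max and a simple rank score) and three of the five aggregation functions (weighted additive, weighted geometric, and minimum), in simplified forms of our own rather than those of the tool cited as [29]. The three-algorithm matrix, the equal weights, and all function names are hypothetical.

```python
import math
from collections import Counter
from itertools import product

# Hypothetical SNDA matrix (not from the paper): rows are algorithms, columns are
# higher-is-better metrics (accuracy, kappa, model size score).
SNDA = {
    "alg_A": [0.95, 0.88, 0.20],
    "alg_B": [0.93, 0.85, 0.95],
    "alg_C": [0.70, 0.10, 1.00],
}
WEIGHTS = [1.0, 1.0, 1.0]  # equal trade-off coefficients (assumption)

# Step 3: simplified normalization methods (two of the eight)
def min_max(col):
    lo, hi = min(col), max(col)
    return [1.0] * len(col) if hi == lo else [(v - lo) / (hi - lo) for v in col]

def rank_score(col):
    # 1 for the worst value up to n for the best; tied values share the lower score
    return [1 + sum(u < v for u in col) for v in col]

# Step 4: simplified aggregation functions (three of the five)
def additive(scores, weights):
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)

def geometric(scores, weights):
    total = sum(weights)
    return math.prod(s ** (w / total) for w, s in zip(weights, scores))

def minimum(scores, weights):
    return min(scores)  # non-compensatory: the weights are ignored

NORMALIZERS = {"min-max": min_max, "rank": rank_score}
AGGREGATORS = {"additive": additive, "geometric": geometric, "minimum": minimum}

# Step 5: rank the composite indices (1 = best)
def composite_ranks(matrix, weights, normalize, aggregate):
    names = list(matrix)
    columns = [normalize([matrix[a][j] for a in names]) for j in range(len(weights))]
    index = {a: aggregate([col[i] for col in columns], weights)
             for i, a in enumerate(names)}
    ordered = sorted(names, key=index.get, reverse=True)  # ties keep input order
    return {a: pos + 1 for pos, a in enumerate(ordered)}

def rank_all(matrix, weights):
    """Repeat steps 3 to 5 for every normalization/aggregation combination."""
    return {(nk, ak): composite_ranks(matrix, weights, nf, af)
            for (nk, nf), (ak, af) in product(NORMALIZERS.items(), AGGREGATORS.items())}

# Eq. 6: rank frequency metric
def rank_frequency(rankings, algorithm, max_rank=3):
    """rfm_{A-i} = n_rank_i / n_combinations for i = 1..max_rank."""
    ranks = [r[algorithm] for r in rankings.values() if algorithm in r]
    if not ranks:  # the algorithm was never ranked
        return {i: 0.0 for i in range(1, max_rank + 1)}
    counts = Counter(ranks)
    return {i: counts[i] / len(ranks) for i in range(1, max_rank + 1)}

rankings = rank_all(SNDA, WEIGHTS)
for name in SNDA:
    print(name, rank_frequency(rankings, name))
```

In this toy case, alg_B is ranked first under four of the six combinations and alg_A under the other two, so $rfm_{B-1} = 4/6$ and $rfm_{A-1} = 2/6$; the rank frequency metric summarizes exactly this kind of disagreement between combinations.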

4. Results and Analysis

4.1 Edge Dataset Results

Table 5 summarizes the performance of the considered online data stream classification algorithms on a stream generated based on the Edge dataset in terms of accuracy, precision, recall, F-score, kappa, and model size. The generated stream consisted of 25,000 samples, with 17,333 success and 7,667 failure instances. All algorithms were tested using their default set of parameters defined in the scikit-multiflow evaluation platform [33]. In addition, the ensemble-based algorithms were further tested with different base estimators. For instance, the default base estimator for the LB algorithm is the kNN classifier.

However, it was also tested in other configurations in which its base estimator was set to HT (LB+HT), HAT (LB+HAT), ARF (LB+ARF), EFDT (LB+EFDT), or VFDRC (LB+VFDRC). The same applies to the other ensemble-based algorithms. As shown, the considered algorithms exhibited different performance values under the metrics used, and no single algorithm surpasses all other algorithms in terms of all performance metrics. In this regard, the LB+ARF ensemble algorithm achieved the highest accuracy, precision, F-score, and kappa values, while the NB algorithm achieved the highest recall and the smallest model size. Figure 4 shows a box plot of the obtained performance values, visualizing the degree of variation between the algorithms for each performance metric.

In general, the algorithms vary widely in terms of their kappa statistics. In addition, the considered algorithms generally achieved higher precision, recall, and F-score values than accuracy values. The class imbalance observed in the Edge dataset (i.e., stream) appears to have a direct effect on the performance of the classification algorithms: only those algorithms capable of handling class imbalance and concept drift achieved relatively high performance compared with the other algorithms. Specifically, the ARF algorithm and the other ensemble algorithms whose base estimator is set to HT, HAT, ARF, EFDT, or VFDRC have relatively better performance metrics than all other algorithms.

In Figure 4, the model size score (MSS) is computed according to Eq. 7.

$$MSS_i = \frac{MS_{max} - MS_i}{MS_{max} - MS_{min}} \tag{7}$$
where $MSS_i$ is the score assigned to algorithm i, $MS_i$ is the model size of the ith algorithm, and $MS_{min}$ and $MS_{max}$ are the minimum and maximum model sizes, respectively. The formula in Eq. 7 normalizes the model size to the range [0, 1] and converts it into an attribute for which higher values are better, favoring algorithms with smaller model sizes and putting the model size on the same scale as the other performance metrics. As shown in Figure 4, the vast majority of classification algorithms achieved high model size scores (i.e., small model sizes compared with the other algorithms). Nevertheless, a few algorithms have significantly larger model sizes (i.e., low model size scores).
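Eq. 7 is a reversed min-max normalization. The Python sketch below applies it to a few model sizes from Table 5. Taking the minimum and maximum over all algorithms in Table 5 (NB at 12 kB and SAM-kNN at 981,137 kB) is our reading of the text; with that choice the sketch reproduces the model size scores listed in Table 6 for these algorithms (e.g., 0.9995 for kNN, 0.975 for LB+ARF, and 0.9979 for LB+EFDT). The function name is hypothetical.

```python
def model_size_scores(sizes_kb):
    """Eq. 7: map model sizes to [0, 1], with 1 for the smallest and 0 for the largest."""
    ms_min, ms_max = min(sizes_kb.values()), max(sizes_kb.values())
    span = ms_max - ms_min
    # If every model has the same size, treat all of them as best (score 1)
    return {name: (ms_max - ms) / span if span else 1.0 for name, ms in sizes_kb.items()}

# Model sizes (kB) from Table 5; NB and SAM-kNN hold the minimum and maximum
edge_sizes = {"NB": 12, "kNN": 502, "SAM-kNN": 981137, "LB+ARF": 24580, "LB+EFDT": 2063}
for name, score in model_size_scores(edge_sizes).items():
    print(f"{name:8s} {score:.4f}")
```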

Because there is no single algorithm that is superior in all performance metrics, but rather a set of algorithms in which some of them are superior in some metrics and inferior in other metrics, it is important to identify the SNDA. Table 6 shows the SNDA for the Edge dataset classification, along with their performance metrics.

As shown in Table 6, this set consists of 25 algorithms with various trade-offs between the performance metrics considered. To determine the most viable options (i.e., algorithms) for the Edge data stream, the information shown in Table 6 was fed to the CI-based MCDA process to obtain the rank frequency metric (rfm) for each algorithm in the SNDA. Figure 5 shows the rfm values of these algorithms. The results are shown for the algorithms that were ranked at least once in the first three ranks (i.e., algorithms with a non-zero value for $rfm_{A-1}$, $rfm_{A-2}$, or $rfm_{A-3}$). As depicted in Figure 5, the LB+ARF, LB+EFDT, and LB+HAT algorithms have higher values of $rfm_{A-1}$, $rfm_{A-2}$, and $rfm_{A-3}$ than their counterparts. In other words, they consistently compete for the first three ranks across the various combinations of normalization and aggregation methods. In particular, the LB+ARF algorithm occupied the first rank for the majority of combinations and could be considered the most promising choice for the Edge data stream classification problem. Conversely, the LB+HT, LB+VFDRC, DWMC+EFDT, OB-ADWIN+HAT, and OB+HAT algorithms occupied the fourth rank or worse for a large portion of normalization and aggregation combinations.

4.2 Cloud via RSU Dataset Results

This section presents the performance metrics of the various data stream classification algorithms on a stream generated based on the CloudRSU dataset. The generated stream consists of 25,000 samples, with 17,961 success and 7,039 failure instances. Table 7 shows the accuracy, precision, recall, F-score, kappa, and model size metrics for the different algorithms. The algorithms obtained different performance values under the considered metrics. Similar to the Edge data stream, no single algorithm surpasses the others in terms of all performance metrics. In this regard, the LB+ARF algorithm achieved the highest accuracy, F-score, and kappa values; the LB+EFDT algorithm achieved the highest precision; the UO-B algorithm achieved the highest recall; and the NB algorithm had the smallest model size. Figure 6 shows a box plot of the obtained performance metrics for the various algorithms. The vast majority of algorithms have comparable performance metrics, with the kappa statistic generally having lower values than the other metrics.

Although the majority of the considered algorithms, especially the ensemble-based ones, have comparable performance metrics, no single algorithm surpasses all other algorithms with respect to the considered metrics. Hence, it is necessary to analyze the dominance relationships between these algorithms and identify the SNDA. Table 8 lists the SNDA for the stream generated from the CloudRSU dataset; this set contains 22 algorithms. The set was then used to obtain the rank frequency metrics shown in Figure 7. The results are shown for the algorithms that occupied one of the first three ranks under at least one combination of normalization and aggregation methods.

The results shown in Figure 7 illustrate that the LB+EFDT, LB+HAT, and ARF algorithms outperform the other algorithms in terms of the number of times they occupy one of the top three ranks. The LB+EFDT algorithm ranks robustly within the first three ranks, with a high share of combinations assigning it to the first rank. Hence, it can be considered a suitable data stream classification algorithm for streams generated from the CloudRSU dataset.

4.3 Cloud via CN Dataset Results

This section presents the results obtained for the various algorithms on the data stream generated from the CloudCN dataset. Similar to the other two cases, the generated stream consisted of 25,000 instances, of which 16,188 were success instances and 8,812 were failure instances. Table 9 summarizes the performance metrics obtained using the studied algorithms. As shown, the algorithms vary in terms of their obtained performance values, with no single algorithm dominating the others in all metrics. The LB+ARF and LB+EFDT algorithms tied for the highest accuracy (0.9581); LB+ARF achieved the highest F-score, LB+EFDT achieved the highest precision and kappa, UO-B+ARF achieved the highest recall, and NB had the smallest model size.

Figure 8 shows the variability observed among the algorithms with respect to all performance metrics. As depicted, a large portion of the considered algorithms have very similar performance values. In addition, the values obtained under the kappa statistic are lower than those of the other metrics. The results of the dominance analysis are shown in Table 10. As illustrated, the SNDA for the considered stream consists of 23 algorithms with various trade-offs between performance metrics.

Figure 9 shows the ranking results obtained by feeding the information given in Table 10 to the CI-based MCDA procedure. It considers only algorithms with non-zero values of $rfm_{A-1}$, $rfm_{A-2}$, or $rfm_{A-3}$. The results depicted in Figure 9 show that the LB+VFDRC, LB+HT, and LB+EFDT algorithms surpass all other algorithms with respect to their rank frequency metric. Although the LB+EFDT algorithm is assigned the first rank for more combinations than LB+VFDRC, the latter is never assigned a rank worse than third under any combination of normalization and aggregation. In addition, the LB+HT and LB+EFDT algorithms occupied the fourth rank or worse for a noticeable portion of normalization and aggregation combinations. Moreover, the algorithms other than these three were allocated the fourth or a worse rank for a relatively large number of combinations, hindering their potential applicability to the considered online data stream classification task.

In summary, the results shown in Tables 6, 8, and 10 illustrate that for each classification task there is a set of non-dominated algorithms with comparable performance metrics that can be utilized in real-life applications. The ranking results in Figures 5, 7, and 9 provide insight into which algorithms are superior to the other members of the corresponding non-dominated set, considering the trade-offs observed under all performance metrics. Although the results presented in Sections 4.1, 4.2, and 4.3 highlight the most suitable algorithms for each classification task, it is worth noting that LB+EFDT (i.e., the leveraging bagging algorithm whose base estimator is set to the extremely fast decision tree) consistently occupies one of the top-ranking positions for each of the considered datasets because of its competitive performance metrics and moderate model size. This algorithm maintains an ensemble of n EFDT base classifiers, where n is set to 10 by default in the utilized evaluation platform [33]. To classify an incoming instance, every individual classifier produces a prediction (i.e., a vote), and the final classification result is obtained by aggregating the individual predictions. According to Condorcet's jury theorem [54–56], the error rate of a majority-vote ensemble approaches 0 as the number of base classifiers grows, provided that two conditions are satisfied. First, the individual base classifiers must perform better than random guessing. Second, the individual classification models must exhibit diversity; that is, they should not produce correlated errors. In the case of the LB+EFDT algorithm, the EFDT algorithm is used as the base classification model. The EFDT is a refinement of the HT algorithm.
Whereas the HT algorithm delays the selection of a split at a node until it is confident that it has determined the best split, the EFDT algorithm selects and deploys a split as soon as it is confident that the split is useful. In addition, the HT algorithm never revisits its splitting decisions, whereas the EFDT algorithm revisits them and replaces a split if it later becomes evident that a better split is available. Hence, the EFDT algorithm can learn rapidly from incoming instances with an inbuilt tolerance against concept drift, using the most advantageous splitting decision identified to date [41]. Consequently, the EFDT algorithm is a viable base classifier that meets the first condition of Condorcet's jury theorem: as a standalone classifier, it achieved accuracies of 0.7936, 0.9298, and 0.9203 on the Edge, CloudRSU, and CloudCN streams (Tables 5, 7, and 9), well above random guessing on these binary tasks.
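The role of the two conditions can be illustrated with the binomial calculation behind Condorcet's jury theorem. The Python sketch below computes the probability that a majority vote of n classifiers is correct when each is independently correct with probability p. Independence is the idealized form of the second condition, which real ensembles such as LB+EFDT only approximate, so the numbers are an idealized illustration rather than a prediction of the accuracies in Tables 5, 7, and 9. Ties, which can occur with the default n = 10, are assumed to be broken at random; the function name is hypothetical.

```python
from math import comb

def majority_vote_accuracy(p: float, n: int) -> float:
    """P(majority of n voters is correct) when each voter is independently correct
    with probability p on a binary task; ties (even n) are broken uniformly at random."""
    total = 0.0
    for k in range(n + 1):  # k = number of correct voters
        prob = comb(n, k) * p**k * (1 - p) ** (n - k)
        if 2 * k > n:
            total += prob
        elif 2 * k == n:
            total += 0.5 * prob
    return total

for n in (1, 3, 10, 25):
    print(n, round(majority_vote_accuracy(0.7, n), 4))
```

With p = 0.7, the majority accuracy rises from 0.7 for a single classifier to 0.784 for three; with p below 0.5, the same calculation shows the ensemble getting worse as n grows, which is why the first condition is needed.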

The LB algorithm employs online bagging to train its base models (i.e., classifiers). As each incoming classification instance is observed, online re-sampling is performed by presenting that instance to each model k ~ Poisson(λ) times and updating each model accordingly; the value of k is regarded as the weight of the incoming instance. To increase online re-sampling, the LB algorithm uses a larger value of the Poisson parameter λ than the value λ = 1 used in standard online bagging; by default, λ is set to six. With this value of λ, the LB algorithm introduces more randomness into the weights of the incoming instances and thus achieves higher input space diversity by assigning a wider range of weights to each incoming instance. In addition, the LB algorithm can further leverage bagging performance by adding randomization at the output of the ensemble using output codes. As shown in Section 2.5, the LB algorithm assigns to each possible class label a binary string of length n, where n is the number of base classifiers in the ensemble, and each base classifier learns one bit of the binary string. Unlike standard ensemble methods, the LB algorithm utilizes random output codes instead of deterministic ones. In other words, while the base classifiers in standard methods predict the same function, using output codes allows each classifier in the LB ensemble to predict a different function [44]. This minimizes the influence of correlations between the base classifiers and, in turn, increases the diversity of the ensemble [57,58]. Hence, the introduction of randomization to the input and output of the ensemble classifiers yields, to some extent, a diversified ensemble that satisfies the second condition of Condorcet's jury theorem. Furthermore, the LB algorithm uses the ADWIN algorithm to deal with concept drift, with one ADWIN instance for each classifier in the ensemble. Each time a concept drift is detected, the worst classifier is reset.
Hence, the LB algorithm continuously monitors the quality of its online learning process and adapts to the current class distribution of incoming classification instances. In general, the inherent diversity among its base classifiers compensates for the classification errors induced by any individual classifier. This can be observed in Table 5, where a single EFDT classifier achieved an accuracy of 0.7936, whereas the LB+EFDT ensemble reached 0.9326. Altogether, the LB+EFDT algorithm maintained strong performance under all considered datasets and performance metrics. Hence, it could be considered a feasible choice for online data stream classification in real-life VEC environments.
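The input-side randomization described above can be sketched in a few lines of Python. Each incoming instance is presented to every base learner k ~ Poisson(λ) times, with λ = 1 corresponding to standard online bagging and λ = 6 to the leveraging bagging default. The Poisson sampler uses Knuth's multiplication method, and `CountingLearner` and `learn_one` are hypothetical stand-ins for a real base learner such as EFDT. The sketch omits the output codes and the ADWIN-based resets, so it is not the scikit-multiflow implementation.

```python
import math
import random

def poisson(lam: float, rng: random.Random) -> int:
    """Draw k ~ Poisson(lam) with Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k

class CountingLearner:
    """Hypothetical stand-in for a base learner such as EFDT; it only counts updates."""
    def __init__(self):
        self.updates = 0

    def learn_one(self, x, y):
        self.updates += 1

def online_bagging_step(models, x, y, lam, rng):
    """Present (x, y) to each model k ~ Poisson(lam) times and return the drawn weights."""
    weights = []
    for model in models:
        k = poisson(lam, rng)
        for _ in range(k):
            model.learn_one(x, y)
        weights.append(k)
    return weights

rng = random.Random(42)
for lam in (1.0, 6.0):  # 1: standard online bagging; 6: leveraging bagging default
    models = [CountingLearner() for _ in range(10)]
    draws = [online_bagging_step(models, i, 1, lam, rng) for i in range(2000)]
    zero_share = sum(w == 0 for step in draws for w in step) / (10 * 2000)
    mean = sum(m.updates for m in models) / (10 * 2000)
    print(f"lambda={lam}: mean weight {mean:.2f}, share of zero weights {zero_share:.3f}")
```

With λ = 1, roughly e⁻¹ ≈ 37% of the (model, instance) pairs receive weight 0, so each learner skips a different subset of the stream; with λ = 6, almost every learner sees every instance, but with weights that vary more widely, since the variance of a Poisson variable equals λ.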

5. Conclusion

VEC has recently become an integral part of intelligent transportation systems. In these systems, workload orchestration plays a crucial role in selecting the most appropriate computational node to execute tasks generated from vehicular applications. The continuous streams of tasks generated from vehicular applications pose significant challenges in data management, execution, and storage. Online ML algorithms can address continuous data streams and manage large data sizes. Hence, this paper has addressed the potential application of online data stream classification algorithms to support workload orchestration in VEC environments. These algorithms can be used to predict the ability of the available computational nodes to successfully handle a particular task. In this work, various online data stream classification algorithms were evaluated based on synthetic datasets generated from simulated VEC environments. In addition, an MCDA technique was employed to rank various algorithms based on the obtained performance metrics. The evaluation results show that the considered algorithms can handle online classification operations with noticeable trade-offs and dominance relations with respect to their obtained performance. In addition, the employed multi-criteria decision analysis technique demonstrated the ability to rank different algorithms and identify the most appropriate algorithms to handle online classification in VEC environments.

Future work will consider implementing a workload orchestration algorithm based on the findings of this study to show how online data stream classification can enhance orchestration performance in dynamically changing VEC environments.

Conflict of Interest

No potential conflict of interest relevant to this article was reported.

Fig. 1.

System model.

Fig. 2.

Evaluation flowchart.

Fig. 3.

CI MCDA flowchart.

Fig. 4.

Box plot of performance metrics on the Edge dataset.

Fig. 5.

Algorithm ranking for the Edge dataset.

Fig. 6.

Box plot of performance metrics on the CloudRSU dataset.

Fig. 7.

Algorithm ranking for the CloudRSU dataset.

Fig. 8.

Box plot of performance metrics on the CloudCN dataset.

Fig. 9.

Algorithm ranking for the CloudCN dataset.


Table 1.

In-vehicle application characteristics

App-1 App-2 App-3
Usage percent 30 35 35
Task input file size (kB) 20 40 20
Task output file size (kB) 20 20 80
Task generation rate (per second) 0.3 0.2 0.07
Number of instructions per task (× 10⁹) 3 10 20
VM utilization on RSU (%) 6 20 40
VM utilization on Cloud (%) 1.6 4 8

Table 2.

Hypothetical orchestration probabilities

VEC server Cloud via RSU Cloud via CN
App-1 0.60 0.23 0.17
App-2 0.30 0.53 0.17
App-3 0.23 0.60 0.17

Table 3.

VEC environment state variables per offloading decision

Variable Decision
Number of instructions per task All
Number of recently uploaded tasks All
Input file size (kB) All
Output file size (kB) All
Edge server utilization (%) VEC server
WLAN upload delay (ms) VEC server
WLAN download delay (ms) VEC server
WAN upload delay (ms) Cloud via RSU
WAN download delay (ms) Cloud via RSU
CN upload delay (ms) Cloud via CN
CN download delay (ms) Cloud via CN

Table 4.

Sample dataset entries

Edge dataset CloudRSU dataset CloudCN dataset
Success Failure Success Failure Success Failure
Number of instructions per task 3,057 3,178 9,531 19,242 3,196 9,598
Number of previously offloaded tasks 2 129 - - - -
Number of offloaded tasks - - 1 184 1 92
Input file size 19 21 21 88 19 21
Output file size 18 21 42 22 20 36
Edge server utilization 25 20.9 - - - -
WLAN upload delay 23.02 84.45 - - - -
WLAN download delay 36.20 28.44 - - - -
WAN upload delay - - 6.18 16.30 - -
WAN download delay - - 12.07 55.65 - -
CN upload delay - - - - 21.65 8
CN download delay - - - - 68.6 20.2

Table 5.

Performance comparison on the Edge dataset

Algorithm Accuracy Precision Recall F-score Kappa Model size (kB)
NB 0.6748 0.6763 0.9857 0.8022 0.0419 12
kNN 0.6875 0.7486 0.8022 0.7745 0.2675 502
kNN-ADWIN 0.6875 0.7487 0.8021 0.7745 0.2675 513
SAM-kNN 0.7468 0.7788 0.8681 0.8210 0.3927 981137
VFDRC 0.7199 0.7697 0.8296 0.7985 0.3413 81
HT 0.7393 0.7655 0.8798 0.8187 0.3627 145
HAT 0.7964 0.8252 0.8826 0.8529 0.5233 102
EFDT 0.7936 0.8371 0.8511 0.8440 0.5385 361
ARF 0.8991 0.9169 0.9333 0.9250 0.7707 1502
DWMC 0.7794 0.8149 0.8671 0.8402 0.4849 60
LB 0.7241 0.8047 0.762 0.7828 0.4054 4857
SMOTE-B 0.6727 0.8035 0.6604 0.7250 0.3302 60337
UO-B 0.7109 0.7123 0.9374 0.8095 0.2542 3453
OB 0.6872 0.7449 0.7946 0.7689 0.2864 4446
OB-ADWIN 0.6961 0.7542 0.7954 0.7743 0.3106 4440
LB+HT 0.9173 0.9302 0.9519 0.9409 0.8031 1069
LB+HAT 0.9304 0.9415 0.959 0.9502 0.8349 1037
LB+ARF 0.9380 0.9552 0.9552 0.9552 0.8595 24580
LB+EFDT 0.9326 0.9469 0.9562 0.9515 0.8409 2063
LB+VFDRC 0.9161 0.9305 0.9494 0.9399 0.8005 756
DWMC+HT 0.8412 0.871 0.9046 0.8875 0.6183 163
DWMC+HAT 0.853 0.884 0.9066 0.8952 0.6495 288
DWMC+ARF 0.9201 0.9393 0.9457 0.9425 0.8117 7581
DWMC+EFDT 0.87 0.8903 0.9264 0.9080 0.6871 444
DWMC+VFDRC 0.8309 0.8789 0.8765 0.8777 0.604 114
SMOTE-B + HT 0.7981 0.8791 0.8212 0.8492 0.5449 56153
SMOTE-B + HAT 0.8249 0.8951 0.8462 0.8700 0.6027 56311
SMOTE-B + ARF 0.8816 0.9247 0.9024 0.9134 0.7262 80722
SMOTE-B + EFDT 0.8565 0.9081 0.882 0.8949 0.6693 56513
SMOTE-B + VFDRC 0.7893 0.8609 0.8295 0.8449 0.5166 56147
UO-B + HT 0.7872 0.7727 0.9787 0.8636 0.4018 371
UO-B + HAT 0.8231 0.8876 0.8524 0.8696 0.5949 56278
UO-B + ARF 0.88 0.9234 0.9014 0.9123 0.7227 81812
UO-B + EFDT 0.8118 0.7959 0.9792 0.8781 0.4844 926
UO-B + VFDRC 0.7973 0.7909 0.9612 0.8678 0.4505 418
OB + HT 0.7874 0.8116 0.9022 0.8545 0.4638 1661
OB + HAT 0.8448 0.8579 0.9297 0.8924 0.6158 746
OB + ARF 0.8406 0.8907 0.8774 0.8840 0.6295 11315
OB + EFDT 0.8151 0.8427 0.9011 0.8709 0.5467 3267
OB + VFDRC 0.779 0.7964 0.9145 0.8514 0.4286 872
OB-ADWIN+HT 0.83 0.851 0.9144 0.8816 0.5816 693
OB-ADWIN+HAT 0.8542 0.8676 0.9315 0.8984 0.6415 567
OB-ADWIN+ARF 0.902 0.9198 0.9404 0.9300 0.7667 15946
OB-ADWIN+EFDT 0.8604 0.8793 0.9254 0.9018 0.6615 816
OB-ADWIN+VFDRC 0.8221 0.8417 0.9151 0.8769 0.5585 556

Table 6.

Non-dominated set of algorithms for the Edge dataset

Algorithm Accuracy Kappa Precision Recall F-score Model size score
NB 0.6748 0.0419 0.6763 0.9857 0.8022 1
kNN 0.6875 0.2675 0.7486 0.8022 0.7745 0.9995
kNN-ADWIN 0.6875 0.2675 0.7487 0.8021 0.7745 0.9995
SAM-kNN 0.7468 0.3927 0.7788 0.8681 0.821 0
VFDRC 0.7199 0.3413 0.7697 0.8296 0.7985 0.9999
HT 0.7393 0.3627 0.7655 0.8798 0.8187 0.9999
HAT 0.7964 0.5233 0.8252 0.8826 0.8529 0.9999
EFDT 0.7936 0.5385 0.8371 0.8511 0.844 0.9996
ARF 0.8991 0.7707 0.9169 0.9333 0.925 0.9985
DWMC 0.7794 0.4849 0.8149 0.8671 0.8402 1
UO-B 0.7109 0.2542 0.7123 0.9374 0.8095 0.9965
LB+HT 0.9173 0.8031 0.9302 0.9519 0.9409 0.9989
LB+HAT 0.9304 0.8349 0.9415 0.959 0.9502 0.999
LB+ARF 0.938 0.8595 0.9552 0.9552 0.9552 0.975
LB+EFDT 0.9326 0.8409 0.9469 0.9562 0.9515 0.9979
LB+VFDRC 0.9161 0.8005 0.9305 0.9494 0.9399 0.9992
DWMC+HT 0.8412 0.6183 0.871 0.9046 0.8875 0.9998
DWMC+HAT 0.853 0.6495 0.884 0.9066 0.8952 0.9997
DWMC+EFDT 0.87 0.6871 0.8903 0.9264 0.908 0.9996
DWMC+VFDRC 0.8309 0.604 0.8789 0.8765 0.8777 0.9999
UO-B+HT 0.7872 0.4018 0.7727 0.9787 0.8636 0.9996
UO-B+EFDT 0.8118 0.4844 0.7959 0.9792 0.8781 0.9991
UO-B+VFDRC 0.7973 0.4505 0.7909 0.9612 0.8678 0.9996
OB+HAT 0.8448 0.6158 0.8579 0.9297 0.8924 0.9993
OB-ADWIN + HAT 0.8542 0.6415 0.8676 0.9315 0.8984 0.9994

Table 7.

Performance comparison on the CloudRSU dataset

Algorithm Accuracy Precision Recall F-score Kappa Model size (kB)
NB 0.9095 0.9334 0.9409 0.9371 0.7754 11
kNN 0.6980 0.7522 0.8634 0.8040 0.1599 463
kNN-ADWIN 0.6952 0.7530 0.8557 0.8011 0.1604 474
SAM-kNN 0.7525 0.7781 0.9162 0.8415 0.2934 980406
VFDRC 0.9087 0.9475 0.9239 0.9356 0.7791 54
HT 0.9267 0.9456 0.9525 0.9490 0.8182 152
HAT 0.9238 0.9347 0.9609 0.9476 0.8080 72
EFDT 0.9298 0.9575 0.9441 0.9508 0.8288 143
ARF 0.9522 0.9631 0.9705 0.9668 0.8814 1572
DWMC 0.9253 0.9418 0.9548 0.9483 0.8137 52
LB 0.7205 0.8062 0.8035 0.8048 0.3127 4467
SMOTE-B 0.6543 0.7811 0.7189 0.7487 0.1961 55935
UO-B 0.7264 0.7325 0.9745 0.8363 0.0958 3129
OB 0.6952 0.7505 0.8616 0.8022 0.1519 4456
OB-ADWIN 0.7145 0.7658 0.8671 0.8133 0.2164 3844
LB+HT 0.9467 0.9611 0.9660 0.9635 0.8702 1179
LB+HAT 0.9540 0.9690 0.9669 0.9679 0.8868 430
LB+ARF 0.9614 0.9726 0.9737 0.9731 0.9049 18477
LB+EFDT 0.9599 0.9743 0.9697 0.9720 0.9015 1467
LB+VFDRC 0.9488 0.9650 0.9636 0.9643 0.8739 448
DWMC+HT 0.9349 0.9498 0.9610 0.9554 0.8378 110
DWMC+HAT 0.9380 0.9498 0.9646 0.9571 0.8453 234
DWMC+ARF 0.9561 0.9674 0.9716 0.9695 0.8915 5115
DWMC+EFDT 0.9416 0.9552 0.9638 0.9595 0.8549 167
DWMC+VFDRC 0.9186 0.9548 0.9305 0.9425 0.8031 135
SMOTE-B+HT 0.9282 0.9458 0.9546 0.9502 0.8218 52406
SMOTE-B+HAT 0.9251 0.9433 0.9528 0.9480 0.8139 52585
SMOTE-B+ARF 0.9517 0.9665 0.9662 0.9663 0.8810 69819
SMOTE-B+EFDT 0.9392 0.9539 0.9618 0.9578 0.8492 52699
SMOTE-B+VFDRC 0.9151 0.9510 0.9295 0.9401 0.7941 52373
UO-B + HT 0.9269 0.9325 0.9681 0.9500 0.8142 388
UO-B + HAT 0.9288 0.9349 0.9680 0.9512 0.8194 554
UO-B + ARF 0.9446 0.9498 0.9742 0.9618 0.8607 15765
UO-B + EFDT 0.9320 0.9392 0.9679 0.9533 0.8282 517
UO-B + VFDRC 0.9257 0.9378 0.9600 0.9488 0.8133 322
OB + HT 0.9262 0.9348 0.9601 0.9473 0.8184 1322
OB + HAT 0.9252 0.9362 0.9612 0.9485 0.8118 647
OB + ARF 0.9613 0.9613 0.9700 0.9656 0.8770 17768
OB + EFDT 0.9416 0.9568 0.9619 0.9593 0.8553 1514
OB + VFDRC 0.9182 0.9478 0.9376 0.9427 0.7998 827
OB-ADWIN+HT 0.9247 0.9366 0.9600 0.9482 0.8107 399
OB-ADWIN+HAT 0.9246 0.9355 0.9611 0.9481 0.8100 648
OB-ADWIN+ARF 0.9516 0.9633 0.9695 0.9664 0.8802 19609
OB-ADWIN+EFDT 0.9374 0.9515 0.9618 0.9566 0.8444 645
OB-ADWIN+VFDRC 0.9241 0.9471 0.9471 0.9471 0.8128 312

Table 8.

Non-dominated set of algorithms for the CloudRSU dataset

Algorithm Accuracy Kappa Precision Recall F-score Model size score
NB 0.9095 0.7754 0.9334 0.9409 0.9371 1
VFDRC 0.9087 0.7791 0.9475 0.9239 0.9356 0.999956283
HT 0.9267 0.8182 0.9456 0.9525 0.9490 0.999855844
HAT 0.9238 0.808 0.9347 0.9609 0.9476 0.999937923
EFDT 0.9298 0.8288 0.9575 0.9441 0.9508 0.999865371
ARF 0.9522 0.8814 0.9631 0.9705 0.9668 0.998408233
DWMC 0.9253 0.8137 0.9418 0.9548 0.9483 0.999957844
UO-B 0.7264 0.0958 0.7325 0.9745 0.8363 0.996819955
LB + HT 0.9467 0.8702 0.9611 0.966 0.9635 0.998808409
LB + HAT 0.954 0.8868 0.969 0.9669 0.9679 0.999572642
LB + ARF 0.9614 0.9049 0.9726 0.9737 0.9731 0.981164191
LB + EFDT 0.9599 0.9015 0.9743 0.9697 0.9720 0.998514415
DWMC + HT 0.9349 0.8378 0.9498 0.961 0.9554 0.999899143
DWMC + HAT 0.938 0.8453 0.9498 0.9646 0.9571 0.999772316
DWMC + ARF 0.9561 0.8915 0.9674 0.9716 0.9695 0.994794179
DWMC + EFDT 0.9416 0.8549 0.9552 0.9638 0.9595 0.999840483
DWMC + VFDRC 0.9186 0.8031 0.9548 0.9305 0.9425 0.999873969
UO-B + HT 0.9269 0.8142 0.9325 0.9681 0.9500 0.999615492
UO-B + HAT 0.9288 0.8194 0.9349 0.968 0.9512 0.999446182
UO-B + ARF 0.9446 0.8607 0.9498 0.9742 0.9618 0.983930994
UO-B + EFDT 0.932 0.8282 0.9392 0.9679 0.9533 0.999483728
OB + ARF 0.9613 0.877 0.9613 0.97 0.9656 0.981887726

Table 9.

Performance comparison on the CloudCN dataset

Algorithm Accuracy Precision Recall F-score Kappa Model size (kB)
NB 0.8969 0.9156 0.9258 0.9207 0.7736 11
kNN 0.7236 0.7415 0.8784 0.8042 0.3456 445
kNN-ADWIN 0.7214 0.7408 0.8749 0.8023 0.3415 452
SAM-kNN 0.7084 0.7349 0.8585 0.7919 0.3149 979425
VFDRC 0.8961 0.9310 0.9063 0.9185 0.7752 102
HT 0.9194 0.9176 0.9616 0.9391 0.8202 115
HAT 0.9224 0.9254 0.9570 0.9409 0.8279 86
EFDT 0.9203 0.9270 0.9515 0.9391 0.8238 156
ARF 0.9420 0.9381 0.9746 0.9560 0.8712 880
DWMC 0.9017 0.9111 0.9396 0.9251 0.7823 52
LB 0.7205 0.8062 0.8035 0.8048 0.3127 4467
SMOTE-B 0.6996 0.7761 0.7519 0.7638 0.3515 50811
UO-B 0.7051 0.6931 0.9755 0.8104 0.2246 4045
OB 0.7252 0.7434 0.8775 0.8049 0.3507 4465
OB-ADWIN 0.7306 0.7486 0.8777 0.8080 0.3656 3921
LB+HT 0.9462 0.9434 0.9753 0.9591 0.8807 711
LB+HAT 0.9455 0.9444 0.9730 0.9585 0.8793 704
LB+ARF 0.9581 0.9586 0.9774 0.9679 0.9077 11654
LB+EFDT 0.9581 0.9609 0.9749 0.9678 0.9078 2081
LB+VFDRC 0.9472 0.9462 0.9737 0.9598 0.8832 500
DWMC+HT 0.9288 0.9270 0.9659 0.9461 0.8416 198
DWMC+HAT 0.9330 0.9323 0.9666 0.9491 0.8513 360
DWMC+ARF 0.9449 0.9388 0.9786 0.9583 0.8775 4565
DWMC+EFDT 0.9323 0.9328 0.9648 0.9485 0.8499 417
DWMC+VFDRC 0.9232 0.9362 0.9455 0.9408 0.8313 161
SMOTE-B + HT 0.9263 0.9270 0.9617 0.9440 0.8363 47269
SMOTE-B + HAT 0.9336 0.9341 0.9653 0.9494 0.8528 47389
SMOTE-B + ARF 0.9402 0.9286 0.9709 0.9493 0.8673 62317
SMOTE-B + EFDT 0.9336 0.9324 0.9674 0.9496 0.8525 48022
SMOTE-B + VFDRC 0.9273 0.9296 0.9603 0.9447 0.8389 47313
UO-B + HT 0.9269 0.9113 0.9824 0.9455 0.8348 281
UO-B + HAT 0.9308 0.9158 0.9833 0.9484 0.8439 656
UO-B + ARF 0.9306 0.9122 0.9877 0.9484 0.8429 16995
UO-B + EFDT 0.9300 0.9157 0.9821 0.9477 0.8422 668
UO-B + VFDRC 0.9301 0.9281 0.9667 0.9470 0.8445 346
OB + HT 0.9237 0.9153 0.9718 0.9427 0.8287 1138
OB + HAT 0.9329 0.9303 0.9688 0.9492 0.8508 753
OB + ARF 0.9375 0.9319 0.9745 0.9527 0.8606 14264
OB + EFDT 0.9330 0.9301 0.9692 0.9492 0.8509 1980
OB + VFDRC 0.9179 0.9305 0.9433 0.9369 0.8194 823
OB-ADWIN+HT 0.9326 0.9267 0.9726 0.9491 0.8495 351
OB-ADWIN+HAT 0.9344 0.9309 0.9705 0.9503 0.8540 703
OB-ADWIN+ARF 0.9383 0.9331 0.9743 0.9533 0.8625 12298
OB-ADWIN+EFDT 0.9345 0.9321 0.9693 0.9503 0.8545 689
OB-ADWIN+VFDRC 0.9291 0.9324 0.9598 0.9459 0.8431 452

Table 10.

Non-dominated set of algorithms for the CloudCN dataset

Algorithm Accuracy Kappa Precision Recall F-score Model size score
NB 0.8969 0.7736 0.9156 0.9258 0.9207 1
VFDRC 0.8961 0.7752 0.931 0.9063 0.9185 0.999906689
HT 0.9194 0.8202 0.9176 0.9616 0.9391 0.999893732
HAT 0.9224 0.8279 0.9254 0.957 0.9409 0.999923127
EFDT 0.9203 0.8238 0.927 0.9515 0.9391 0.999851544
ARF 0.942 0.8712 0.9381 0.9746 0.9560 0.999113041
DWMC 0.9017 0.7823 0.9111 0.9396 0.9251 0.999957801
UO-B 0.7051 0.2246 0.6931 0.9755 0.8104 0.99588126
LB+HT 0.9462 0.8807 0.9434 0.9753 0.9591 0.999284837
LB+HAT 0.9455 0.8793 0.9444 0.973 0.9585 0.999292781
LB+ARF 0.9581 0.9077 0.9586 0.9774 0.9679 0.988112142
LB+EFDT 0.9581 0.9078 0.9609 0.9749 0.9678 0.997886041
LB+VFDRC 0.9472 0.8832 0.9462 0.9737 0.9598 0.999501079
DWMC+HT 0.9288 0.8416 0.927 0.9659 0.9461 0.999809508
DWMC+HAT 0.933 0.8513 0.9323 0.9666 0.9491 0.999644093
DWMC+ARF 0.9449 0.8775 0.9388 0.9786 0.9583 0.995349983
DWMC+EFDT 0.9323 0.8499 0.9328 0.9648 0.9485 0.999585037
DWMC+VFDRC 0.9232 0.8313 0.9362 0.9455 0.9408 0.999847256
UO-B+HT 0.9269 0.8348 0.9113 0.9824 0.9455 0.999723845
UO-B+HAT 0.9308 0.8439 0.9158 0.9833 0.9484 0.999341514
UO-B+ARF 0.9306 0.8429 0.9122 0.9877 0.9484 0.982658624
UO-B+VFDRC 0.9301 0.8445 0.9281 0.9667 0.9470 0.999657887
OB-ADWIN+HT 0.9326 0.8495 0.9267 0.9726 0.9490 0.999652466

  1. Guo, H, Liu, J, Ren, J, and Zhang, Y (2020). Intelligent task offloading in vehicular edge computing networks. IEEE Wireless Communications. 27, 126-132.
  2. Yurtsever, E, Lambert, J, Carballo, A, and Takeda, K (2020). A survey of autonomous driving: common practices and emerging technologies. IEEE Access. 8, 58443-58469.
  3. Neil, J, Cosart, L, and Zampetti, G (2020). Precise timing for vehicle navigation in the smart city: an overview. IEEE Communications Magazine. 58, 54-59.
  4. Sabella, D, Brevi, D, Bonetto, E, Ranjan, A, Manzalini, A, and Salerno, D. MEC-based infotainment services for smart roads in 5G environments, 1-6.
  5. Atallah, RF, Assi, CM, and Khabbaz, MJ (2019). Scheduling the operation of a connected vehicular network using deep reinforcement learning. IEEE Transactions on Intelligent Transportation Systems. 20, 1669-1682.
  6. Sun, Y, Guo, X, Zhou, S, Jiang, Z, Liu, X, and Niu, Z (2018). Learning-based task offloading for vehicular cloud computing systems. Proceedings of 2018 IEEE International Conference on Communications (ICC), Kansas City, MO, pp. 1-7.
  7. Liu, L, Chen, C, Pei, Q, Maharjan, S, and Zhang, Y (2020). Vehicular edge computing and networking: a survey. Mobile Networks and Applications.
  8. Zhang, K, Mao, Y, Leng, S, Maharjan, S, and Zhang, Y (2017). Optimal delay constrained offloading for vehicular edge computing networks. Proceedings of 2017 IEEE International Conference on Communications (ICC), Paris, France, pp. 1-6.
  9. Liu, S, Liu, L, Tang, J, Yu, B, Wang, Y, and Shi, W (2019). Edge computing for autonomous driving: opportunities and challenges. Proceedings of the IEEE. 107, 1697-1716.
  10. Payalan, YF, and Guvensan, MA (2020). Towards next-generation vehicles featuring the vehicle intelligence. IEEE Transactions on Intelligent Transportation Systems. 21, 30-47.
  11. Sonmez, C, Ozgovde, A, and Ersoy, C (2019). Fuzzy workload orchestration for edge computing. IEEE Transactions on Network and Service Management. 16, 769-782.
  12. Wang, Y, Wang, S, Zhang, S, and Cen, H (2019). An edge-assisted data distribution method for vehicular network services. IEEE Access. 7, 147713-147720.
  13. Sonmez, C, Tunca, C, Ozgovde, A, and Ersoy, C (2021). Machine learning-based workload orchestrator for vehicular edge computing. IEEE Transactions on Intelligent Transportation Systems. 22, 2239-2251.
  14. Rojas, JS, Pekar, A, Rendon, A, and Corrales, JC (2020). Smart user consumption profiling: Incremental learning-based OTT service degradation. IEEE Access. 8, 207426-207442.
  15. Nallaperuma, D, Nawaratne, R, Bandaragoda, T, Adikari, A, Nguyen, S, Kempitiya, T, De Silva, D, Alahakoon, D, and Pothuhera, D (2019). Online incremental machine learning platform for big data-driven smart traffic management. IEEE Transactions on Intelligent Transportation Systems. 20, 4679-4690.
  16. Adhikari, U, Morris, TH, and Pan, S (2018). Applying Hoeffding adaptive trees for real-time cyber-power event and intrusion classification. IEEE Transactions on Smart Grid. 9, 4049-4060.
  17. Li, X, Zhou, Y, Jin, Z, Yu, P, and Zhou, S (2020). A classification and novel class detection algorithm for concept drift data stream based on the cohesiveness and separation index of Mahalanobis distance. Journal of Electrical and Computer Engineering. 2020. article no. 4027423
  18. Rutkowski, L, Jaworski, M, and Duda, P (2020). Basic concepts of data stream mining. Stream Data Mining: Algorithms and Their Probabilistic Properties. Cham, Switzerland: Springer, pp. 13-33
  19. Gepperth, A, and Hammer, B (2016). Incremental learning algorithms and applications. Proceedings of the 24th European Symposium on Artificial Neural Networks (ESANN), Bruges, Belgium.
  20. Yang, Q, Gu, Y, and Wu, D (2019). Survey of incremental learning. Proceedings of 2019 Chinese Control And Decision Conference (CCDC), Nanchang, China, pp. 399-404.
  21. Wankhade, KK, Dongre, SS, and Jondhale, KC (2020). Data stream classification: a review. Iran Journal of Computer Science. 3, 239-260.
  22. El Mrabet, Z, Selvaraj, DF, and Ranganathan, P (2019). Adaptive Hoeffding tree with transfer learning for streaming synchrophasor data sets. Proceedings of 2019 IEEE International Conference on Big Data (Big Data), Los Angeles, CA, pp. 5697-5704.
  23. Nixon, C, Sedky, M, and Hassan, M (2019). Practical application of machine learning based online intrusion detection to internet of things networks. Proceedings of 2019 IEEE Global Conference on Internet of Things (GCIoT), Dubai, UAE, pp. 1-5.
  24. Da Costa, VGT, Santana, EJ, Lopes, JF, and Barbon, S (2019). Evaluating the four-way performance trade-off for stream classification. Green, Pervasive, and Cloud Computing. Cham, Switzerland: Springer, pp. 3-17
  25. Lopes, JF, Santana, EJ, da Costa, VGT, Zarpelao, BB, and Junior, SB (2020). Evaluating the four-way performance trade-off for data stream classification in edge computing. IEEE Transactions on Network and Service Management. 17, 1013-1025.
  26. Bifet, A, Gavalda, R, Holmes, G, and Pfahringer, B (2018). Machine Learning for Data Streams: With Practical Examples in MOA. Cambridge, MA: MIT Press
  27. Greco, S, Figueira, J, and Ehrgott, M (2016). Multiple Criteria Decision Analysis. New York, NY: Springer
  28. Cinelli, M, Kadzinski, M, Gonzalez, M, and Slowinski, R (2020). How to support the application of multiple criteria decision analysis? Let us start with a comprehensive taxonomy. Omega. 96. article no. 102261
  29. Cinelli, M, Spada, M, Kim, W, Zhang, Y, and Burgherr, P (2021). MCDA Index Tool: an interactive software to develop indices and rankings. Environment Systems and Decisions. 41, 82-109.
  30. Diaz-Balteiro, L, Gonzalez-Pachon, J, and Romero, C (2017). Measuring systems sustainability with multi-criteria methods: a critical review. European Journal of Operational Research. 258, 607-616.
  31. El Gibari, S, Gomez, T, and Ruiz, F (2019). Building composite indicators using multicriteria methods: a review. Journal of Business Economics. 89, 1-24.
  32. Greco, S, Ishizaka, A, Tasiou, M, and Torrisi, G (2019). On the methodological framework of composite indices: a review of the issues of weighting, aggregation, and robustness. Social Indicators Research. 141, 61-94.
  33. Montiel, J, Read, J, Bifet, A, and Abdessalem, T (2018). Scikit-multiflow: a multi-output streaming framework. The Journal of Machine Learning Research. 19, 2915-2914.
  34. Webb, GI, Hyde, R, Cao, H, Nguyen, HL, and Petitjean, F (2016). Characterizing concept drift. Data Mining and Knowledge Discovery. 30, 964-994.
  35. Demsar, J, and Bosnic, Z (2018). Detecting concept drift in data streams using model explanation. Expert Systems with Applications. 92, 546-559.
  36. Forster, K, Monteleone, S, Calatroni, A, Roggen, D, and Troster, G (2010). Incremental kNN classifier exploiting correct-error teacher for activity recognition. Proceedings of 2010 9th International Conference on Machine Learning and Applications, Washington, DC, pp. 445-450.
  37. Bifet, A, and Gavalda, R (2007). Learning from time-changing data with adaptive windowing. Proceedings of the 2007 SIAM International Conference on Data Mining, Minneapolis, MN, pp. 443-448.
  38. Losing, V, Hammer, B, and Wersing, H (2016). KNN classifier with self adjusting memory for heterogeneous concept drift. Proceedings of 2016 IEEE 16th International Conference on Data Mining (ICDM), Barcelona, Spain, pp. 291-300.
  39. Hulten, G, Spencer, L, and Domingos, P (2001). Mining time-changing data streams. Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, pp. 97-106.
  40. Bifet, A, and Gavalda, R (2009). Adaptive learning from evolving data streams. Advances in Intelligent Data Analysis. Heidelberg, Germany: Springer, pp. 249-260
  41. Manapragada, C, Webb, GI, and Salehi, M (2018). Extremely fast decision tree. Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, London, UK, pp. 1953-1962.
  42. Kosina, P, and Gama, J (2015). Very fast decision rules for classification in data streams. Data Mining and Knowledge Discovery. 29, 168-202.
  43. Oza, NC (2005). Online bagging and boosting. Proceedings of the IEEE International Conference on Systems, Man and Cybernetics, Waikoloa, HI, pp. 2340-2345.
  44. Bifet, A, Holmes, G, and Pfahringer, B (2010). Leveraging bagging for evolving data streams. Machine Learning and Knowledge Discovery in Databases. Heidelberg, Germany: Springer, pp. 135-150
  45. Wang, B, and Pineau, J (2016). Online bagging and boosting for imbalanced data streams. IEEE Transactions on Knowledge and Data Engineering. 28, 3353-3366.
  46. Gomes, HM, Bifet, A, Read, J, Barddal, JP, Enembreck, F, Pfahringer, B, Holmes, G, and Abdessalem, T (2017). Adaptive random forests for evolving data stream classification. Machine Learning. 106, 1469-1495.
  47. Kolter, JZ, and Maloof, MA (2007). Dynamic weighted majority: an ensemble method for drifting concepts. The Journal of Machine Learning Research. 8, 2755-2790.
  48. Wang, S, and Yao, X (2009). Diversity analysis on imbalanced data sets by using ensemble models. Proceedings of 2009 IEEE Symposium on Computational Intelligence and Data Mining, Nashville, TN, pp. 324-331.
  49. Chawla, NV, Bowyer, KW, Hall, LO, and Kegelmeyer, WP (2002). SMOTE: synthetic minority over-sampling technique. Journal of Artificial Intelligence Research. 16, 321-357.
  50. Sonmez, C, Ozgovde, A, and Ersoy, C (2018). EdgeCloudSim: an environment for performance evaluation of edge computing systems. Transactions on Emerging Telecommunications Technologies. 29. article no. e3493
  51. Vasiloudis, T, Beligianni, F, and De Francisci Morales, G (2017). BoostVHT: boosting distributed streaming decision trees. Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, Singapore, pp. 899-908.
  52. Bifet, A, de Francisci Morales, G, Read, J, Holmes, G, and Pfahringer, B (2015). Efficient online evaluation of big data stream classifiers. Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Sydney, Australia, pp. 59-68.
  53. Ishibuchi, H, Masuda, H, and Nojima, Y (2014). Selecting a small number of non-dominated solutions to be presented to the decision maker. Proceedings of 2014 IEEE International Conference on Systems, Man, and Cybernetics (SMC), San Diego, CA, pp. 3816-3821.
  54. Hansen, LK, and Salamon, P (1990). Neural network ensembles. IEEE Transactions on Pattern Analysis and Machine Intelligence. 12, 993-1001.
  55. Ladha, KK (1993). Condorcet’s jury theorem in light of de Finetti’s theorem. Social Choice and Welfare. 10, 69-85.
  56. van Rijn, JN, Holmes, G, Pfahringer, B, and Vanschoren, J (2018). The online performance estimation framework: heterogeneous ensemble learning for data streams. Machine Learning. 107, 149-176.
  57. Lv, Y, Peng, S, Yuan, Y, Wang, C, Yin, P, Liu, J, and Wang, C (2019). A classifier using online bagging ensemble method for big data stream learning. Tsinghua Science and Technology. 24, 379-388.
  58. Kolarik, M, Sarnovsky, M, and Paralic, J (2021). Diversity in ensemble model for classification of data streams with concept drift. Proceedings of 2021 IEEE 19th World Symposium on Applied Machine Intelligence and Informatics (SAMI), Herľany, Slovakia, pp. 000355-000360.

Mutaz Al-Tarawneh is an associate professor in computer engineering at Mutah University, Jordan. He obtained his Ph.D. in Computer Engineering from Southern Illinois University Carbondale, USA, in 2010. His current research interests include cloud computing, machine learning, and IoT systems.