Vessel Traffic Services (VTS) operators keep a sharp watch and provide appropriate information to ensure safe and effective navigation. In carrying out these tasks, VTS operators (VTSOs) must analyze traffic patterns and navigational data to assess the situation accurately when making decisions. Unfortunately, this data analysis is subject to problems such as time constraints, VTSOs' personal errors, and improper judgment. Objective and proper data analysis is therefore necessary. However, it is virtually impossible to monitor all vessels, because many vessels are present in the VTS area at the same time and complex traffic situations arise. In this study, we propose a machine learning algorithm for objective and accurate pattern recognition and data modeling. The Support Vector Regression algorithm was used for data learning and modeling. The optimal parameters were selected through v-fold cross validation and grid search.
The navigational environment within harbour limits is complex and changeable due to concentrated traffic density, the development of port facilities, and other geographical and environmental factors. The information required to support safe and effective marine traffic is therefore becoming increasingly diverse [1, 2]. In addition, with the implementation of the e-navigation strategy, the role of shore-based information is expected to increase gradually, and Vessel Traffic Services (VTS) will play an important part [3, 4]. VTS have been established by authorized governments where necessary in order to promote safe and efficient navigation and prevent marine accidents, based on ‘IMO RESOLUTION A.857 (20) on Guidelines for Vessel Traffic Services’, Article 36 of the ‘Maritime Safety Act’, Chapter 4 of the ‘Act of Ship Arrival and Departure’, and the ‘rules of implementation for Vessel Traffic Services’ [5–8]. In recent years, the role of VTS has expanded to managing traffic situations in VTS areas by providing effective information to vessels based on collected and selected navigational data [9, 10].
Understanding traffic patterns is one of the most important parts of maritime situational awareness, because it allows traffic situations and ships' positions to be predicted. In addition, VTS operators (VTSOs), who have to monitor wide areas, must be able to make accurate assessments within a limited time [11, 12]. It is therefore necessary to develop decision-making support tools that relieve the cognitive workload and stress of VTSOs.
In this study, we aim to help VTSOs and ships' masters by providing information on route usage through data analysis and pattern recognition of navigational data. We also intend to contribute to traffic monitoring and prediction. A machine learning algorithm was used to construct a reliable reference route from a small data set consisting of the most recent sailings. The Support Vector Regression (SVR) algorithm was used for pattern recognition and modeling. The optimal parameters were selected through v-fold cross validation and grid search. Machine learning was conducted on a virtual route and ship tracks that resemble a real navigational environment. As a result, we present a reference route model and a method for utilizing it.
When building a learning model with SVR, the most important task is to choose the kernel function and the optimal parameters. In a related study, Hsu et al. [13] proposed a LIBSVM-based procedure using cross validation and grid search. They presented the following six steps for data learning and SVM model design:
(1) Transform data
(2) Conduct simple scaling on the data
(3) Consider the RBF kernel as a kernel function
(4) Use cross-validation to find the best parameter
(5) Use the best parameter to train the whole training set
(6) Test
In this paper, we configured the modeling process with reference to the parameter selection procedure proposed in the relevant articles. The modeling process for extracting the reference route can be summarized in the following steps:
(1) Data collection
(2) Data classification
➀ Classification of target area
➁ Classification of target route
➂ Construction of sub-data sets
(3) Data learning and model extraction
➀ Conversion of input data sets
➁ Data scaling
➂ Parameter range selection
➃ Optimal parameter selection
➄ Optimal model selection
(4) Database construction
The simulation covered the steps from (2)- ➂ to (3)- ➄, because the virtual data sets used for the simulation were already classified. The classification steps were therefore omitted in the simulation.
Meanwhile, the same steps were applied for route extraction and pattern recognition of navigational data.
The data set for each individual ship is structured as an (M × N) matrix, because this structure makes it easy to convert each component into a column vector for learning [14–17]. First, the program selects one of the vessels, whose entire trajectory is to be learned, and divides the total distance into k equal parts while moving along the trajectory, which gives k segments of data. Each sub-data set covers two consecutive segments and overlaps the adjacent sub-data set by one segment. Therefore, the number of sub-data sets is k − 1. Each overlapping leg is thus learned twice, and the models of the overlapping legs are merged into one final model in the final learning process.
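The division into overlapping sub-data sets can be sketched as follows. This is a minimal illustration, not the authors' program: the function names `cumulative_distance` and `split_overlapping`, the planar (x, y) coordinates, and the value k = 9 are assumptions made for the example (k = 9 is chosen only because it yields eight sub-data sets).

```python
import math


def cumulative_distance(track):
    """Along-track distance at each point of a list of (x, y) positions."""
    dist = [0.0]
    for (x0, y0), (x1, y1) in zip(track, track[1:]):
        dist.append(dist[-1] + math.hypot(x1 - x0, y1 - y0))
    return dist


def split_overlapping(track, k):
    """Divide a track into k equal-distance parts and return k - 1 sub-data
    sets, each covering two consecutive parts, so that neighbouring
    sub-data sets overlap by one part."""
    if k < 2:
        raise ValueError("k must be at least 2")
    dist = cumulative_distance(track)
    if dist[-1] == 0.0:
        raise ValueError("track has zero length")
    part = dist[-1] / k
    # Index (0 .. k-1) of the part each point falls in; the last point is
    # clamped into the final part.
    part_index = [min(int(d / part), k - 1) for d in dist]
    subsets = []
    for start in range(k - 1):
        wanted = (start, start + 1)
        subsets.append([p for p, idx in zip(track, part_index) if idx in wanted])
    return subsets


# Hypothetical straight track of 90 distance units, one point per unit.
track = [(float(i), 0.0) for i in range(91)]
subsets = split_overlapping(track, k=9)
print(len(subsets), [len(s) for s in subsets])
```

Splitting by distance rather than by point count keeps the legs comparable between fast and slow ships, whose tracks contain different numbers of points over the same stretch of route.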
Because each trajectory is obtained from an individual vessel, the number of data points and the data range differ according to the ship's speed. Data scaling is required so that the trajectories can be learned effectively without being affected by these differences. When the data set
The Support Vector Machine (SVM) is a classification technique that constructs a hyperplane maximizing the margin through supervised learning. Although the SVM was originally developed for classification, it has been extended to regression and probability density estimation problems [18–20].
The SVM can be applied to regression by introducing a loss function. Given the training data set (x_{1}, y_{1}), …, (x_{N}, y_{N}) ∈ R^{M} × R, the ɛ-SVR proposed by Vapnik finds a function f(x) that has the smallest possible norm of ω while deviating from the targets of the training data by at most the insensitivity parameter ɛ [21, 22]. Here N is the number of training data and R^{M} represents the input space. The linear function is as below,
As shown in Figure 1, the slack variable ζ_{i},
Therefore, the output of the linear SVR can be expressed as
In SVR, the training data in the input space can be mapped into a high-dimensional feature space F by a non-linear mapping function ϕ (ϕ : R^{M} → F) in order to approximate a non-linear function. The inner product of the mapped data in F is evaluated through the kernel function [21, 22] and
Therefore, the output of the non-linear SVR can be expressed as
In this study, we used the Gaussian Radial Basis Function (RBF) as the kernel function to solve the non-linear problem. The Gaussian RBF has performed successfully in various fields and can be expressed as
The parameters ɛ and σ need to be selected during model selection. The parameter ɛ controls the complexity of the model by adjusting the number of support vectors. Because a small ɛ takes many of the training data as support vectors, the output model becomes complex. A larger ɛ, on the other hand, reduces the number of support vectors and yields a simpler model [14–17]. The parameter σ is the standard deviation of the Gaussian RBF kernel. A larger σ favors models that are close to straight lines, whereas a smaller σ produces more curved lines.
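For reference, the quantities discussed in this section can be written in their standard textbook form. The block below is a sketch of the usual ɛ-SVR formulation, not a reproduction of this paper's numbered equations. The regularization constant C, the second slack variable ζ_{i}^{*}, the bias b, and the dual coefficients α_{i}, α_{i}^{*} are notation assumed here.

```latex
\begin{align}
  % Linear regression function
  f(x) &= \langle \omega, x \rangle + b \\
  % Primal problem with slack variables
  \min_{\omega,\, b,\, \zeta,\, \zeta^{*}} \quad
    & \frac{1}{2}\lVert \omega \rVert^{2}
      + C \sum_{i=1}^{N} \left( \zeta_{i} + \zeta_{i}^{*} \right) \\
  \text{subject to} \quad
    & y_{i} - \langle \omega, x_{i} \rangle - b \le \varepsilon + \zeta_{i}, \nonumber \\
    & \langle \omega, x_{i} \rangle + b - y_{i} \le \varepsilon + \zeta_{i}^{*}, \nonumber \\
    & \zeta_{i},\ \zeta_{i}^{*} \ge 0, \qquad i = 1, \dots, N \nonumber \\
  % epsilon-insensitive loss function
  \lvert y - f(x) \rvert_{\varepsilon} &= \max\bigl(0,\ \lvert y - f(x) \rvert - \varepsilon\bigr) \\
  % Output of the linear SVR (dual form)
  f(x) &= \sum_{i=1}^{N} \left( \alpha_{i} - \alpha_{i}^{*} \right) \langle x_{i}, x \rangle + b \\
  % Output of the non-linear SVR with kernel K
  f(x) &= \sum_{i=1}^{N} \left( \alpha_{i} - \alpha_{i}^{*} \right) K(x_{i}, x) + b,
  \qquad K(x_{i}, x) = \langle \phi(x_{i}), \phi(x) \rangle \\
  % Gaussian RBF kernel with standard deviation sigma
  K(x_{i}, x) &= \exp\!\left( -\frac{\lVert x_{i} - x \rVert^{2}}{2\sigma^{2}} \right)
\end{align}
```

In this form, only training points whose residual reaches or exceeds ɛ have non-zero (α_{i} − α_{i}^{*}) and become support vectors. This is why a larger ɛ gives fewer support vectors and a simpler model.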
The v-fold cross validation and grid search were used to avoid over-fitting of the models produced by SVR and to select the optimal parameters of the route model. First, the whole data set is divided into v sub-data sets. One of the v sub-data sets serves as the validation set and the other (v − 1) serve as training sets. This procedure is repeated v times, so that each sub-data set is used once for validation. The average of the resulting accuracies is an objective indicator of performance over the whole data set. It can be expressed as
Here q_{i} represents the validation error, and the number of models to be verified is determined by v and by the range of the parameter search.
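The cross-validation and grid-search procedure can be sketched with the standard library alone. To keep the example self-contained, the regressor is a simple Gaussian-kernel smoother used as a stand-in, not SVR, so the grid covers only σ, whereas the study searches over both ɛ and σ. The names `kernel_smoother`, `v_fold_error`, and `grid_search`, the sine-shaped test data, and the grid values are assumptions for illustration.

```python
import math
import random


def kernel_smoother(train, sigma):
    """Stand-in model (NOT SVR): Gaussian-weighted average of training targets."""
    def predict(x):
        weights = [math.exp(-((x - xi) ** 2) / (2.0 * sigma ** 2)) for xi, _ in train]
        total = sum(weights)
        if total == 0.0:  # every weight underflowed: use the nearest neighbour
            return min(train, key=lambda p: abs(p[0] - x))[1]
        return sum(w * yi for w, (_, yi) in zip(weights, train)) / total
    return predict


def v_fold_error(data, sigma, v):
    """Mean of the per-fold validation errors q_i (mean squared error)."""
    folds = [data[i::v] for i in range(v)]  # v interleaved sub-data sets
    errors = []
    for i in range(v):
        valid = folds[i]
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        model = kernel_smoother(train, sigma)
        q_i = sum((model(x) - y) ** 2 for x, y in valid) / len(valid)
        errors.append(q_i)
    return sum(errors) / v


def grid_search(data, sigmas, v=5):
    """Evaluate every grid value and return the one with the lowest CV error."""
    scores = {s: v_fold_error(data, s, v) for s in sigmas}
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical noisy one-dimensional data.
rng = random.Random(0)
data = [(i / 10.0, math.sin(i / 10.0) + rng.gauss(0.0, 0.1)) for i in range(100)]
sigma_grid = [2.0 ** e for e in range(-5, 4)]  # exponentially spaced grid
best_sigma, cv_scores = grid_search(data, sigma_grid, v=5)
print(best_sigma, cv_scores[best_sigma])
```

With a two-parameter grid the same loop runs once per (ɛ, σ) pair, so the number of models trained is v times the number of grid points.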
Virtual tracks were composed in order to conduct machine learning on ship trajectories and navigational data. The shape of the passage resembles the letter ‘S’ and includes tight bends. The data sets comprise fifteen ships with various characteristics, sailing along the route with different courses and speeds. The data sets were transformed into the learning format and divided into eight sub-data sets. The virtual passage has the following features:
(1) tight bends,
(2) narrow channels,
(3) a designated route track,
(4) and diverse changes in course and speed.
Among the data sets, one vessel shows anomalous behavior, breaking away from the designated route twice. The data sets and sub-data sets are presented in Figure 2.
Data learning on the divided sub-data sets was carried out through each learning engine. Here, we present the learning results for the latitude and longitude components as the learning steps progress; the results for these components are shown in Figure 3.
Meanwhile, the sub-models extracted after data learning are shown in Figure 4.
After each sub-model with overlapping sections has been extracted, the models are arranged and scaled to common coordinates. A final model is then obtained by applying the learning process to the whole sub-model data. The sub-models and the final model are extracted by the same process, which is repeated until the best model is found. The final route model for the virtual route is shown in Figure 5.
Navigational data must be compared at a specific position of each ship. Therefore, each datum is assigned to the closest corresponding point of the model. The assigned values are the differences between the model and the data taken at that location for each vessel. Figures 6
Comparing each vessel's deviation shows that anomalous behavior can be detected when a ship makes sudden changes in course or speed. Users can easily recognize a ship's deviation, and this serves as reference information for predicting such behavior in advance. When the position of the target vessel deviated from the specified path, the differences in the ship's speed and course began to increase. In other words, when an abnormality in the ship's speed or course occurs, the target vessel leaves the designated route and a dangerous situation arises. Therefore, if abnormalities in the ship's speed or course are monitored together with the ship's position, it is possible to detect in advance whether the ship is about to deviate.
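The comparison against the reference route can be sketched as follows. This is an illustration under stated assumptions, not the procedure used in the study: positions are planar (x, y) coordinates, each observation is matched to the nearest model point, and the function names and threshold values are hypothetical.

```python
import math


def angle_diff(a, b):
    """Signed difference a - b in degrees, wrapped to [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0


def compare_to_route(route, obs):
    """Match an observation to the nearest reference point and return the
    position deviation, course difference, and speed difference."""
    ref = min(route, key=lambda r: math.hypot(r["x"] - obs["x"], r["y"] - obs["y"]))
    return {
        "deviation": math.hypot(ref["x"] - obs["x"], ref["y"] - obs["y"]),
        "course_diff": angle_diff(obs["course"], ref["course"]),
        "speed_diff": obs["speed"] - ref["speed"],
    }


def is_anomalous(diff, max_dev, max_course, max_speed):
    """Flag the observation if any difference exceeds its threshold."""
    return (diff["deviation"] > max_dev
            or abs(diff["course_diff"]) > max_course
            or abs(diff["speed_diff"]) > max_speed)


# Hypothetical reference route: eastbound (course 090) at 10 knots.
route = [{"x": float(i), "y": 0.0, "course": 90.0, "speed": 10.0} for i in range(11)]

# A ship following the route, and a ship still near the route but turning away.
normal = {"x": 4.1, "y": 0.05, "course": 92.0, "speed": 10.5}
turning = {"x": 6.0, "y": 0.1, "course": 60.0, "speed": 10.0}

for name, obs in (("normal", normal), ("turning", turning)):
    diff = compare_to_route(route, obs)
    print(name, diff, is_anomalous(diff, max_dev=0.5, max_course=15.0, max_speed=2.0))
```

The second ship is flagged by its course difference while its position deviation is still below the threshold, which is the early-warning effect described above.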
It is very important to understand the traffic routes of ships in order to prevent marine accidents. Predicting ship traffic and analyzing navigational data to identify abnormal ship behavior are indispensable tasks in VTS. In this study, we presented a reference route model using SVR. The SVR algorithm was used for pattern recognition and modeling. The optimal parameters were selected through v-fold cross validation and grid search. Machine learning was conducted on a virtual route and ship tracks that resemble a real navigational environment. The proposed method can compose a dynamic route model from a small amount of recent navigational data. It was also possible to extract reference data not only for the ship's route but also for its speed and course, which can be used to judge abnormal ship behavior. The proposed model can be used to prevent accidents and to provide reference routes by supplying prediction and decision-support information to VTSOs and mariners. In further work, we need to develop an appropriate method of data division for cases where the legs become long and complex. In addition, we will conduct simulations with users based on real navigational data and develop various applications of data learning.
This work was conducted as the Research for Development Strategy to Future Maritime Traffic Environments and Applications to Maritime Safety Technology which was supported by KRISO from November 1, 2016 to February 28, 2017 (Project No. 2016-0096).
No potential conflict of interest relevant to this article was reported.
Figure 1. Loss function.
Figure 2. Simulation dataset and data division.
Figure 3. Data learning on sub-dataset.
Figure 4. Extracted model of sub-dataset.
Figure 5. Extracted route model of whole dataset.
Figure 6. Deviation comparison.
Figure 7. Comparison of course differences.
Figure 8. Comparison of speed changes.
Figure 9. Relationship between deviation and course differences.
Figure 10. Relationship between deviation and speed changes.
E-mail: jskim81@korea.kr
E-mail: jsjeong@mmu.ac.kr