
Original Article


International Journal of Fuzzy Logic and Intelligent Systems 2021; 21(2): 189-203

Published online June 25, 2021

https://doi.org/10.5391/IJFIS.2021.21.2.189

© The Korean Institute of Intelligent Systems

Modification of a Density-Based Spatial Clustering Algorithm for Applications with Noise for Data Reduction in Intrusion Detection Systems

Wiharto, Aditya K. Wicaksana, and Denis E. Cahyani

Department of Informatics, Universitas Sebelas Maret, Surakarta, Indonesia

Correspondence to: Wiharto (wiharto@staff.uns.ac.id)

Received: February 9, 2021; Revised: May 12, 2021; Accepted: June 9, 2021

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted noncommercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Monitoring activity in computer networks is required to detect anomalous activities; a system that performs this monitoring is known as an intrusion detection system (IDS). Most IDS models are developed with machine learning, which requires a sufficient amount of network activity data, both normal and anomalous. However, a large amount of data also slows down the learning process of the IDS, and the resulting performance is not always proportional to the amount of data used. This study proposes an IDS model that combines a modified DBSCAN with the CART algorithm. DBSCAN is modified for data reduction by adding a MinNeighborhood parameter, which defines a distance around the core points of a cluster; points lying within this distance are marked for deletion. Tests using the Kaggle and KDDCup99 datasets show that the proposed model maintains a classification accuracy above 90% at 80% data reduction. Computation time also decreases, from 91.8 ms to 31.1 ms on the Kaggle dataset and from 5.535 s to 1.120 s on the KDDCup99 dataset.

Keywords: Clustering, CART, Intrusion detection system, DBSCAN, Data reduction
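The classification stage of the proposed model uses CART. As a rough illustration only, the sketch below grows a CART-style binary tree that picks, at each node, the feature threshold minimizing the weighted Gini impurity of the two children. It omits cost-complexity pruning and the other refinements of full CART implementations; the names `gini`, `best_split`, `build_tree`, and `predict` and the toy data are hypothetical, not taken from the study.

```python
from collections import Counter
from typing import Sequence, Tuple, Union

# A tree is either a leaf label or (feature, threshold, left_subtree, right_subtree).
Tree = Union[str, Tuple[int, float, "Tree", "Tree"]]

def gini(labels: Sequence[str]) -> float:
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y):
    """Return (weighted_gini, feature, threshold) of the best binary split, or None."""
    n = len(y)
    best = None
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate threshold: midpoint between distinct values
            left = [y[i] for i in range(n) if X[i][f] <= t]
            right = [y[i] for i in range(n) if X[i][f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, f, t)
    return best

def build_tree(X, y, depth=0, max_depth=5, min_size=2) -> Tree:
    """Recursively split until nodes are pure, too small, or too deep."""
    majority = Counter(y).most_common(1)[0][0]
    if depth >= max_depth or len(y) < min_size or len(set(y)) == 1:
        return majority
    split = best_split(X, y)
    if split is None or split[0] >= gini(y):  # no impurity reduction: make a leaf
        return majority
    _, f, t = split
    left = [i for i in range(len(y)) if X[i][f] <= t]
    right = [i for i in range(len(y)) if X[i][f] > t]
    return (f, t,
            build_tree([X[i] for i in left], [y[i] for i in left], depth + 1, max_depth, min_size),
            build_tree([X[i] for i in right], [y[i] for i in right], depth + 1, max_depth, min_size))

def predict(tree: Tree, row) -> str:
    """Follow thresholds from the root down to a leaf label."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree

X = [[0.0, 1.0], [0.1, 0.9], [0.9, 0.2], [1.0, 0.1]]
y = ["normal", "normal", "anomaly", "anomaly"]
tree = build_tree(X, y)
print(predict(tree, [0.05, 0.95]))  # → normal
```

Scanning every midpoint and recounting both sides is deliberately naive; a practical implementation would sort each feature once and update class counts incrementally.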

We would like to thank Sebelas Maret University for providing a group research grant with contract number 260/UN27.22/HK.07.00/2021. We also express our gratitude to a number of parties who have helped to complete our research.

No potential conflict of interest relevant to this article was reported.

Wiharto is an associate professor of Computer Science at the Department of Informatics, Sebelas Maret University, Surakarta, Indonesia. He received his Ph.D. degree from Gadjah Mada University, Indonesia, in 2017. His research interests include artificial intelligence, computational intelligence, expert systems, network security, and data mining.


Aditya K. Wicaksana received a Bachelor of Science (B.S.) from the Department of Informatics, Sebelas Maret University, Surakarta, Indonesia, in 2020. Research interests include network security, data mining, and artificial intelligence.


Denis Eka Cahyani received a Bachelor of Science (B.S.) from the Department of Informatics, Sebelas Maret University, Indonesia, in 2013 and a master's degree in Computer Science (M.Cs.) from Indonesia University, Indonesia, in 2015. He is presently a lecturer in the Department of Informatics, Sebelas Maret University, Indonesia (2021), and in the Department of Mathematics, State University of Malang, Indonesia (2021-present). His research interests include natural language processing and semantic and web information retrieval.



Figure 1. Research method.

Figure 2. Effect of data reduction on IDS performance, not considering data labels.

Figure 3. Comparison of computation time with IDS performance, without considering data labels.

Figure 4. Effect of data reduction on IDS performance parameters (class separation).

Figure 5. Comparison of computation time with IDS performance (class separation).

Figure 6. Effect of data reduction on IDS performance using the KDDCup99 dataset.

Figure 7. Comparison of computation time with IDS performance (KDDCup99).

Confusion matrix.

| | Predicted positive | Predicted negative |
|---|---|---|
| Actual positive | TP | FN |
| Actual negative | FP | TN |
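The comparison with previous research reports sensitivity, specificity, and accuracy. Assuming the standard definitions over the confusion matrix above (sensitivity = TP/(TP+FN), specificity = TN/(TN+FP), accuracy = (TP+TN)/(TP+FN+FP+TN)), they can be computed as in this small sketch; `ids_metrics` is a hypothetical helper and the counts are made up for illustration, not results from this study.

```python
def ids_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Sensitivity, specificity and accuracy from the four confusion-matrix cells."""
    sensitivity = tp / (tp + fn)   # share of actual positives that are detected
    specificity = tn / (tn + fp)   # share of actual negatives correctly passed
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    return {"sensitivity": sensitivity, "specificity": specificity, "accuracy": accuracy}

# Hypothetical counts for illustration only.
print(ids_metrics(tp=90, fn=10, fp=5, tn=95))
# → {'sensitivity': 0.9, 'specificity': 0.95, 'accuracy': 0.925}
```

Which class counts as "positive" (normal or anomaly) changes which rate is called sensitivity, so it should be fixed before comparing studies.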

Results of data reduction without considering the data labels (Kaggle).

| DBSCAN parameters | Data reduction (%) | Number of records |
|---|---|---|
| Not using DBSCAN | 0 | 25,192 |
| MinPts=170; Eps=0.1; MinNeighborhood=0.05 | 5 | 23,932 |
| MinPts=40; Eps=0.5; MinNeighborhood=0.05 | 20 | 20,154 |
| MinPts=40; Eps=0.8; MinNeighborhood=0.5 | 30 | 17,634 |
| MinPts=100; Eps=1.0; MinNeighborhood=0.8 | 49 | 12,848 |
| MinPts=115; Eps=1.2; MinNeighborhood=0.8 | 60 | 10,080 |
| MinPts=125; Eps=1.5; MinNeighborhood=1 | 80 | 5,040 |
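The data reduction percentage in the table is the share of the 25,192 original records that m-DBSCAN removes, i.e. 100 × (1 − retained / original). Recomputing it from the record counts is a quick consistency check; `reduction_percent` is a hypothetical helper introduced here.

```python
def reduction_percent(original: int, retained: int) -> float:
    """Percentage of records removed when `retained` of `original` records are kept."""
    return 100.0 * (1 - retained / original)

ORIGINAL_ROWS = 25_192  # "Not using DBSCAN" row
retained_rows = [23_932, 20_154, 17_634, 12_848, 10_080, 5_040]  # copied from the table

for n in retained_rows:
    print(f"{n:>6,} records kept -> {reduction_percent(ORIGINAL_ROWS, n):.1f}% reduction")
```

Each printed value rounds to the level in the second column (5.0, 20.0, 30.0, 49.0, 60.0, and 80.0%), since the MinPts, Eps, and MinNeighborhood values were chosen to land near those targets rather than on them exactly.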

Results of data reduction considering data labels (Kaggle).

| Data reduction (%) | Normal: parameters | Normal: N | Anomaly: parameters | Anomaly: N | Total number of records |
|---|---|---|---|---|---|
| 0 | - | 13,449 | - | 11,743 | 25,192 |
| 5 | MinPts=200; Eps=0.05; MinNeighborhood=0.05 | 12,818 | MinPts=120; Eps=0.1; MinNeighborhood=0.05 | 11,183 | 24,001 |
| 20 | MinPts=30; Eps=0.1; MinNeighborhood=0.1 | 10,812 | MinPts=70; Eps=0.1; MinNeighborhood=0.05 | 9,431 | 20,243 |
| 30 | MinPts=10; Eps=0.1; MinNeighborhood=0.05 | 9,424 | MinPts=50; Eps=0.1; MinNeighborhood=0.05 | 8,355 | 17,779 |
| 49 | MinPts=180; Eps=1.5; MinNeighborhood=0.1 | 6,663 | MinPts=10; Eps=0.1; MinNeighborhood=0.05 | 6,005 | 12,668 |
| 60 | MinPts=200; Eps=1.5; MinNeighborhood=0.1 | 5,845 | MinPts=8; Eps=0.08; MinNeighborhood=0.05 | 4,075 | 9,920 |
| 80 | MinPts=230; Eps=1.8; MinNeighborhood=0.2 | 2,851 | MinPts=5; Eps=0.08; MinNeighborhood=0.05 | 2,161 | 5,012 |
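The table lists separate m-DBSCAN parameters for the Normal and Anomaly classes, which suggests each class is reduced independently and the results are merged. Under that reading, the per-class reduction need not match the nominal level: in the 80% row, about 78.8% of normal and 81.6% of anomaly records are removed, for 80.1% overall. The sketch below recomputes these shares from the table; `reduction_percent` is a hypothetical helper.

```python
def reduction_percent(original: int, retained: int) -> float:
    """Percentage of records removed when `retained` of `original` records are kept."""
    return 100.0 * (1 - retained / original)

NORMAL_0, ANOMALY_0 = 13_449, 11_743  # 0% row: records per class before reduction

# (nominal level %, normal kept, anomaly kept), copied from the table above
ROWS = [(5, 12_818, 11_183), (20, 10_812, 9_431), (30, 9_424, 8_355),
        (49, 6_663, 6_005), (60, 5_845, 4_075), (80, 2_851, 2_161)]

for level, normal, anomaly in ROWS:
    total = reduction_percent(NORMAL_0 + ANOMALY_0, normal + anomaly)
    print(f"{level:>2}%: normal {reduction_percent(NORMAL_0, normal):.1f}%, "
          f"anomaly {reduction_percent(ANOMALY_0, anomaly):.1f}%, total {total:.1f}%")
```

Tracking the per-class shares makes it visible when a reduction level shifts the class balance of the training data seen by the classifier.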

Results of data reduction without considering the data labels (KDDCup99).

| DBSCAN parameters | Data reduction (%) | Number of records |
|---|---|---|
| Not using DBSCAN | 0 | 25,192 |
| MinPts=10; Eps=0.1; MinNeighborhood=0.05 | 5 | 23,932 |
| MinPts=10; Eps=0.1; MinNeighborhood=0.3 | 20 | 20,154 |
| MinPts=10; Eps=0.25; MinNeighborhood=0.28 | 30 | 17,634 |
| MinPts=40; Eps=0.6; MinNeighborhood=0.6 | 49 | 12,848 |
| MinPts=115; Eps=1.2; MinNeighborhood=0.8 | 60 | 10,080 |
| MinPts=125; Eps=1.5; MinNeighborhood=1 | 80 | 5,040 |

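Comparison of accuracy with various data reduction methods.

| Data reduction (%) | Kaggle: m-DBSCAN | Kaggle: Sampling | Kaggle: Slice the data | KDDCup99: m-DBSCAN | KDDCup99: Sampling | KDDCup99: Slice the data |
|---|---|---|---|---|---|---|
| 0 | 0.994 | 0.994 | 0.994 | 0.944 | 0.944 | 0.944 |
| 5 | 0.958 | 0.943 | 0.943 | 0.950 | 0.977 | 0.950 |
| 20 | 0.955 | 0.956 | 0.941 | 0.944 | 0.960 | 0.950 |
| 30 | 0.947 | 0.955 | 0.936 | 0.924 | 0.970 | 0.944 |
| 49 | 0.936 | 0.937 | 0.923 | 0.935 | 0.947 | 0.941 |
| 60 | 0.940 | 0.914 | 0.893 | 0.942 | 0.941 | 0.935 |
| 80 | 0.923 | 0.903 | 0.865 | 0.930 | 0.930 | 0.930 |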
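The table compares m-DBSCAN with two baselines, "Sampling" and "Slice the data", which this section does not define. The sketch below assumes "Sampling" means a uniform random subset drawn without replacement and "Slice the data" means keeping the first records in dataset order; `random_sample`, `slice_rows`, and `_rows_to_keep` are hypothetical helpers, not the authors' code.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")

def _rows_to_keep(n_rows: int, reduction_pct: float) -> int:
    """Number of rows left after removing `reduction_pct` percent of them."""
    return round(n_rows * (1 - reduction_pct / 100))

def random_sample(rows: Sequence[T], reduction_pct: float, seed: int = 0) -> List[T]:
    """Assumed 'Sampling' baseline: uniform random subset, without replacement."""
    return random.Random(seed).sample(list(rows), _rows_to_keep(len(rows), reduction_pct))

def slice_rows(rows: Sequence[T], reduction_pct: float) -> List[T]:
    """Assumed 'Slice the data' baseline: keep the first rows in dataset order."""
    return list(rows[:_rows_to_keep(len(rows), reduction_pct)])

rows = list(range(25_192))  # stand-in for 25,192 records
print(len(random_sample(rows, 80)), slice_rows(rows, 80)[:3])  # → 5038 [0, 1, 2]
```

Neither baseline looks at the feature space, so redundant records in dense regions are kept in proportion; slicing additionally inherits any ordering in the file (for example by class or time), which may explain why it degrades fastest on Kaggle as reduction increases.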

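Comparison with previous research.

| Study | Method | Data reduction approach | Dataset | Sensitivity (%) | Specificity (%) | Accuracy (%) |
|---|---|---|---|---|---|---|
| Bhattacharya et al. [46] | SVM | PCA | Kaggle | 79.3 | 98.1 | 95.2 |
| | SVM | PCA+Firefly | Kaggle | 84.4 | 99.8 | 97.5 |
| | NB | PCA | Kaggle | 68.5 | 94.1 | 75.3 |
| | NB | PCA+Firefly | Kaggle | 76.8 | 97.2 | 84.2 |
| Sarker et al. [48] | NB | - | Kaggle | 90.0 | - | 90.0 |
| | IntruDTree | Embed | Kaggle | 98.0 | - | 98.0 |
| Wang et al. [4] | BPNN+Fuzzy aggregation | Fuzzy clustering | KDDCup99 | - | - | 96.6 |
| Khare et al. [47] | DNN | SMO | KDDCup99 | 92.8 | 93.0 | 92.8 |
| | DNN | PCA | KDDCup99 | 89.8 | 88.5 | 89.8 |
| | DNN | - | KDDCup99 | 90.9 | 88.2 | 90.9 |
| Proposed | CART | m-DBSCAN | Kaggle | 89.1 | 94.4 | 92.3 |
| | CART | m-DBSCAN | KDDCup99 | 93.2 | 94.3 | 93.7 |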

Algorithm 1. m-DBSCAN(D, Eps, MinPts, MinNeighbor).

    C = 0
    for each unvisited point P in dataset D do
        mark P as visited
        N = getNeighbors(P, Eps)
        X = getNeighbors(P, MinNeighbor)
        if sizeof(N) < MinPts then
            mark P as NOISE
        else
            C = next cluster
            ExpandCluster(P, N, C, Eps, MinPts, MinNeighbor, X)

Algorithm 2. ExpandCluster(P, N, C, Eps, MinPts, MinNeighbor, X).

    add P to cluster C
    for each point P′ in N do
        if P′ is not visited then
            mark P′ as visited
            N′ = getNeighbors(P′, Eps)
            deleteNeighbor = getNeighbors(P′, MinNeighbor)
            if sizeof(N′) >= MinPts then
                N = N joined with N′
                X = X joined with deleteNeighbor
        if P′ is not yet a member of any cluster then
            add P′ to cluster C
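Algorithms 1 and 2 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: distances are Euclidean with a brute-force O(n²) neighbor search; getNeighbors returns every point within the radius, the query point included, as in standard DBSCAN; X is read as one set accumulated across all clusters (the pseudocode reassigns X for every seed point); and the deletion step, which the pseudocode does not show, is assumed to drop every point in X. Under this literal reading a core point lies inside its own MinNeighbor radius and is deleted too; the source does not say whether cluster centers are spared. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Set, Tuple

NOISE = -1
Data = Sequence[Sequence[float]]

def get_neighbors(data: Data, i: int, radius: float) -> List[int]:
    """getNeighbors: indices of all points within `radius` of point i, i itself included."""
    return [j for j, q in enumerate(data) if math.dist(data[i], q) <= radius]

def expand_cluster(data: Data, p: int, neighbors: List[int], cluster: int, eps: float,
                   min_pts: int, min_neighbor: float, labels: List[int],
                   visited: List[bool], marked: Set[int]) -> None:
    """Algorithm 2: grow `cluster` from core point p, collecting deletion candidates (X)."""
    labels[p] = cluster
    queue = list(neighbors)                  # N; extended while it is being scanned
    k = 0
    while k < len(queue):
        q = queue[k]
        k += 1
        if not visited[q]:
            visited[q] = True
            q_neighbors = get_neighbors(data, q, eps)
            if len(q_neighbors) >= min_pts:  # q is a core point
                queue.extend(q_neighbors)    # N = N joined with N'
                # deleteNeighbor is only used for core points, so it is computed only here
                marked.update(get_neighbors(data, q, min_neighbor))  # X = X joined with deleteNeighbor
        if labels[q] == NOISE:               # not yet a member of any cluster
            labels[q] = cluster

def m_dbscan(data: Data, eps: float, min_pts: int,
             min_neighbor: float) -> Tuple[List[int], Set[int]]:
    """Algorithm 1: cluster label per point (NOISE = -1) and the deletion set X."""
    n = len(data)
    labels = [NOISE] * n                     # NOISE also means "not in a cluster yet"
    visited = [False] * n
    marked: Set[int] = set()                 # X, shared across clusters (assumption)
    cluster = 0
    for p in range(n):
        if visited[p]:
            continue
        visited[p] = True
        neighbors = get_neighbors(data, p, eps)
        if len(neighbors) < min_pts:
            labels[p] = NOISE                # may later be absorbed as a border point
            continue
        cluster += 1                         # C = next cluster
        marked.update(get_neighbors(data, p, min_neighbor))
        expand_cluster(data, p, neighbors, cluster, eps, min_pts, min_neighbor,
                       labels, visited, marked)
    return labels, marked

def reduce_dataset(data: Data, eps: float, min_pts: int, min_neighbor: float) -> list:
    """Hypothetical deletion step (not shown in the pseudocode): drop every point in X."""
    _, marked = m_dbscan(data, eps, min_pts, min_neighbor)
    return [row for i, row in enumerate(data) if i not in marked]

points = [(0.0,), (0.1,), (0.2,), (0.3,), (1.0,)]
labels, marked = m_dbscan(points, eps=0.15, min_pts=3, min_neighbor=0.05)
print(labels, sorted(marked))  # → [1, 1, 1, 1, -1] [1, 2]
print(reduce_dataset(points, eps=0.15, min_pts=3, min_neighbor=0.05))  # → [(0.0,), (0.3,), (1.0,)]
```

In the example, only the points at 0.1 and 0.2 have at least MinPts = 3 neighbors within Eps, so they are the core points; with MinNeighbor = 0.05 each marks only itself, leaving the border points and the isolated point. A spatial index (for example a k-d tree) would replace the quadratic neighbor search on datasets of the size used here.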
