Some common benchmark datasets for network intrusion detection and their clustering properties

Bùi Công Thành¹*, Nguyễn Quang Uy², Hoàng Minh³

*Corresponding author: Email: congthanhttmt@gmail.com

¹Binh chủng Thông tin liên lạc

²Học viện Kỹ thuật Quân sự

³Học viện Khoa học, Công nghệ và Đổi mới sáng tạo

Received: 01/01/0001; sent for review: 01/01/0001; review received: 01/01/0001; accepted for publication: 01/01/0001

Abstract:

In recent years, many studies have applied machine learning and deep learning to network intrusion detection systems (IDS), using datasets for evaluation and analysis. Because these datasets are diverse and complex, it is important to be able to cluster them, that is, to divide a dataset into subsets that still preserve its characteristics. In this study, the authors analyse the characteristics of commonly used benchmark datasets. They also run experiments to evaluate how well each dataset clusters and to determine the optimal number of clusters into which it should be divided. The experiments were carried out on six training sets: NSL-KDD, UNSW-NB15, and CTU-13 versions 08, 09, 10, and 13. The results of the Elbow and Silhouette methods are fairly consistent and show that some datasets should be split into two or three clusters, while others should be left intact.

Keywords:

dataset, intrusion detection system, K-Means

Some common datasets for intrusion detection systems and their clustering properties

Cong Thanh Bui¹*, Quang Uy Nguyen², Minh Hoang³

*Corresponding author: Email: congthanhttmt@gmail.com

¹Communications Command
²Institute of Military Technology
³Institute of Science, Technology and Innovation

Received: 1 January 1; accepted: 1 January 1

Abstract:

In recent years, machine learning and deep learning methods for intrusion detection systems (IDSs) have received great attention from researchers, and IDS datasets have been used to evaluate and analyse these methods. Because these datasets are diverse and complex, it is necessary to explore how best to cluster them: clustering is a useful technique that reduces the amount of data while preserving its characteristics. In this paper, we analyse the characteristics of common IDS datasets. We also evaluate their clustering properties and determine the optimal number of clusters into which each dataset should be divided. The experiments were conducted on six training datasets: NSL-KDD, UNSW-NB15, and four versions of CTU-13 (08, 09, 10, and 13). The Elbow and Silhouette methods give fairly consistent results and show that some datasets should be divided into two or three clusters, while others should be kept in their original form.

Keywords:

dataset, intrusion detection system, K-Means
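
As an illustration only, the following sketch shows how the Elbow and Silhouette criteria can be used with K-Means to choose a number of clusters. It does not reproduce the authors' experiments: the data are three synthetic two-dimensional blobs rather than NSL-KDD, UNSW-NB15, or CTU-13, the helper names (`kmeans`, `silhouette`, `make_blobs`) are hypothetical, and locating the elbow at the largest second difference of the inertia curve is one simple heuristic assumed here, not a rule taken from the paper.

```python
import math
import random


def kmeans(points, k, n_init=8, n_iter=50, seed=0):
    """Lloyd's algorithm with random restarts; returns (labels, inertia) of the best run."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_init):
        centers = rng.sample(points, k)  # initial centres: k distinct data points
        labels = None
        for _ in range(n_iter):
            # Assignment step: each point goes to its nearest centre.
            new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                          for p in points]
            if new_labels == labels:  # assignments stable, so the run has converged
                break
            labels = new_labels
            # Update step: move each non-empty centre to the mean of its members.
            for c in range(k):
                members = [p for p, lab in zip(points, labels) if lab == c]
                if members:
                    centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
        # Inertia: sum of squared distances from each point to its centre.
        inertia = sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))
        if best is None or inertia < best[1]:
            best = (labels, inertia)
    return best


def silhouette(points, labels):
    """Mean silhouette coefficient; needs at least two non-empty clusters."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:  # singleton clusters score 0 by convention
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster.
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != lab)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


def make_blobs(centers, n_per=30, spread=0.5, seed=1):
    """Synthetic stand-in data: Gaussian blobs around the given 2-D centres."""
    rng = random.Random(seed)
    return [(rng.gauss(cx, spread), rng.gauss(cy, spread))
            for cx, cy in centers for _ in range(n_per)]


points = make_blobs([(0, 0), (10, 0), (5, 9)])
inertias, silhouettes = {}, {}
for k in range(1, 7):
    labels, inertias[k] = kmeans(points, k)
    if k >= 2:  # the silhouette is undefined for a single cluster
        silhouettes[k] = silhouette(points, labels)

# Elbow (assumed heuristic): the k where the inertia curve bends most sharply.
elbow_k = max(range(2, 6),
              key=lambda k: inertias[k - 1] - 2 * inertias[k] + inertias[k + 1])
# Silhouette: the k with the highest mean silhouette coefficient.
best_k = max(silhouettes, key=silhouettes.get)
print("elbow:", elbow_k, "silhouette:", best_k)
```

When the two criteria agree, as they should on well-separated data like this, the shared value is a reasonable candidate for the number of clusters. A dataset whose silhouette scores are low for every k is arguably better left undivided, which corresponds to the "keep the original form" outcome described in the abstract.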
