A novel semi-supervised approach for network traffic clustering

Wang, Yu, Xiang, Yang, Zhang, Jun and Yu, Shunzheng 2011, A novel semi-supervised approach for network traffic clustering, in NSS 2011 : Proceedings of the 5th International Conference on Network and System Security, IEEE, [Milan, Italy], pp. 169-175.


Title A novel semi-supervised approach for network traffic clustering
Author(s) Wang, Yu
Xiang, Yang
Zhang, Jun
Yu, Shunzheng
Conference name Network and System Security. Conference (5th : 2011 : Milan, Italy)
Conference location Milan, Italy
Conference dates 6-8 Sep. 2011
Title of proceedings NSS 2011 : Proceedings of the 5th International Conference on Network and System Security
Editor(s) [Unknown]
Publication date 2011
Conference series Network and System Security. Conference
Start page 169
End page 175
Total pages 7
Publisher IEEE
Place of publication [Milan, Italy]
Keyword(s) traffic classification
machine learning
constrained clustering
semi-supervised learning
constraints
Summary Network traffic classification is an essential component of network management and security systems. To address the limitations of traditional port-based and payload-based methods, recent studies have focused on alternative approaches. One promising direction is to apply machine learning techniques to classify traffic flows based on packet-level and flow-level statistics. In particular, previous papers have shown that clustering can achieve high accuracy and discover unknown application classes. In this work, we present a novel semi-supervised learning method using constrained clustering algorithms. The motivation is that in the network domain a lot of background information is available in addition to the data instances themselves. For example, we might know that flows ƒ1 and ƒ2 are using the same application protocol because they are visiting the same host address at the same port simultaneously. In this case, ƒ1 and ƒ2 should ideally be grouped into the same cluster. Therefore, we describe these correlations in the form of pair-wise must-link constraints and incorporate them into the clustering process. We have applied three constrained variants of the K-Means algorithm, which perform hard or soft constraint satisfaction and metric learning from constraints. A number of real-world traffic traces have been used to show the availability of constraints and to test the proposed approach. The experimental results indicate that by incorporating constraints in the course of clustering, the overall accuracy and cluster purity can be significantly improved.
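The idea in the summary can be illustrated with a minimal sketch. It is not the authors' implementation: the flow record fields (`dst`, `port`, `proto`, `start`, `features`), the one-second `window` used to approximate "simultaneously", and the function names are all assumptions made for illustration. The clustering step is a simple hard-constraint K-Means in which each must-link group is assigned to a centroid as one unit; it is not claimed to match any of the three variants evaluated in the paper.

```python
from collections import defaultdict

def must_link_groups(flows, window=1.0):
    """Group flow indices that share (dst, port, proto) and whose start
    times chain together with gaps of at most `window` seconds.
    The window value is an assumption, not taken from the paper."""
    by_endpoint = defaultdict(list)
    for i, f in enumerate(flows):
        by_endpoint[(f["dst"], f["port"], f["proto"])].append(i)
    groups = []
    for idxs in by_endpoint.values():
        idxs.sort(key=lambda i: flows[i]["start"])
        current = [idxs[0]]
        for i in idxs[1:]:
            if flows[i]["start"] - flows[current[-1]]["start"] <= window:
                current.append(i)       # close in time: same must-link group
            else:
                groups.append(current)  # gap too large: start a new group
                current = [i]
        groups.append(current)
    return groups

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def constrained_kmeans(points, groups, centroids, iters=20):
    """K-Means where every must-link group is assigned as a whole to the
    centroid that minimises the group's summed squared distance."""
    centroids = [list(c) for c in centroids]
    labels = [0] * len(points)
    for _ in range(iters):
        for g in groups:
            best = min(
                range(len(centroids)),
                key=lambda k: sum(sq_dist(points[i], centroids[k]) for i in g),
            )
            for i in g:
                labels[i] = best
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # leave an empty cluster's centroid unchanged
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical flows: two feature values per flow (e.g. scaled statistics).
flows = [
    {"dst": "10.0.0.1", "port": 80, "proto": "tcp", "start": 0.0, "features": [1.0, 1.0]},
    {"dst": "10.0.0.1", "port": 80, "proto": "tcp", "start": 0.4, "features": [6.0, 6.0]},
    {"dst": "10.0.0.2", "port": 53, "proto": "udp", "start": 0.1, "features": [9.0, 9.0]},
    {"dst": "10.0.0.2", "port": 53, "proto": "udp", "start": 0.2, "features": [10.0, 10.0]},
    {"dst": "10.0.0.1", "port": 80, "proto": "tcp", "start": 50.0, "features": [0.0, 1.0]},
]
points = [f["features"] for f in flows]
groups = must_link_groups(flows)
labels, centroids = constrained_kmeans(points, groups, [[0.0, 0.0], [10.0, 10.0]])
```

In this toy data, flow 1 lies nearer to the second initial centroid on its own, but the must-link constraint with flow 0 keeps both flows in the same cluster. Flow 4 targets the same endpoint much later, so it forms its own group and is clustered purely by distance.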
ISBN 9781457704581
Language eng
Field of Research 080503 Networking and Communications
Socio Economic Objective 890201 Application Software Packages (excl. Computer Games)
HERDC Research category E1 Full written paper - refereed
Copyright notice ©2011, IEEE
Persistent URL http://hdl.handle.net/10536/DRO/DU:30042389

Document type: Conference Paper
Collection: School of Information Technology
Unless expressly stated otherwise, the copyright for items in DRO is owned by the author, with all rights reserved.

Citation counts: Cited 3 times in Scopus
Created: Tue, 14 Feb 2012, 15:46:51 EST
