Posted on 2012-01-01, 00:00. Authored by Yu Wang, Yang Xiang, Jun Zhang, S Yu
Due to the limitations of the traditional port-based and payload-based traffic classification approaches, the past decade has seen extensive work on using machine learning techniques to classify network traffic based on packet-level and flow-level features. In particular, previous studies have shown that the unsupervised clustering approach is both accurate and capable of discovering previously unknown application classes. In this paper, we explore the utility of side information in the process of traffic clustering. Specifically, we focus on the flow correlation information that can be efficiently extracted from packet headers and expressed as instance-level constraints, which indicate that particular sets of flows are using the same application and thus should be put into the same cluster. To incorporate the constraints, we propose a modified constrained K-Means algorithm. A variety of real-world traffic traces are used to show that the constraints are widely available. The experimental results indicate that the constrained approach not only improves the quality of the resulting clusters, but also speeds up the convergence of the clustering process.
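The general idea of clustering under same-cluster constraints can be illustrated with a minimal sketch. It is not the algorithm proposed in the paper, whose details are not given in this record; it is a generic must-link variant of K-Means in which every set of correlated flows is assigned to a centroid as a unit. The function names, the correlation key, and the explicit initial centroids are all assumptions made for illustration.

```python
from collections import defaultdict


def must_link_groups(keys):
    """Group flow indices that share a correlation key.

    `keys` is hypothetical: one hashable value per flow, derived from
    packet headers. Flows with equal keys are assumed to belong to the
    same application and so form one must-link set.
    """
    groups = defaultdict(list)
    for index, key in enumerate(keys):
        groups[key].append(index)
    return list(groups.values())


def sq_dist(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def constrained_kmeans(points, groups, centroids, max_iters=100):
    """K-Means in which each must-link group moves as a whole.

    Assumes every point index appears in exactly one group
    (singleton groups for unconstrained flows).
    """
    centroids = [tuple(c) for c in centroids]
    labels = [None] * len(points)
    for _ in range(max_iters):
        new_labels = list(labels)
        for group in groups:
            # Pick the centroid with the smallest total squared distance
            # to all flows in the group, then assign the whole group to it.
            best = min(
                range(len(centroids)),
                key=lambda c: sum(sq_dist(points[i], centroids[c]) for i in group),
            )
            for i in group:
                new_labels[i] = best
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        for c in range(len(centroids)):
            members = [points[i] for i in range(len(points)) if labels[i] == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels, centroids
```

In this sketch, a flow that sits closer to another cluster's centroid still follows the rest of its group, which is how the constraint changes the result relative to plain K-Means. Initial centroids are passed in explicitly so that runs are repeatable.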
History
Event
IEEE Wireless Communications and Mobile Computing. Conference (8th : 2012 : Limassol, Cyprus)
Pagination
619 - 624
Publisher
IEEE
Location
Limassol, Cyprus
Place of publication
Piscataway, N. J.
Start date
2012-08-27
End date
2012-08-31
ISBN-13
9781457713781
Language
eng
Publication classification
E1 Full written paper - refereed
Copyright notice
2012, IEEE
Title of proceedings
IWCMC 2012 : Proceedings of the IEEE 8th Wireless Communications and Mobile Computing Conference