Internet traffic clustering with side information
Version 2 2024-06-06, 00:27
Version 1 2015-03-30, 11:04
journal contribution
posted on 2024-06-06, 00:27 authored by Y Wang, Y Xiang, J Zhang, W Zhou, B Xie
Internet traffic classification is a critical and essential functionality for network management and security systems. Due to the limitations of traditional port-based and payload-based classification approaches, the past several years have seen extensive research on utilizing machine learning techniques to classify Internet traffic based on packet and flow level characteristics. For the purpose of learning from unlabeled traffic data, some classic clustering methods have been applied in previous studies but the reported accuracy results are unsatisfactory. In this paper, we propose a semi-supervised approach for accurate Internet traffic clustering, which is motivated by the observation of widely existing partial equivalence relationships among Internet traffic flows. In particular, we formulate the problem using a Gaussian Mixture Model (GMM) with set-based equivalence constraint and propose a constrained Expectation Maximization (EM) algorithm for clustering. Experiments with real-world packet traces show that the proposed approach can significantly improve the quality of resultant traffic clusters. © 2014 Elsevier Inc.
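The idea of a GMM with set-based equivalence constraints can be illustrated with a minimal sketch. This is not the paper's algorithm; it is one common way to build such a constraint into EM, under the assumption that every flow in an equivalence set is generated by the same mixture component, so the E-step computes one posterior per set rather than per flow. The names `constrained_em`, `log_gaussian`, `set_ids`, and `reg` are hypothetical and chosen only for this example.

```python
import numpy as np

def log_gaussian(X, mean, cov):
    """Log density of each row of X under N(mean, cov)."""
    d = X.shape[1]
    L = np.linalg.cholesky(cov)
    z = np.linalg.solve(L, (X - mean).T)            # shape (d, n)
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * (d * np.log(2 * np.pi) + log_det + np.sum(z ** 2, axis=0))

def constrained_em(X, set_ids, k, n_iter=50, reg=1e-6, seed=0):
    """EM for a GMM in which flows sharing a set id share one component.

    X: (n, d) flow features. set_ids: (n,) integer id of each flow's
    equivalence set; a flow without side information gets its own id.
    Assumption (not from the paper): one posterior per set in the E-step.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    _, inv = np.unique(np.asarray(set_ids).ravel(), return_inverse=True)
    inv = inv.ravel()                               # sets renumbered 0..m-1
    m = int(inv.max()) + 1
    means = X[rng.choice(n, size=k, replace=False)].copy()
    covs = np.stack([np.atleast_2d(np.cov(X.T)) + reg * np.eye(d)] * k)
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: sum member log-likelihoods within each set, add the log
        # prior once per set, then normalise to get per-set posteriors.
        log_p = np.stack(
            [log_gaussian(X, means[j], covs[j]) for j in range(k)], axis=1
        )
        set_log = np.zeros((m, k))
        np.add.at(set_log, inv, log_p)
        set_log += np.log(weights + 1e-12)
        set_log -= set_log.max(axis=1, keepdims=True)
        set_resp = np.exp(set_log)
        set_resp /= set_resp.sum(axis=1, keepdims=True)
        resp = set_resp[inv]                        # each flow inherits its set's posterior
        # M-step: standard weighted updates using the inherited posteriors.
        nk = resp.sum(axis=0) + 1e-12
        weights = set_resp.sum(axis=0) / m
        means = (resp.T @ X) / nk[:, None]
        for j in range(k):
            diff = X - means[j]
            covs[j] = (resp[:, j, None] * diff).T @ diff / nk[j] + reg * np.eye(d)
    labels = set_resp.argmax(axis=1)[inv]
    return labels, means, covs, weights
```

Because labels are taken from the per-set posterior, flows in the same equivalence set always end up in the same cluster. How the sets are derived from traffic (the side information) is outside this sketch.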
Journal
Journal of Computer and System Sciences
Volume
80
Pagination
1021-1036
Location
Amsterdam, The Netherlands
Open access
- Yes
ISSN
1090-2724
eISSN
1090-2724
Language
eng
Publication classification
C Journal article, C1 Refereed article in a scholarly journal
Copyright notice
2014, Elsevier
Issue
5
Publisher
Elsevier
Keywords
- Constrained clustering
- Semi-supervised machine learning
- Traffic classification
- Science & Technology
- Technology
- Computer Science, Hardware & Architecture
- Computer Science, Theory & Methods
- Computer Science
- CLASSIFICATION
- 080504 Ubiquitous Computing
- 080605 Decision Support and Group Support Systems
- 890101 Fixed Line Data Networks and Services
- 890205 Information Processing Services (incl. Data Entry and Capture)
- School of Information Technology