File(s) under permanent embargo
Semi-supervised and compound classification of network traffic
Journal contribution posted on 2012-01-01, 00:00, authored by Jun Zhang, Chao Chen, Yang Xiang, Wanlei Zhou
This paper presents a new semi-supervised method that improves traffic classification performance when very little labelled training data is available. Existing semi-supervised methods label a large proportion of testing flows as unknown because of the limited supervised information, which severely degrades classification performance. To address this problem, we propose to incorporate flow correlation into both the training and testing stages. At the training stage, we use flow correlation to extend the supervised data set by automatically labelling unlabelled flows according to their correlation with the pre-labelled flows. As a result, the traffic classifier benefits from an enlarged training data set. At the testing stage, correlated flows are identified and classified jointly by combining their individual predictions, which further boosts classification accuracy. An empirical study on real-world network traffic shows that the proposed method significantly outperforms state-of-the-art classification methods based on flow statistical features. Copyright © 2012 Inderscience Enterprises Ltd.
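The two stages described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not define flow correlation or the combination rule, so the sketch assumes that flows sharing a key such as (destination IP, destination port, transport protocol) are correlated, and that individual predictions are combined by majority vote. The function names `propagate_labels` and `compound_predict` are hypothetical.

```python
from collections import Counter, defaultdict


def propagate_labels(keys, labels):
    """Training stage (sketch): extend the labelled set via flow correlation.

    keys   -- one correlation key per flow (assumed here to be a tuple such
              as (dst_ip, dst_port, protocol); this is an assumption)
    labels -- one label per flow, or None if the flow is unlabelled

    An unlabelled flow inherits the label of a pre-labelled flow that
    shares its key; flows with no labelled correlate stay None.
    """
    key_to_label = {}
    for key, label in zip(keys, labels):
        if label is not None:
            # Keep the first label seen for each key.
            key_to_label.setdefault(key, label)
    return [
        label if label is not None else key_to_label.get(key)
        for key, label in zip(keys, labels)
    ]


def compound_predict(keys, predictions):
    """Testing stage (sketch): classify correlated flows jointly.

    Flows sharing a key are grouped, and every flow in the group receives
    the group's most frequent individual prediction (majority vote, an
    assumed combination rule).
    """
    votes = defaultdict(Counter)
    for key, prediction in zip(keys, predictions):
        votes[key][prediction] += 1
    winner = {key: counts.most_common(1)[0][0] for key, counts in votes.items()}
    return [winner[key] for key in keys]


if __name__ == "__main__":
    web = ("10.0.0.1", 80, "tcp")
    dns = ("10.0.0.2", 53, "udp")

    # One labelled flow is enough to label its correlated neighbour.
    print(propagate_labels([web, web, dns], ["http", None, None]))

    # A single misclassified flow is outvoted by its correlated flows.
    print(compound_predict([web, web, web, dns], ["http", "p2p", "http", "dns"]))
```

In this sketch the per-flow predictions would come from any classifier trained on the extended label set; the choice of classifier is left open because the abstract does not name one.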