Deakin University

Reliable aggregation on network traffic for web based knowledge discovery

Version 2 2024-06-04, 03:28
Version 1 2014-10-28, 09:38
chapter
posted on 2024-06-04, 03:28 authored by S Yu, Simon JamesSimon James, T Yonghong, W Dou
The web is a rich resource for information discovery, and as a result web mining is an active research topic. A reliable mining result, however, depends on the reliability of the data set. Every second, the web generates a huge amount of data, such as web page requests and file transfers. These data reflect human behaviour in cyberspace and are therefore valuable for analysis in various disciplines, e.g. social science and network security. How to store the data is a challenge. A usual strategy is to save an abstract of the data, for example by using aggregation functions to preserve the features of the original data in much less space. A key problem, however, is that such information can be distorted by the presence of illegitimate traffic, e.g. botnet recruitment scanning, DDoS attack traffic, etc. An important consideration in web-related knowledge discovery is therefore the robustness of the aggregation method, which in turn may be affected by the reliability of the network traffic data. In this chapter, we first present methods of aggregation functions, and then we employ information distances to filter out anomalous data as a preparation for web data mining.
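The chapter does not include code, but the pipeline the abstract describes (use an information distance to flag anomalous traffic windows, then summarise the clean data with an aggregation function) can be sketched as follows. This is a minimal illustration only: the choice of Kullback-Leibler divergence as the information distance, the arithmetic mean as the aggregation function, and all function names, bins, and thresholds are assumptions for the example, not the authors' actual method.

```python
import math
from collections import Counter

def distribution(samples, bins):
    """Empirical probability distribution of the samples over the given bins."""
    counts = Counter(samples)
    total = len(samples)
    return [counts.get(b, 0) / total for b in bins]

def kl_divergence(p, q, eps=1e-9):
    """Kullback-Leibler divergence D(P || Q), smoothed to avoid log(0).
    Used here as one possible information distance between traffic profiles."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def filter_and_aggregate(windows, baseline, bins, threshold):
    """Drop traffic windows whose feature distribution is far (in KL
    divergence) from a trusted baseline profile, then aggregate each
    retained window with the arithmetic mean (a simple aggregation
    function; others, e.g. OWA operators, could be substituted)."""
    base_dist = distribution(baseline, bins)
    kept = [w for w in windows
            if kl_divergence(distribution(w, bins), base_dist) <= threshold]
    return [sum(w) / len(w) for w in kept]
```

For instance, a window of traffic dominated by a single repeated value (as in scanning or flooding traffic) diverges sharply from a mixed baseline and would be filtered out before aggregation, so the stored summary reflects only the legitimate traffic.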

History

Chapter number

8

Pagination

149-159

ISBN-13

9781461419020

ISBN-10

1461419026

Language

eng

Publication classification

B1 Book chapter

Copyright notice

2012, Springer

Extent

17

Editor/Contributor(s)

Dai H, Liu J, Smirnov E

Publisher

Springer

Place of publication

New York, NY

Title of book

Reliable knowledge discovery
