Deakin University

A MapReduce-based nearest neighbor approach for big-data-driven traffic flow prediction

journal contribution
posted on 2016-01-01, 00:00 authored by D Xia, H Li, B Wang, Y Li, Zili Zhang
In big-data-driven traffic flow prediction systems, the robustness of prediction performance depends on accuracy and timeliness. This paper presents a new MapReduce-based nearest neighbor (NN) approach for traffic flow prediction using correlation analysis (TFPC) on a Hadoop platform. In particular, we develop a real-time prediction system comprising two key modules: offline distributed training (ODT) and online parallel prediction (OPP). Moreover, we build a parallel k-nearest neighbor optimization classifier that incorporates correlation information among traffic flows into the classification process. Finally, we propose a novel prediction calculation method that combines the current data observed in OPP with the classification results obtained from large-scale historical data in ODT to generate traffic flow predictions in real time. An empirical study on real-world traffic flow big data, using leave-one-out cross validation, shows that TFPC significantly outperforms four state-of-the-art prediction approaches (autoregressive integrated moving average, Naïve Bayes, multilayer perceptron neural networks, and NN regression) in terms of accuracy, which improves by 90.07% in the best case, with an average mean absolute percent error of 5.53%. In addition, it displays excellent speedup, scaleup, and sizeup.
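
The general idea of a map/reduce-style nearest neighbor prediction can be illustrated with a small single-process sketch. This is not the TFPC implementation from the paper, and it does not run on Hadoop. It assumes, purely for illustration, that historical data is stored as (window, next value) pairs split across partitions, that similarity is measured by Pearson correlation, and that the prediction is the plain average of the next values of the k most similar windows. All function names (pearson, map_partition, predict, mape) are hypothetical.

```python
from functools import reduce
from heapq import nsmallest
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def map_partition(partition, query, k):
    """Map step (assumed design): local top-k (distance, next_value) pairs for one partition."""
    scored = ((1.0 - pearson(window, query), nxt) for window, nxt in partition)
    return nsmallest(k, scored)


def predict(partitions, query, k):
    """Reduce step: merge the local top-k lists into a global top-k, then average."""
    local = [map_partition(p, query, k) for p in partitions]
    best = reduce(lambda a, b: nsmallest(k, a + b), local)
    return sum(nxt for _, nxt in best) / len(best)


def mape(actual, predicted):
    """Mean absolute percent error, in percent; actual values must be nonzero."""
    return 100.0 * sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


# Toy data: two partitions of (historical window, value that followed it).
partitions = [
    [([1, 2, 3], 4.0), ([3, 2, 1], 0.0)],
    [([2, 4, 6], 8.0), ([5, 5, 6], 7.0)],
]
forecast = predict(partitions, query=[10, 20, 30], k=2)
error = mape([100, 200], [110, 180])
```

Because each partition only emits its own k best candidates, the merge step handles at most k entries per partition, which is what makes this pattern suitable for distribution across workers.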

History

Journal

IEEE Access

Volume

4

Pagination

2920 - 2934

Publisher

IEEE

Location

Piscataway, N. J.

eISSN

2169-3536

Language

eng

Publication classification

C Journal article; C1 Refereed article in a scholarly journal

Copyright notice

2016, IEEE
