Deakin University

A MapReduce reinforced distributed sequential pattern mining algorithm

conference contribution
posted on 2015-12-01, 00:00 authored by X Yu, J Liu, Xiao Liu, C Ma, B Li
Redesigning and reimplementing traditional sequential pattern mining algorithms on distributed computing frameworks is essential for dealing with big data. The critical issue is how to minimize the communication overhead of a distributed sequential pattern mining algorithm and maximize its execution efficiency by balancing the workload across distributed computing resources. To address this issue, this paper proposes DGSP (Distributed GSP algorithm based on MapReduce), a MapReduce reinforced distributed sequential pattern mining algorithm that consists of two MapReduce jobs. The “two-jobs” structure of DGSP can effectively reduce the communication overhead of distributed sequential pattern mining. DGSP also improves workload balance and execution efficiency by evenly partitioning the database and assigning the fragments to Map workers. Experimental results indicate that DGSP can significantly improve the overall performance, scalability and fault tolerance of sequential pattern mining on big data.
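
The abstract does not say what each of the two MapReduce jobs does, so the following is not DGSP itself. It is a minimal, single-process Python sketch of the general idea the abstract mentions: split the sequence database into evenly sized fragments, let each "Map worker" count candidate support on its own fragment, and merge the partial counts in a reduce step. All function names are hypothetical, and sequences are simplified to tuples of single items rather than the itemset elements GSP handles in general.

```python
from collections import Counter


def partition_evenly(database, n_workers):
    """Split the database into n_workers fragments whose sizes differ by at most one."""
    return [database[i::n_workers] for i in range(n_workers)]


def is_subsequence(candidate, sequence):
    """True if the candidate's items occur in the sequence in order (gaps allowed)."""
    remaining = iter(sequence)
    # `item in remaining` consumes the iterator up to the match, preserving order.
    return all(item in remaining for item in candidate)


def map_count(fragment, candidates):
    """Map step (assumed): local support of each candidate within one fragment."""
    local = Counter()
    for sequence in fragment:
        for candidate in candidates:
            if is_subsequence(candidate, sequence):
                local[candidate] += 1
    return local


def reduce_counts(partials, min_support):
    """Reduce step (assumed): sum local counts and keep frequent candidates."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return {cand: n for cand, n in total.items() if n >= min_support}


def frequent_candidates(database, candidates, n_workers, min_support):
    """Run one support-counting pass over evenly partitioned fragments."""
    fragments = partition_evenly(database, n_workers)
    partials = [map_count(fragment, candidates) for fragment in fragments]
    return reduce_counts(partials, min_support)
```

In a real deployment the fragments would be processed by separate workers on a framework such as Hadoop; here the map calls simply run in a list comprehension so the data flow is easy to follow.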

Volume

9529

Pagination

183-197

Location

Zhangjiajie, China

Start date

2015-11-18

End date

2015-11-20

ISSN

0302-9743

eISSN

1611-3349

ISBN-13

9783319271217

Language

eng

Publication classification

E Conference publication, E1.1 Full written paper - refereed

Copyright notice

[2015, Springer]

Editor/Contributor(s)

Wang G, Zomaya A, Perez GM, Li K

Title of proceedings

ICA3PP 2015 : Proceedings of the 15th International Conference on Algorithms and Architectures for Parallel Processing, Part 2

Event

Algorithms and Architectures for Parallel Processing. International Conference (15th : 2015 : Zhangjiajie, China)

Publisher

Springer

Place of publication

Berlin, Germany

Series

Lecture Notes in Computer Science
