Design of multiple sequence alignment algorithms on parallel, distributed memory supercomputers

Church, Philip C., Goscinski, Andrzej, Holt, Kathryn, Inouye, Michael, Ghoting, Amol, Makarychev, Konstantin and Reumann, Matthias 2011, Design of multiple sequence alignment algorithms on parallel, distributed memory supercomputers, in EMBC 2011 : Proceedings of the 33rd Annual International Conference of the IEEE Engineering in Medicine and Biology Society, IEEE, [Boston, Mass.], pp. 924-927.

Attached Files
Name Description MIMEType Size Downloads

Title Design of multiple sequence alignment algorithms on parallel, distributed memory supercomputers
Author(s) Church, Philip C.
Goscinski, Andrzej
Holt, Kathryn
Inouye, Michael
Ghoting, Amol
Makarychev, Konstantin
Reumann, Matthias
Conference name IEEE Engineering in Medicine and Biology Society. Conference (33rd : 2011 : Boston, Mass.)
Conference location Boston, Mass.
Conference dates 30 Aug.-3 Sep. 2011
Title of proceedings EMBC 2011 : Proceedings of the 33rd Annual International Conference of the IEEE Engineering in Medicine and Biology Society
Editor(s) [Unknown]
Publication date 2011
Conference series IEEE Engineering in Medicine and Biology Society. Conference
Start page 924
End page 927
Publisher IEEE
Place of publication [Boston, Mass.]
Keyword(s) algorithm design and analysis
bioinformatics
educational institutions
genomics
hidden markov models
random access memory
supercomputers
Summary The challenge of comparing two or more genomes that have undergone recombination and substantial amounts of segmental loss and gain has recently been addressed for small numbers of genomes. However, datasets of hundreds of genomes are now common and their sizes will only increase in the future. Multiple sequence alignment of hundreds of genomes remains an intractable problem due to quadratic increases in compute time and memory footprint. To date, most alignment algorithms are designed for commodity clusters without parallelism. Hence, we propose the design of a multiple sequence alignment algorithm on massively parallel, distributed memory supercomputers to enable research into comparative genomics on large data sets. Following the methodology of the sequential progressiveMauve algorithm, we design data structures including sequences and sorted k-mer lists on the IBM Blue Gene/P supercomputer (BG/P). Preliminary results show that we can reduce the memory footprint so that we can potentially align over 250 bacterial genomes on a single BG/P compute node. We verify our results on a dataset of E.coli, Shigella and S.pneumoniae genomes. Our implementation returns results matching those of the original algorithm but in 1/2 the time and with 1/4 the memory footprint for scaffold building. In this study, we have laid the basis for multiple sequence alignment of large-scale datasets on a massively parallel, distributed memory supercomputer, thus enabling comparison of hundreds instead of a few genome sequences within reasonable time.
ISBN 1424441226
9781424441228
ISSN 1557-170X
Language eng
Field of Research 080501 Distributed and Grid Systems
Socio Economic Objective 890206 Internet Hosting Services (incl. Application Hosting Services)
HERDC Research category E1 Full written paper - refereed
Copyright notice ©2011, IEEE
Persistent URL http://hdl.handle.net/10536/DRO/DU:30042396

Document type: Conference Paper
Collection: School of Information Technology
Connect to link resolver
 
Unless expressly stated otherwise, the copyright for items in DRO is owned by the author, with all rights reserved.

Versions
Version Filter Type
Access Statistics: 74 Abstract Views, 120 File Downloads  -  Detailed Statistics
Created: Tue, 14 Feb 2012, 15:47:35 EST

Every reasonable effort has been made to ensure that permission has been obtained for items included in DRO. If you believe that your rights have been infringed by this repository, please contact drosupport@deakin.edu.au.