Machine learning algorithms for analysis of DNA data sets

Yearwood, John; Bagirov, A; Kelarev, A

chapter

posted on 2012-06-01, 00:00 authored by John Yearwood, A Bagirov, A Kelarev

Applying machine learning algorithms to data sets of DNA sequences is an important problem. This chapter experimentally investigates several machine learning algorithms on the JLA data set, which consists of DNA sequences derived from non-coding segments at the junction of the large single copy region and inverted repeat A of the chloroplast genome in Eucalyptus, collected by Australian biologists. Data sets of this sort represent a new situation in which sophisticated alignment scores have to be used as the measure of similarity. Because alignment scores do not satisfy the properties of a Minkowski metric, new machine learning approaches have to be investigated. The authors' experiments show that machine learning algorithms based on local alignment scores achieve very good agreement with the known biological classes for this data set. A new machine learning algorithm based on graph partitioning performed best for clustering the JLA data set, and the authors' novel k-committees algorithm produced the most accurate results for classification. Two new synthetic data sets demonstrate that the k-committees algorithm can outperform both the Nearest Neighbour and k-medoids algorithms simultaneously.
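As a rough illustration of how a local alignment score can stand in for a distance, the sketch below computes a Smith-Waterman style score and uses it for a nearest-neighbour decision. It is not the authors' k-committees or graph partitioning algorithm. The scoring values (match 2, mismatch -1, gap -2), the function names, and the toy sequences are assumptions chosen for illustration; the abstract does not state the scoring scheme used in the chapter.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of a and b (Smith-Waterman, linear gap penalty).

    The scoring parameters are illustrative assumptions, not the chapter's values.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, so scores never drop below 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def nearest_neighbour_label(query, labelled):
    """Return the label of the labelled sequence most similar to the query.

    Higher scores mean greater similarity, so the maximum is taken
    rather than the minimum used with a distance.
    """
    _, label = max(labelled, key=lambda pair: local_alignment_score(query, pair[0]))
    return label


# Hypothetical toy data, not taken from the JLA data set.
training = [("ACGTACGT", "class A"), ("TTGGCCAA", "class B")]
predicted = nearest_neighbour_label("CGTAC", training)
```

Under these assumed parameters the self-similarity of a sequence grows with its length (for example, "ACGT" scores 8 against itself while "AC" scores 4), which is one simple way to see that such scores do not behave like a metric, where the distance from a point to itself is always zero.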

History

Chapter number

4

Pagination

47-58

Publisher DOI

https://doi.org/10.4018/978-1-4666-1833-6.ch004

Open access

Yes

ISBN-13

9781466618336

Language

eng

Publication classification

B Book chapter, B1.1 Book chapter

Copyright notice

2012, IGI

Extent

21

Editor/Contributor(s)

Kulkarni S

Publisher

IGI Global

Place of publication

Hershey, Pa.

Title of book

Machine learning algorithms for problem solving in computational applications: intelligent techniques

Keywords

School of Information Technology, Centre for Pattern Recognition and Data Analytics, Pattern Recognition and Data Analytics
