Deakin University

Machine learning algorithms for analysis of DNA data sets

chapter
posted on 2012-06-01, 00:00 authored by John Yearwood, A Bagirov, A Kelarev
Applying machine learning algorithms to data sets of DNA sequences is an important problem. This chapter experimentally investigates several machine learning algorithms for the analysis of the JLA data set, which consists of DNA sequences derived from non-coding segments in the junction of the large single copy region and inverted repeat A of the chloroplast genome in Eucalyptus, collected by Australian biologists. Data sets of this sort represent a new situation in which sophisticated alignment scores have to be used as the measure of similarity. Because alignment scores do not satisfy the properties of a Minkowski metric, new machine learning approaches have to be investigated. The authors' experiments show that machine learning algorithms based on local alignment scores achieve very good agreement with the known biological classes for this data set. A new machine learning algorithm based on graph partitioning performed best for clustering the JLA data set, and the authors' novel k-committees algorithm produced the most accurate results for classification. Two new synthetic data sets demonstrate that the k-committees algorithm can outperform both the Nearest Neighbour and k-medoids algorithms simultaneously.
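To make the idea of using local alignment scores as a similarity measure concrete, the following minimal Python sketch computes a Smith-Waterman local alignment score and uses it for Nearest Neighbour classification. It is an illustration only, not the chapter's implementation: the function names, the scoring parameters (`match=2`, `mismatch=-1`, `gap=-2`) and the toy sequences are all assumptions, and the chapter's own scoring scheme is not stated in this record.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman local alignment score with a linear gap penalty.

    The scoring parameters are illustrative assumptions, not values
    taken from the chapter.
    """
    prev = [0] * (len(b) + 1)  # previous row of the dynamic programming table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, so scores never drop below 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def nearest_neighbour_label(query, labelled):
    """Return the label of the labelled sequence most similar to `query`.

    `labelled` is a list of (sequence, label) pairs. Similarity is the
    local alignment score, so the largest score wins.
    """
    _, label = max(labelled, key=lambda item: local_alignment_score(query, item[0]))
    return label


# Hypothetical toy data, for illustration only.
training = [("ACGTACGT", "class_1"), ("TTTTGGGG", "class_2")]
predicted = nearest_neighbour_label("ACGTACGA", training)
```

Note that the score of a sequence against itself grows with its length (for example, `"AC"` and `"ACGT"` have different self-scores under this scheme), which is one simple way to see that such scores do not behave like a distance metric.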

History

Title of book

Machine learning algorithms for problem solving in computational applications: intelligent techniques

Chapter number

4

Pagination

47 - 58

Publisher

IGI Global

Place of publication

Hershey, Pa.

ISBN-13

9781466618336

Language

eng

Publication classification

B Book chapter; B1.1 Book chapter

Copyright notice

2012, IGI

Extent

21

Editor/Contributor(s)

S Kulkarni
