Applying machine learning algorithms to the analysis of DNA sequence data sets is an important problem. This chapter presents an experimental investigation of several machine learning algorithms applied to the JLA data set, which consists of DNA sequences derived from non-coding segments at the junction of the large single copy region and inverted repeat A of the chloroplast genome in Eucalyptus, collected by Australian biologists. Data sets of this sort represent a new situation in which sophisticated alignment scores have to be used as the measure of similarity. Alignment scores do not satisfy the properties of a Minkowski metric, so new machine learning approaches have to be investigated. The authors' experiments show that machine learning algorithms based on local alignment scores achieve very good agreement with known biological classes for this data set. A new machine learning algorithm based on graph partitioning performed best for clustering the JLA data set, and the authors' novel k-committees algorithm produced the most accurate results for classification. Two new synthetic data sets demonstrate that the k-committees algorithm can outperform both the Nearest Neighbour and k-medoids algorithms simultaneously.
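To illustrate why a local alignment score behaves differently from a metric, the following is a minimal sketch of a Smith-Waterman style local alignment score. It is not the scoring scheme used in the chapter, which this record does not specify: the function name `local_alignment_score` and the match, mismatch, and gap values are assumptions chosen only for illustration.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of strings a and b (linear gap penalty).

    The scoring parameters are illustrative assumptions, not values
    taken from the chapter.
    """
    prev = [0] * (len(b) + 1)  # previous row of the dynamic programming table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # A local alignment may restart anywhere, so scores never drop below 0.
            curr[j] = max(0, prev[j - 1] + sub, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# A sequence compared with itself scores higher the longer it is,
# whereas a metric would give d(x, x) = 0 for every x.
print(local_alignment_score("ACGT", "ACGT"))  # 8
print(local_alignment_score("AC", "AC"))      # 4
print(local_alignment_score("ACGT", "TTTT"))  # 2
```

Under these assumed parameters a larger score means greater similarity, and the self-score depends on sequence length, so the score cannot be used directly where an algorithm expects a distance that is zero between identical points.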
Chapter number
4
Pagination
47-58
Open access
Yes
ISBN-13
9781466618336
Language
eng
Publication classification
B Book chapter, B1.1 Book chapter
Copyright notice
2012, IGI
Extent
21
Editor/Contributor(s)
Kulkarni S
Publisher
IGI Global
Place of publication
Hershey, Pa.
Title of book
Machine learning algorithms for problem solving in computational applications: intelligent techniques