Deakin University

File(s) under permanent embargo

Optimization methods and the k-committees algorithm for clustering of sequence data

Version 2 2024-06-04, 04:13
Version 1 2017-08-03, 12:13
journal contribution
posted on 2024-06-04, 04:13 authored by John Yearwood, AM Bagirov, AV Kelarev
The present paper is devoted to new algorithms for unsupervised clustering based on the optimization approaches due to [2], [3] and [4]. We consider a novel situation, where the datasets consist of nucleotide or protein sequences and rather sophisticated biologically significant alignment scores have to be used as a measure of distance. Sequences of this kind cannot be regarded as points in a finite-dimensional space. Moreover, the alignment scores do not satisfy the properties of Minkowski metrics. Nevertheless, the optimization approaches have made it possible to introduce a new k-committees algorithm and to compare its performance with previous algorithms on two datasets. Our experimental results show that the k-committees algorithm achieves intermediate accuracy for a dataset of ITS sequences, and that it can perform better than the discrete k-means and Nearest Neighbour algorithms for certain datasets. All three algorithms achieve good agreement with clusters previously published in the biological literature and can be used to obtain biologically significant clusterings.
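The abstract does not spell out the k-committees procedure, so the following is only a minimal sketch of the general idea of clustering from pairwise dissimilarities alone, with no vector-space coordinates. It rests on several assumptions that are not taken from the paper: each cluster is represented by a small "committee" of its own members, the distance from an item to a committee is the smallest dissimilarity to any committee member, and committees are rebuilt greedily. The function names (`committee_distance`, `cluster_with_committees`) and the toy matrix are hypothetical; in practice the matrix would be derived from alignment scores. With committees of size one, the sketch reduces to a discrete (medoid-style) k-means.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def committee_distance(D: Matrix, i: int, committee: List[int]) -> float:
    # Assumption: an item is as close to a committee as to its nearest member.
    return min(D[i][c] for c in committee)


def cluster_with_committees(
    D: Matrix, initial: List[List[int]], max_iter: int = 100
) -> Tuple[List[int], List[List[int]]]:
    """Alternate assignment and committee update on a dissimilarity matrix D.

    `initial` holds one starting committee (a list of item indices) per cluster.
    Only pairwise dissimilarities are used; D need not be a metric.
    """
    n = len(D)
    committees = [sorted(c) for c in initial]
    size = len(committees[0])
    labels = [0] * n
    for _ in range(max_iter):
        # Assignment step: each item joins the cluster with the nearest committee.
        labels = [
            min(range(len(committees)),
                key=lambda j: committee_distance(D, i, committees[j]))
            for i in range(n)
        ]
        # Update step: greedily choose committee members from within each
        # cluster so that the total distance of its items to the committee
        # is small (a heuristic, not an exact minimisation).
        updated = []
        for j, old in enumerate(committees):
            members = [i for i in range(n) if labels[i] == j]
            if len(members) < size:
                updated.append(old)  # too few items: keep the old committee
                continue
            chosen: List[int] = []
            for _slot in range(size):
                best = min(
                    (m for m in members if m not in chosen),
                    key=lambda m: sum(
                        committee_distance(D, i, chosen + [m]) for i in members
                    ),
                )
                chosen.append(best)
            updated.append(sorted(chosen))
        if updated == committees:
            break  # committees are stable, so assignments will not change
        committees = updated
    return labels, committees


def toy_dissimilarity(i: int, j: int) -> float:
    # Hand-made values for illustration: items 0-2 and items 3-5 form two groups.
    if i == j:
        return 0.0
    return 1.0 if (i < 3) == (j < 3) else 10.0


D = [[toy_dissimilarity(i, j) for j in range(6)] for i in range(6)]
labels, committees = cluster_with_committees(D, initial=[[0, 1], [3, 4]])
print(labels, committees)
```

Because the routine only ever reads entries of `D`, any score that orders pairs of sequences by similarity can be plugged in once it is converted to a dissimilarity; the conversion itself is outside the scope of this sketch.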

History

Journal: Applied and computational mathematics
Volume: 8
Issue: 1
Pagination: 92-101
ISSN: 1683-3511
Language: eng
Publication classification: CN.1 Other journal article
Publisher: Science Publishing Group
