Deakin University

File(s) under permanent embargo

Learning Sparse Log-Ratios for High-Throughput Sequencing Data

journal contribution
posted on 2021-02-12, 00:00; authored by Elliott Gordon-Rodriguez, Thomas Quinn, John P Cunningham
Abstract

The automatic discovery of interpretable features that are associated with an outcome of interest is a central goal of bioinformatics. In the context of high-throughput genetic sequencing data, and compositional data more generally, an important class of features is the log-ratios between subsets of the input variables. However, the space of these log-ratios grows combinatorially with the dimension of the input, and as a result, existing learning algorithms do not scale to the high-dimensional datasets that are increasingly common. Building on recent literature on continuous relaxations of discrete latent variables, we design a novel learning algorithm that identifies sparse log-ratios several orders of magnitude faster than competing methods. As well as dramatically reducing runtime, our method outperforms its competitors in terms of sparsity and predictive accuracy, as measured across a wide range of benchmark datasets.
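To make the notion of a log-ratio between subsets of the input variables concrete, the following is a minimal sketch, not the authors' algorithm. It assumes one common form of such a feature: the log of the ratio between the geometric means of two disjoint subsets. The abstract does not specify the exact definition used, and the function names `log_ratio` and `count_log_ratios` are hypothetical. The second function counts the candidate features to illustrate the combinatorial growth the abstract refers to.

```python
import numpy as np


def log_ratio(x, numerator, denominator):
    """Log-ratio between the geometric means of two disjoint subsets of x.

    Assumed definition (not taken from the abstract):
    mean(log x[numerator]) - mean(log x[denominator]).
    """
    x = np.asarray(x, dtype=float)
    num = np.asarray(numerator, dtype=int)
    den = np.asarray(denominator, dtype=int)
    if num.size == 0 or den.size == 0:
        raise ValueError("both subsets must be non-empty")
    if np.intersect1d(num, den).size > 0:
        raise ValueError("subsets must be disjoint")
    if np.any(x[num] <= 0) or np.any(x[den] <= 0):
        raise ValueError("selected entries must be strictly positive")
    return np.log(x[num]).mean() - np.log(x[den]).mean()


def count_log_ratios(p):
    """Number of ordered pairs of disjoint, non-empty subsets of p variables.

    Each variable goes to the numerator, the denominator, or neither (3**p
    assignments); inclusion-exclusion removes assignments in which either
    subset is empty.
    """
    return 3**p - 2 * 2**p + 1


# Hypothetical read counts for four taxa in one sample.
counts = np.array([10.0, 20.0, 40.0, 80.0])
feature = log_ratio(counts, numerator=[0, 1], denominator=[2, 3])
```

Under this assumed definition the feature is unchanged when the whole sample is rescaled, which is why log-ratios suit compositional data, where only relative abundances carry information. The count of candidate pairs grows roughly as 3^p, so exhaustive search quickly becomes infeasible as the number of variables p increases.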



Cold Spring Harbor Laboratory

Publication classification

CN Other journal article
