Understanding sequencing data as compositions: an outlook and review

Quinn, Thomas P; Erb, Ionas; Richardson, Mark; Crowley, Tamsyn

journal contribution

posted on 2024-06-04, 03:19 authored by Thomas P Quinn, Ionas Erb, Mark Richardson, Tamsyn Crowley

Motivation: Although seldom acknowledged explicitly, count data generated by sequencing platforms exist as compositions for which the abundance of each component (e.g. gene or transcript) is only coherently interpretable relative to other components within that sample. This property arises from the assay technology itself, whereby the number of counts recorded for each sample is constrained by an arbitrary total sum (i.e. library size). Consequently, sequencing data, as compositional data, exist in a non-Euclidean space that, without normalization or transformation, renders invalid many conventional analyses, including distance measures, correlation coefficients and multivariate statistical models.

Results: The purpose of this review is to summarize the principles of compositional data analysis (CoDA), provide evidence for why sequencing data are compositional, discuss compositionally valid methods available for analyzing sequencing data, and highlight future directions with regard to this field of study.

Supplementary information: Supplementary data are available at Bioinformatics online.
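The compositional property described in the abstract can be illustrated with a minimal sketch. It is not taken from the article: the counts are made up, the function names `closure` and `clr` are hypothetical, and the centered log-ratio is used here only as one commonly cited CoDA transformation that the abstract itself does not name. The sketch assumes strictly positive counts, since the logarithm is undefined at zero.

```python
import math

def closure(counts):
    """Rescale counts so they sum to 1 (proportions within one sample)."""
    total = sum(counts)
    return [c / total for c in counts]

def clr(counts):
    """Centered log-ratio: log of each part minus the mean log of all parts.

    Assumes every count is strictly positive.
    """
    logs = [math.log(c) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

# Hypothetical sample with three components (e.g. three transcripts).
sample = [10, 20, 70]

# The same sample sequenced five times deeper: every count scales with
# the library size, but the relative information is unchanged.
deeper = [c * 5 for c in sample]

# Raw counts differ, yet proportions and log-ratios agree.
same_proportions = all(
    math.isclose(a, b, abs_tol=1e-12)
    for a, b in zip(closure(sample), closure(deeper))
)
same_clr = all(
    math.isclose(a, b, abs_tol=1e-12)
    for a, b in zip(clr(sample), clr(deeper))
)
```

Both `same_proportions` and `same_clr` evaluate to `True`: the library size carries no information about the sample, and only the ratios between components are preserved.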

Journal

Bioinformatics

Volume

34

Pagination

2870-2878

Location

London, Eng.

Open access

Yes

Link to full text

https://academic.oup.com/bioinformatics/article-pdf/34/16/2870/25441977/bty175.pdf

ISSN

1367-4803

eISSN

1460-2059

Language

eng

Author URL

https://www.ncbi.nlm.nih.gov/pubmed/29608657

Publication classification

C1 Refereed article in a scholarly journal

Issue

16

Publisher

Oxford Academic

Keywords

3102 Bioinformatics and computational biology
