Application of a novel “pan-genome”-based strategy for assigning RNAseq transcript reads to Staphylococcus aureus strains
Journal contribution posted on 2015-01-01, 00:00, authored by Diego Chaves-Moreno, Melissa L Wos-Oxley, Ruy Jáuregui, Eva Medina, Andrew Oxley, Dietmar H Pieper
Understanding the behaviour of opportunistic pathogens such as Staphylococcus aureus in their natural human niche is of great medical interest. With the development of sensitive molecular methods and deep-sequencing technology, it is now possible to robustly assess the global transcriptome of bacterial species in their human habitat. However, because the genomes of the colonizing strains are often not available, compiling the pan-genome of the species of interest may provide an effective way to assemble the transcriptome of a bacterial species reliably and rapidly. The pan-genome of S. aureus and its associated core and accessory components were compiled from 25 genomes and comprise a total of 65,557 proteins clustering into 4,198 Orthologous Groups (OGs). The resulting gene catalogue was used to assign RNAseq-derived sequence reads to S. aureus in a variety of in vitro and in vivo samples. In all cases, more reads could be assigned to S. aureus using the OG database than using a reference genome. Growth of two S. aureus strains in synthetic nasal medium confirmed that both strains experienced strong iron starvation. Traits such as purine metabolism appeared to be more affected in a typical nasal colonizer than in a strain representative of the S. aureus USA300 lineage. Mapping sequencing reads from a metatranscriptome generated from the human anterior nares allowed the identification of genes highly expressed by S. aureus in vivo. The OG database generated in this study is a useful tool for obtaining a snapshot of the functional attributes of S. aureus under different in vitro and in vivo conditions. The approach proved advantageous for assigning sequencing reads to bacterial strains when RNAseq data are derived from samples for which strain information and/or the corresponding genome(s) are unavailable.
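The two ideas at the heart of the abstract, splitting a pan-genome into core and accessory Orthologous Groups and counting RNAseq reads per OG, can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the toy identifiers, and the inputs are hypothetical. It also assumes that "core" means present in every genome, a threshold the abstract does not state, and that each read has already been aligned to a single best-hit gene by some external tool.

```python
from collections import defaultdict


def split_core_accessory(og_to_genomes, n_genomes):
    """Partition OGs into core and accessory sets.

    Assumption: an OG is "core" when it occurs in all n_genomes genomes;
    every other OG is treated as accessory.
    """
    core, accessory = set(), set()
    for og, genomes in og_to_genomes.items():
        if len(set(genomes)) == n_genomes:
            core.add(og)
        else:
            accessory.add(og)
    return core, accessory


def count_reads_per_og(read_hits, gene_to_og):
    """Count reads per OG from (read_id, gene_id) best-hit pairs.

    Reads whose gene is not in the catalogue are left unassigned.
    """
    counts = defaultdict(int)
    for _read_id, gene_id in read_hits:
        og = gene_to_og.get(gene_id)
        if og is not None:
            counts[og] += 1
    return dict(counts)


# Toy data (hypothetical): three genomes, three OGs, four reads.
og_to_genomes = {
    "OG1": {"strainA", "strainB", "strainC"},
    "OG2": {"strainA"},
    "OG3": {"strainB", "strainC"},
}
gene_to_og = {"A_0001": "OG1", "B_0001": "OG1", "A_0002": "OG2", "C_0007": "OG3"}
read_hits = [
    ("r1", "A_0001"),
    ("r2", "B_0001"),
    ("r3", "A_0002"),
    ("r4", "unknown_gene"),
]

core, accessory = split_core_accessory(og_to_genomes, n_genomes=3)
og_counts = count_reads_per_og(read_hits, gene_to_og)
```

Because reads r1 and r2 hit orthologous genes from different strains, both are counted under the same OG. This is the intuition behind why an OG catalogue can capture reads that a single reference genome would miss.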