Supplemental material for C. Muralidhara, A. M. Gross, R. R. Gutell and O. Alter, "Tensor Decomposition Reveals Concurrent Evolutionary Convergences and Divergences and Correlations with Structural Motifs in Ribosomal RNA," Public Library of Science (PLoS) One 6 (4), article e18768 (April 2011); doi: 10.1371/journal.pone.0018768.
Highlight.
Abstract:
Evolutionary relationships among organisms are commonly described by using a hierarchy derived from comparisons of ribosomal RNA (rRNA) sequences. We propose that even on the level of a single rRNA molecule, an organism's evolution is composed of multiple pathways due to concurrent forces that act independently upon different rRNA degrees of freedom. Relationships among organisms are then compositions of coexisting pathway-dependent similarities and dissimilarities, which cannot be described by a single hierarchy. We computationally test this hypothesis in comparative analyses of 16S and 23S rRNA sequence alignments by using a tensor decomposition, i.e., a framework for modeling composite data. Each alignment is encoded in a cuboid, i.e., a third-order tensor, where nucleotides, positions and organisms, each represent a degree of freedom. A tensor mode-1 higher-order singular value decomposition (HOSVD) is formulated such that it separates each cuboid into combinations of patterns of nucleotide frequency variation across organisms and positions, i.e., "eigenpositions" and corresponding nucleotide-specific segments of "eigenorganisms," respectively, independent of a-priori knowledge of the taxonomic groups or rRNA structures. We find, in support of our hypothesis that, first, the significant eigenpositions reveal multiple similarities and dissimilarities among the taxonomic groups. Second, the corresponding eigenorganisms identify insertions or deletions of nucleotides exclusively conserved within the corresponding groups, that map out entire substructures and are enriched in adenosines, unpaired in the rRNA secondary structure, that participate in tertiary structure interactions. This demonstrates that structural motifs involved in rRNA folding and function are evolutionary degrees of freedom. 
Third, two previously unknown coexisting subgenic relationships between Microsporidia and Archaea are revealed in both the 16S and 23S rRNA alignments, a convergence and a divergence, conferred by insertions and deletions of these motifs, which cannot be described by a single hierarchy. This shows that mode-1 HOSVD modeling of rRNA alignments might be used to computationally predict evolutionary mechanisms.
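The encoding described in the abstract can be illustrated with a minimal sketch. It is not the authors' code. It assumes a binary (one-hot) encoding of the four nucleotides, with gaps and unknowns left as zeros, and it assumes the cuboid is unfolded into a matrix of nucleotide-specific positions by organisms before an ordinary SVD is taken. The function names `encode_alignment` and `unfolded_svd` and the toy alignment are hypothetical.

```python
import numpy as np

# Assumption: only A, C, G and U get an indicator row; gaps and unknowns stay all-zero.
NUCLEOTIDES = "ACGU"


def encode_alignment(sequences):
    """One-hot encode aligned sequences into a nucleotides x positions x organisms tensor."""
    n_organisms = len(sequences)
    n_positions = len(sequences[0])
    tensor = np.zeros((len(NUCLEOTIDES), n_positions, n_organisms))
    for m, sequence in enumerate(sequences):
        for l, symbol in enumerate(sequence):
            k = NUCLEOTIDES.find(symbol)
            if k >= 0:
                tensor[k, l, m] = 1.0
    return tensor


def unfolded_svd(tensor):
    """Unfold the cuboid to (nucleotides * positions) x organisms and take its SVD."""
    k, l, m = tensor.shape
    # Rows are grouped by nucleotide, so each left singular vector splits
    # into four nucleotide-specific segments of length l.
    unfolded = tensor.reshape(k * l, m)
    u, s, vt = np.linalg.svd(unfolded, full_matrices=False)
    return u, s, vt


# Hypothetical toy alignment: three organisms, five aligned positions.
toy_alignment = ["ACGU-", "ACGUA", "UCG-A"]
tensor = encode_alignment(toy_alignment)
u, s, vt = unfolded_svd(tensor)

# Fraction of the overall signal captured by each pattern.
fractions = s**2 / np.sum(s**2)
print(tensor.shape, np.round(fractions, 3))
```

In this sketch the rows of `vt` are patterns across organisms and the columns of `u` are patterns across nucleotide-specific positions, which loosely correspond to the "eigenpositions" and "eigenorganisms" of the abstract.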



A PDF format file, readable by Adobe Acrobat Reader.
Muralidhara_et_al_PLoS_One_2011.pdf



A PDF format file, readable by Adobe Acrobat Reader.
Muralidhara_et_al_PLoS_One_2011_Figures.pdf



A PDF format file, readable by Adobe Acrobat Reader.
Muralidhara_et_al_PLoS_One_2011_Appendix.pdf



Taxonomy annotations of the organisms in the 16S rRNA alignment.
A tab-delimited text format file, readable by both Mathematica and Microsoft Excel, reproducing the National Center for Biotechnology Information (NCBI) Taxonomy Browser annotations by Sayers et al. of the 339 organisms in the 16S alignment.
16S rRNA alignment.
Tab-delimited text format files, readable by both Mathematica and Microsoft Excel, reproducing the alignment of 16S rRNA sequences from the Comparative RNA Website (CRW) by Cannone et al., tabulating six sequence elements, i.e., A, C, G and U nucleotides, unknown ("N") and gap ("–"), across the 339 organisms and the 3249 sequence positions.
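As a rough illustration of how such a tab-delimited alignment might be tabulated outside Mathematica or Excel, the sketch below counts the six sequence elements at each position. The file layout is an assumption, not taken from the dataset: one organism per row, the organism name in the first column, one symbol per position after it, and a plain hyphen for the gap. The names `read_alignment` and `element_table` are hypothetical.

```python
import csv
import io
from collections import Counter

# Assumption: the gap is written as a plain hyphen in the text file.
ELEMENTS = ["A", "C", "G", "U", "N", "-"]


def read_alignment(handle):
    """Read rows of: organism name, then one symbol per position (assumed layout)."""
    names, rows = [], []
    for record in csv.reader(handle, delimiter="\t"):
        if not record:
            continue  # skip blank lines
        names.append(record[0])
        rows.append(record[1:])
    return names, rows


def element_table(rows):
    """Return, for each position, the counts of the six elements across organisms."""
    table = []
    for column in zip(*rows):
        counts = Counter(column)
        table.append([counts[element] for element in ELEMENTS])
    return table


# Hypothetical toy data standing in for a real file opened with open(path, newline="").
toy = "org1\tA\tC\t-\norg2\tA\tG\tN\norg3\tU\tC\t-\n"
names, rows = read_alignment(io.StringIO(toy))
table = element_table(rows)
print(names, table)
```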
Base-pairing of the positions of the 16S rRNA alignment.
Tab-delimited text format files, readable by both Mathematica and Microsoft Excel, reproducing the base-pairing of the positions of the 16S rRNA alignment in the secondary structure models of the 16S sequences from the CRW, tabulating base-paired ("Y") and unpaired ("N") nucleotides as well as gaps ("–"), across the 339 organisms and the 3249 sequence positions.
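The base-pairing tabulation can be combined with the alignment to ask, for example, what fraction of adenosines is unpaired. The following is a minimal sketch under the assumption that both datasets have been loaded as equal-length strings per organism, with "Y" for base-paired, "N" for unpaired and a plain hyphen for gaps. The function name `unpaired_adenosine_fraction` and the toy data are hypothetical.

```python
def unpaired_adenosine_fraction(alignment, pairing):
    """Fraction of adenosines marked unpaired ("N") in the matching pairing rows."""
    total = 0
    unpaired = 0
    for sequence, states in zip(alignment, pairing):
        for nucleotide, state in zip(sequence, states):
            if nucleotide == "A":
                total += 1
                if state == "N":
                    unpaired += 1
    # Return NaN rather than dividing by zero when no adenosines are present.
    return unpaired / total if total else float("nan")


# Hypothetical toy data: two organisms, five aligned positions.
toy_alignment = ["ACGA-", "AAGU-"]
toy_pairing = ["NYYN-", "YNYY-"]
fraction = unpaired_adenosine_fraction(toy_alignment, toy_pairing)
print(fraction)
```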
Taxonomy annotations of the organisms in the 23S rRNA alignment.
A tab-delimited text format file, readable by both Mathematica and Microsoft Excel, reproducing the NCBI Taxonomy Browser annotations of the 75 organisms in the 23S alignment.
23S rRNA alignment.
Tab-delimited text format files, readable by both Mathematica and Microsoft Excel, reproducing the alignment of 23S rRNA sequences from the CRW, tabulating six sequence elements, i.e., A, C, G and U nucleotides, unknown ("N") and gap ("–"), across the 75 organisms and the 6636 sequence positions.
Base-pairing of the positions of the 23S rRNA alignment.
Tab-delimited text format files, readable by both Mathematica and Microsoft Excel, reproducing the base-pairing of the positions of the 23S rRNA alignment in the secondary structure models of the 23S sequences from the CRW, tabulating base-paired ("Y") and unpaired ("N") nucleotides as well as gaps ("–"), across the 75 organisms and the 6636 sequence positions.
Mitochondrial 16S rRNA alignment with taxonomy annotations of the organisms.
Tab-delimited text format files, readable by both Mathematica and Microsoft Excel, reproducing the alignment of 858 mitochondrial 16S rRNA sequences from the CRW, tabulating six sequence elements across the 858 organisms and the 3249 sequence positions, as well as reproducing the NCBI Taxonomy Browser annotations of the 858 organisms.