Supplemental material for S. P. Ponnapalli, M. A. Saunders, C. F. Van Loan and O. Alter, "A Higher-Order Generalized Singular Value Decomposition for Comparison of Global mRNA Expression from Multiple Organisms," Public Library of Science (PLoS) One 6 (12), article e28072 (December 2011); doi: 10.1371/journal.pone.0028072.
Mention: Among the top 10% most cited Public Library of Science (PLoS) One articles as of 2017, PLoS One (June 30, 2017).
Abstract:
The number of high-dimensional datasets recording multiple aspects of a single phenomenon is increasing in many areas of science, accompanied by a need for mathematical frameworks that can compare multiple large-scale matrices with different row dimensions. The only such framework to date, the generalized singular value decomposition (GSVD), is limited to two matrices. We mathematically define a higher-order GSVD (HO GSVD) for N ≥ 2 matrices D_i ∈ ℝ^{m_i×n}, each with full column rank. Each matrix is exactly factored as D_i = U_i Σ_i V^T, where V, identical in all factorizations, is obtained from the eigensystem SV = VΛ of the arithmetic mean S of all pairwise quotients A_i A_j^{-1} of the matrices A_i = D_i^T D_i, i ≠ j. We prove that this decomposition extends to higher orders almost all of the mathematical properties of the GSVD. The matrix S is nondefective with V and Λ real. Its eigenvalues satisfy λ_k ≥ 1. Equality holds if and only if the corresponding eigenvector v_k is a right basis vector of equal significance in all matrices D_i and D_j, that is σ_{i,k}/σ_{j,k} = 1 for all i and j, and the corresponding left basis vector u_{i,k} is orthogonal to all other vectors in U_i for all i. The eigenvalues λ_k = 1, therefore, define the "common HO GSVD subspace." We illustrate the HO GSVD with a comparison of genome-scale cell-cycle mRNA expression from S. pombe, S. cerevisiae and human. Unlike existing algorithms, a mapping among the genes of these disparate organisms is not required. We find that the approximately common HO GSVD subspace represents the cell-cycle mRNA expression oscillations, which are similar among the datasets. Simultaneous reconstruction in the common subspace, therefore, removes the experimental artifacts, which are dissimilar, from the datasets. In the simultaneous sequence-independent classification of the genes of the three organisms in this common subspace, genes of highly conserved sequences but significantly different cell-cycle peak times are correctly classified.
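The construction in the abstract can be followed step by step in a short NumPy sketch. This is an illustration only, not the authors' code: the function name `ho_gsvd`, its return order, and the sorting of the eigenvalues are assumptions made here. It also forms explicit inverses of A_i = D_i^T D_i for readability, which is acceptable for small, well-conditioned inputs but is not a numerically careful implementation.

```python
import numpy as np


def ho_gsvd(matrices):
    """Sketch of the HO GSVD for N >= 2 matrices D_i (m_i x n), each of full column rank.

    Returns (U, sigma, V, lam) such that D_i = U[i] @ diag(sigma[i]) @ V.T.
    The name and return layout are hypothetical, chosen for this example.
    """
    N = len(matrices)
    n = matrices[0].shape[1]

    # A_i = D_i^T D_i and their inverses (explicit inverses: a simplification).
    A = [D.T @ D for D in matrices]
    A_inv = [np.linalg.inv(a) for a in A]

    # S = arithmetic mean of all pairwise quotients A_i A_j^{-1}, i != j.
    S = np.zeros((n, n))
    for i in range(N):
        for j in range(N):
            if i != j:
                S += A[i] @ A_inv[j]
    S /= N * (N - 1)

    # Eigensystem S V = V Lambda. In exact arithmetic V and Lambda are real,
    # so any tiny imaginary parts from round-off are dropped here.
    lam, V = np.linalg.eig(S)
    lam, V = lam.real, V.real
    V = V / np.linalg.norm(V, axis=0)   # unit-length columns v_k
    order = np.argsort(lam)             # ascending; values nearest 1 come first
    lam, V = lam[order], V[:, order]

    U, sigma = [], []
    for D in matrices:
        B = np.linalg.solve(V, D.T).T   # solves V B^T = D^T, i.e. B = D V^{-T}
        s = np.linalg.norm(B, axis=0)   # sigma_{i,k} = column norms of B
        U.append(B / s)                 # u_{i,k} = b_{i,k} / sigma_{i,k}
        sigma.append(s)
    return U, sigma, V, lam


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Three random matrices with different row counts and a shared column count.
    Ds = [rng.standard_normal((m, 5)) for m in (30, 40, 50)]
    U, sigma, V, lam = ho_gsvd(Ds)
    for D, Ui, si in zip(Ds, U, sigma):
        print("max reconstruction error:", np.abs((Ui * si) @ V.T - D).max())
    print("eigenvalues of S:", lam)
```

Eigenvalues close to 1 in `lam` mark the columns of `V` that span the approximately common subspace. Setting the remaining entries of each `sigma[i]` to zero before multiplying back gives a simple version of the simultaneous reconstruction described in the abstract.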
- Ponnapalli_et_al_PLoS_One_2011.pdf
A PDF format file, readable by Adobe Acrobat Reader.
- Ponnapalli_et_al_PLoS_One_2011_Appendix.pdf
A PDF format file, readable by Adobe Acrobat Reader.
- S. pombe global mRNA expression.
A tab-delimited text format file, readable by both Mathematica and Microsoft Excel, reproducing the relative mRNA expression levels of m_1 = 3167 S. pombe gene clones at n = 17 time points during about two cell-cycle periods from Rustici et al. and the cell-cycle classifications from Oliva et al.
- S. cerevisiae global mRNA expression.
A tab-delimited text format file, readable by both Mathematica and Microsoft Excel, reproducing the relative mRNA expression levels of m_2 = 4772 S. cerevisiae open reading frames (ORFs), or genes, at n = 17 time points during about two cell-cycle periods, including cell-cycle classifications, from Spellman et al.
- Human global mRNA expression.
A tab-delimited text format file, readable by both Mathematica and Microsoft Excel, reproducing the relative mRNA expression levels of m_3 = 13,068 human gene clones at n = 17 time points during about two cell-cycle periods, including cell-cycle classifications, from Whitfield et al.