- For postdoctoral, graduate, and advanced undergraduate students in Engineering, Sciences, and Medicine, and professionals in industry.
- Spring 2022, Mondays and Wednesdays 11:50am–1:10pm, Zoom. Prerequisites: some programming experience and instructor approval.
- Syllabus
- COVID-19
- Spring 2022 Calendar
- Safety
- Health, Wellness, and Counseling
- Student Code

100% grade = 30% labs, 30% presentation, 30% class project, 10% class participation; late assignments are not accepted; class attendance is required.

Topics:

We will cover concepts in artificial intelligence, data science, and machine learning, and their applications to integrating and comparing different types of high-throughput omic and other data, acquired by different technologies and from different studies, toward the discovery, verification, and validation of biomedical principles.

- Technologies, such as whole-genome sequencing, for high-throughput acquisition of different types of molecular biological data and patient clinical information.
- Databases, such as The Cancer Genome Atlas (TCGA) at the Genomic Data Commons (GDC).
- Algorithms, from the singular value decomposition (SVD) and principal component analysis (PCA) to multi-tensor decompositions, neural networks, and deep learning.
- Applications toward better understanding of biology and practice of medicine, e.g., personalized cancer diagnostics, prognostics, and therapeutics.

Skills:

- Proving mathematical theorems and programming symbolic computations.
- Designing algorithms and programming numerical computations.
- Working with databases and modeling biomedical data.

Activities:

- In-class presentations of scientific journal articles and patents.
- Participation in guest lectures and seminars on campus and discussions of conference reports.
- End-of-class celebration.

Readings on the SVD and deep learning:

- Book 1:

Book 2:

January 10:

- Welcome!

January 12:

- Introduction:

How Bright Promise in Cancer Testing Fell Apart,

- The SVD in the news:

If You Liked This, You're Sure to Love That,

- Timeline of the human genome project:

From Darwin and Mendel to the human genome project,

- Paper 1: The Alta Summit, December 1984, Cook-Deegan,

Paper 2: A Vision for the Future of Genomics Research, Collins et al.,

Acknowledgements.

January 17:

- Happy Martin Luther King Jr. Day!

January 19:

- Mathematics of the SVD:

- Notebook 1: Computation and Visualization of the SVD

Mathematica Code: Notebook_1.nb

- Lab 1:

Code the SVD or the pseudoinverse projection of synthetic data and its visualization. Test and debug your code.

January 24:

- Composition and decomposition of synthetic data:

- Notebook 2: The SVD of Synthetic Data

Mathematica Code: Notebook_2.nb
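
As a rough illustration of composing and decomposing synthetic data, the Python sketch below builds a small matrix from chosen singular values and orthonormal vectors, then recovers them. It is not the course's Mathematica notebook and does not claim to follow its method. All function names are hypothetical, and power iteration with deflation is a stand-in chosen only to keep the example free of dependencies; in practice a library SVD routine would be used.

```python
import math

def matvec(A, x):
    """Multiply matrix A (a list of rows) by vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def norm(x):
    return math.sqrt(sum(v * v for v in x))

def compose(us, sigmas, vs):
    """Compose A = sum_k sigma_k * u_k * v_k^T from singular triplets."""
    m, n = len(us[0]), len(vs[0])
    A = [[0.0] * n for _ in range(m)]
    for u, s, v in zip(us, sigmas, vs):
        for i in range(m):
            for j in range(n):
                A[i][j] += s * u[i] * v[j]
    return A

def leading_triplet(A, iterations=200):
    """Estimate the largest singular value and its vectors by power iteration on A^T A."""
    At = transpose(A)
    # Assumption: the all-ones start is not orthogonal to the leading right singular vector.
    v = [1.0] * len(A[0])
    for _ in range(iterations):
        w = matvec(At, matvec(A, v))
        length = norm(w)
        v = [x / length for x in w]
    Av = matvec(A, v)
    sigma = norm(Av)
    u = [x / sigma for x in Av]
    return u, sigma, v

def decompose(A, rank):
    """Recover `rank` singular triplets by repeatedly subtracting the leading one."""
    triplets = []
    R = [row[:] for row in A]
    for _ in range(rank):
        u, s, v = leading_triplet(R)
        triplets.append((u, s, v))
        R = [[R[i][j] - s * u[i] * v[j] for j in range(len(v))]
             for i in range(len(u))]
    return triplets

# Synthetic data: two orthonormal patterns with singular values 3 and 1.
r = 1 / math.sqrt(2)
A = compose(us=[[r, r, 0.0], [0.0, 0.0, 1.0]],
            sigmas=[3.0, 1.0],
            vs=[[1.0, 0.0], [0.0, 1.0]])
recovered = decompose(A, rank=2)
singular_values = [s for _, s, _ in recovered]
```

Here `singular_values` should come back close to the values 3 and 1 used to compose `A`, which is the basic check that decomposition inverts composition for noise-free synthetic data.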

January 31:

- In-Class Work on Lab 1

February 2:

- In-Class Work on Lab 1

February 7:

- Slides 1: Examples of the SVD of measured data

- More examples of the SVD of measured data:

Paper 3: Singular Value Decomposition for Genome-Wide Expression Data Processing and Modeling, Alter et al.,

Patent 1: Method for Node Ranking in a Linked Database, Page,

Paper 4: A Rapid Genome-Scale Response of the Transcriptional Oscillator to Perturbation Reveals a Period-Doubling Path to Phenotypic Change, Li and Klevecz,

Paper 5: Coordinated Metabolic Transitions During

February 9:

- Mathematics of the pseudoinverse projection:

February 9, 1:20–2:00pm:

- Slides 2: Data integration by using the pseudoinverse projection

- Notebook 3: The Pseudoinverse Projection of Measured Data

Mathematica Code: Notebook_3.nb

February 12:

- Happy Darwin Day!

February 14:

- Examples of the pseudoinverse projection of measured data:

Paper 6: Integrative Analysis of Genome-Scale Data by Using Pseudoinverse Projection Predicts Novel Correlation between DNA Replication and RNA Transcription, Alter and Golub,

Paper 7: Reconstructing the Pathways of a Cellular System from Genome-Scale Signals by Using Matrix and Tensor Computations, Alter and Golub,

Paper 8: Distinct Physiological States of

Paper 9: Using Pre-Existing Microarray Datasets to Increase Experimental Power: Application to Insulin Resistance, Daigle et al.,

February 16:

- Example of TCGA data:

Paper 10: Comprehensive Genomic Characterization Defines Human Glioblastoma Genes and Core Pathways, TCGA Research Network,

February 21:

- Happy Presidents Day!

- In-Class Work on Lab 1

February 23:

- Examples of interpretation of TCGA data:

Paper 11: Mathematically Universal and Biologically Consistent Astrocytoma Genotype Encodes for Transformation and Predicts Survival Phenotype, Aiello et al.,

Supplementary Material.

Paper 12: GSVD Comparison of Patient-Matched Normal and Tumor aCGH Profiles Reveals Global Copy-Number Alterations Predicting Glioblastoma Multiforme Survival, Lee et al.,

- Examples of assessing the statistical significance of an interpretation:

Paper 13: Systematic Determination of Genetic Network Architecture, Tavazoie et al.,

Paper 14: Discovering Motifs in Ranked Lists of DNA Sequences, Eden et al.,

Paper 15: GOrilla: A Tool for Discovery and Visualization of Enriched GO Terms in Ranked Gene Lists, Eden et al.,

February 28:

- Lab 1 Due In-Class

- In-Class Work on Lab 2:

Compute and visualize the SVD or the pseudoinverse projection of your data. Interpret your data based upon its SVD. Use at least two different approaches each for preprocessing and sorting your data and for assessing the statistical significance of your interpretation.

- Slides 3: The hypergeometric probability distribution and

- Notebook 4: The Hypergeometric Probability Distribution and

Mathematica Code: Notebook_4.nb
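
One common use of the hypergeometric distribution when assessing the significance of an interpretation is to ask how surprising an overlap is: out of N genes, K carry some annotation, n are selected, and k of the selected ones carry the annotation. The Python sketch below is an illustration under that assumption, not the content of Notebook 4; the function names and the toy numbers are hypothetical.

```python
from math import comb

def hypergeom_pmf(k, N, K, n):
    """P(exactly k annotated items among n drawn without replacement).

    N: population size, K: annotated items in the population, n: items drawn.
    """
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def enrichment_p_value(k, N, K, n):
    """P(at least k annotated items among the n drawn), the upper tail."""
    upper = min(K, n)
    return sum(hypergeom_pmf(i, N, K, n) for i in range(k, upper + 1))

# Toy example: 10 genes, 4 annotated, 3 selected, 2 of the selected annotated.
p = enrichment_p_value(k=2, N=10, K=4, n=3)
```

For the toy numbers the tail is 36/120 + 4/120 = 1/3, so an overlap of 2 is unremarkable. For genome-scale counts the binomial coefficients become very large integers, and a library routine or exact fractions would be the safer choice.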

February 29:

- Gene H. Golub's Birthday!

Paper 16: Calculating the Singular Values and Pseudo-Inverse of a Matrix, Golub and Kahan,

March 2:

- Lab 2 "Data Clinic"

March 7:

- Happy Spring Break!

March 9:

- Happy Spring Break!

March 14:

- Slides 4: The SVD as a Transform

- Quantum Harmonic Oscillator from Wikipedia

- Image Compression via the SVD from Mathworld

- Image Compression via the Fourier Transform from Mathworld

March 16:

- From the SVD to PCA:

- Slides 5: The SVD vs. PCA

- Paper 17: Correspondence Analysis Applied to Microarray Data, Fellenberg et al.,

March 21:

- Lab 2 "Data Clinic"

- Selection of a cutoff of the singular values:

Paper 18: Component Retention in Principal Component Analysis with Application to cDNA Microarray Data, Cangelosi and Goriely,

Paper 19: The Optimal Hard Threshold for Singular Values is 4/√3, Gavish and Donoho,

- Robust PCA and removal of outliers:

Paper 20: Sparsity Control for Robust Principal Component Analysis, Mateos and Giannakis, in

Paper 21: Robust Principal Component Analysis? Candès et al.,
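
The constant in the title of Paper 19 applies to an n-by-n matrix with white noise of known level σ, where singular values below (4/√3)·√n·σ are discarded. The Python sketch below covers only that case, with hypothetical function names and made-up singular values; rectangular matrices and unknown noise levels need the other formulas in the paper and are not handled here.

```python
import math

def hard_threshold_square(n, noise_sigma):
    """Cutoff (4/sqrt(3)) * sqrt(n) * sigma for an n-by-n matrix with known noise level."""
    return (4.0 / math.sqrt(3.0)) * math.sqrt(n) * noise_sigma

def retained_rank(singular_values, n, noise_sigma):
    """Count the singular values that exceed the hard threshold."""
    tau = hard_threshold_square(n, noise_sigma)
    return sum(1 for s in singular_values if s > tau)

# Made-up singular values of a hypothetical 100-by-100 matrix with unit noise.
tau = hard_threshold_square(100, 1.0)
kept = retained_rank([50.0, 30.0, 22.0, 19.0, 5.0], 100, 1.0)
```

With n = 100 and σ = 1 the cutoff is about 23.09, so only the singular values 50 and 30 are retained in this example.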

March 23:

- Mathematical variations on the SVD and PCA for blind source separation (BSS):

- Independent component analysis (ICA):

Paper 22: Emergence of Simple-Cell Receptive Field Properties by Learning a Sparse Code for Natural Images, Olshausen and Field,

Paper 23: The "Independent Components" of Natural Scenes are Edge Filters, Bell and Sejnowski,

Paper 24: Linear Modes of Gene Expression Determined by Independent Component Analysis, Liebermeister,

- Nonnegative matrix factorization (NMF):

Paper 25: Learning the Parts of Objects by Non-Negative Matrix Factorization, Lee and Seung,

Paper 26: Metagenes and Molecular Pattern Discovery Using Matrix Factorization, Brunet et al.,

March 28:

- Mathematics of the generalized SVD (GSVD):

- Paper 27: The GSVD: Where are the Ellipses?, Matrix Trigonometry, and More, Edelman and Wang,

March 30:

- Computation of the GSVD:

Notebook 5: The GSVD of Synthetic Data

Mathematica Code: Notebook_5.nb

April 1, 11:50am–1:10pm:

- Lab 2 "Data Clinic"

April 4:

- Slides 6: Examples of the GSVD of measured data

April 6:

- Lab 2 Due In-Class

- More examples of the GSVD of measured data:

Paper 28: Generalized Singular Value Decomposition for Comparative Analysis of Genome-Scale Expression Datasets of Two Different Organisms, Alter et al.,

Paper 29: Combining Transcriptional Datasets Using the Generalized Singular Value Decomposition, Schreiber et al.,

Paper 30: Exploring Metabolic Pathway Disruption in the Subchronic Phencyclidine Model of Schizophrenia with the Generalized Singular Value Decomposition, Xiao et al.,

April 11:

- Examples of canonical correlation analysis (CCA) of measured data:

Paper 31: A New Muscle Artifact Removal Technique to Improve the Interpretation of the Ictal Scalp Electroencephalogram, De Clercq et al.,

Paper 32: Integrating Multiple-Study Multiple-Subject fMRI Datasets Using Canonical Correlation Analysis, Rustandi et al., in

- Examples of generalizations of the GSVD of measured data:

Paper 33: A Higher-Order Generalized Singular Value Decomposition for Comparison of Global mRNA Expression from Multiple Organisms, Ponnapalli et al.,

Paper 34: Multi-Tissue Analysis of Co-expression Networks by Higher-Order Generalized Singular Value Decomposition Identifies Functionally Coherent Transcriptional Modules, Xiao et al.,

Paper 35: Tensor GSVD of Patient- and Platform-Matched Tumor and Normal DNA Copy-Number Profiles Uncovers Chromosome Arm-Wide Patterns of Tumor-Exclusive Platform-Consistent Alterations Encoding for Cell Transformation and Predicting Ovarian Cancer Survival, Sankaranarayanan et al.,

Paper 36: TNF-Insulin Crosstalk at the Transcription Factor GATA6 is Revealed by a Model that Links Signaling and Transcriptomic Data Tensors, Chitforoushzadeh et al.,

April 13:

- Example of verification:

Paper 37: Global Effects of DNA Replication and DNA Replication Origin Activity on Eukaryotic Gene Expression, Omberg et al.,

April 18:

- "State of the project" presentations

April 20:

- "State of the project" presentations

April 25:

- End-of-class celebration!

- Example of validation:

Paper 38: Retrospective Clinical Trial Experimentally Validates Glioblastoma Genome-Wide Pattern of DNA Copy-Number Alterations Predictor of Survival, Ponnapalli et al.,

Happy Summer Break!

- DNA,

See you in Fall 2022 in BME 6780: Data Science for Bioengineers!