- For postdoctoral, graduate, and advanced undergraduate students in Engineering, Sciences, and Medicine, and professionals in industry.
- Spring 2023, Mondays and Wednesdays 11:50am–1:10pm, LCB 115 and Zoom; office hours by request or Wednesdays 11:00am, WEB 3803. Prerequisites: some programming experience and instructor approval.

- Syllabus
- COVID-19
- Spring 2023 Calendar
- Safety
- Health, Wellness, and Counseling
- Student Code

100% grade = 30% labs, 30% presentation, 30% class project, 10% class participation; late assignments are not accepted; class attendance is required.

Topics:

We will cover concepts in artificial intelligence, data science, and machine learning, and their applications in the integration and comparison of different types of high-throughput omic and other data acquired by different technologies and from different studies toward discovery, verification, and validation of biomedical principles.

- Technologies, such as whole-genome sequencing, for high-throughput acquisition of different types of molecular biological data and patient clinical information.
- Databases, such as The Cancer Genome Atlas (TCGA) at the Genomic Data Commons (GDC).
- Algorithms, from the singular value decomposition (SVD) and principal component analysis (PCA) to multi-tensor decompositions, neural networks, and deep learning.
- Applications toward better understanding of biology and practice of medicine, e.g., personalized cancer diagnostics, prognostics, and therapeutics.

Skills:

- Proving mathematical theorems and programming symbolic computations.
- Designing algorithms and programming numerical computations.
- Working with databases and modeling biomedical data.

Activities:

- In-class presentations of scientific journal articles and patents.
- Participation in guest lectures and seminars on campus and discussions of conference reports.
- End-of-class celebration.

Readings on the SVD and deep learning:

- Book 1:

Book 2:

January 9:

- Welcome!

January 11:

- Introduction:

How Bright Promise in Cancer Testing Fell Apart,

- The SVD in the news:

If You Liked This, You're Sure to Love That,

- Timeline of the human genome project:

From Darwin and Mendel to the human genome project,

- Paper 1: The Alta Summit, December 1984, Cook-Deegan,

Paper 2: A Vision for the Future of Genomics Research, Collins et al.,

Acknowledgements.

January 16:

- Happy Martin Luther King Jr. Day!

January 18:

- Mathematics of the SVD:

- Notebook 1: Computation and Visualization of the SVD

Mathematica Code: Notebook_1.nb

- Lab 1:

Code the SVD or the pseudoinverse projection of synthetic data and its visualization. Test and debug your code.

January 23:

- Composition and decomposition of synthetic data:

- Notebook 2: The SVD of Synthetic Data

Mathematica Code: Notebook_2.nb
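
As a rough illustration of composing and then decomposing synthetic data, here is a minimal Python/NumPy sketch. It is not the course's Mathematica notebook: the matrix sizes, the sine and cosine patterns, the noise level, and all variable names are assumptions chosen only for illustration.

```python
import numpy as np

# Illustrative assumptions: 50 rows ("genes") x 12 columns ("samples"),
# built from two hidden patterns plus a small amount of noise.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 2.0 * np.pi, 12, endpoint=False)
patterns = np.vstack([np.sin(t), np.cos(t)])   # 2 x 12, mutually orthogonal rows
loadings = rng.normal(size=(50, 2))            # 50 x 2, weight of each pattern in each row
noise = 0.01 * rng.normal(size=(50, 12))

# Composition: superpose the two patterns and add the noise.
data = loadings @ patterns + noise

# Decomposition: data = U @ diag(s) @ Vt, with singular values in decreasing order.
U, s, Vt = np.linalg.svd(data, full_matrices=False)

# Fraction of the overall signal captured by each singular component.
fractions = s**2 / np.sum(s**2)

# Rank-2 reconstruction from the two most significant components.
rank2 = (U[:, :2] * s[:2]) @ Vt[:2, :]

print(np.round(fractions[:4], 4))
print(np.linalg.norm(data - rank2) / np.linalg.norm(data))
```

Because the synthetic data are built from two patterns, almost all of the signal should fall in the first two singular components, and the rank-2 reconstruction should differ from the data only at the level of the added noise.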

January 25:

- Mathematics of the pseudoinverse projection:

January 30:

- Slides 1: Examples of the SVD of measured data

- More examples of the SVD of measured data:

Paper 3: Singular Value Decomposition for Genome-Wide Expression Data Processing and Modeling, Alter et al.,

Patent 1: Method for Node Ranking in a Linked Database, Page,

Paper 4: A Rapid Genome-Scale Response of the Transcriptional Oscillator to Perturbation Reveals a Period-Doubling Path to Phenotypic Change, Li and Klevecz,

Paper 5: Coordinated Metabolic Transitions During

February 1:

- Computation of the pseudoinverse projection:

- Slides 2: Data integration by using the pseudoinverse projection

- Notebook 3: The Pseudoinverse Projection of Measured Data

Mathematica Code: Notebook_3.nb
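
For orientation, a pseudoinverse projection can be sketched in a few lines of Python/NumPy. This is a sketch under stated assumptions, not the contents of Notebook 3: it takes the projection in its usual least-squares sense, and the synthetic `basis` and `data` matrices, their sizes, and the noise level are hypothetical.

```python
import numpy as np

# Hypothetical synthetic example: 100 rows measured in 8 data columns,
# modeled as combinations of 3 basis profiles.
rng = np.random.default_rng(1)
basis = rng.normal(size=(100, 3))               # 100 x 3 basis profiles
true_coeffs = rng.normal(size=(3, 8))           # how each data column mixes the profiles
data = basis @ true_coeffs + 0.05 * rng.normal(size=(100, 8))

# Pseudoinverse projection: least-squares coefficients of the data in the basis.
basis_pinv = np.linalg.pinv(basis)              # 3 x 100 Moore-Penrose pseudoinverse
coeffs = basis_pinv @ data                      # 3 x 8

# The part of the data explained by the basis, and what is left over.
projection = basis @ coeffs
residual = data - projection

print(np.round(np.abs(coeffs - true_coeffs).max(), 4))
print(np.linalg.norm(residual) / np.linalg.norm(data))
```

The residual is orthogonal to every basis profile, which is what makes `coeffs` the least-squares description of the data in terms of the basis.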

February 6:

- Examples of the pseudoinverse projection of measured data:

Paper 6: Integrative Analysis of Genome-Scale Data by Using Pseudoinverse Projection Predicts Novel Correlation between DNA Replication and RNA Transcription, Alter and Golub,

Paper 7: Reconstructing the Pathways of a Cellular System from Genome-Scale Signals by Using Matrix and Tensor Computations, Alter and Golub,

Paper 8: Distinct Physiological States of

Paper 9: Using Pre-Existing Microarray Datasets to Increase Experimental Power: Application to Insulin Resistance, Daigle et al.,

February 8:

- In-Class Work on Lab 1

February 13:

- Happy Darwin Day!

- Example of TCGA data:

Paper 10: Comprehensive Genomic Characterization Defines Human Glioblastoma Genes and Core Pathways, TCGA Research Network,

February 15, 10:45–11:45am, WEB 3780:

February 15:

- Lab 1 "Data Clinic"

February 20:

- Happy Presidents Day!

February 22:

- Examples of interpretation of TCGA data:

Paper 11: Mathematically Universal and Biologically Consistent Astrocytoma Genotype Encodes for Transformation and Predicts Survival Phenotype, Aiello et al.,

Supplementary Material.

Paper 12: GSVD Comparison of Patient-Matched Normal and Tumor aCGH Profiles Reveals Global Copy-Number Alterations Predicting Glioblastoma Multiforme Survival, Lee et al.,

- Examples of assessing the statistical significance of an interpretation:

Paper 13: Systematic Determination of Genetic Network Architecture, Tavazoie et al.,

Paper 14: Discovering Motifs in Ranked Lists of DNA Sequences, Eden et al.,

Paper 15: GOrilla: A Tool for Discovery and Visualization of Enriched GO Terms in Ranked Gene Lists, Eden et al.,

February 27:

- Lab 1 Due In-Class

- In-Class Work on Lab 2:

Compute and visualize the SVD or the pseudoinverse projection of your data. Interpret your data based upon its SVD. Use at least two different approaches each for preprocessing and sorting your data and for assessing the statistical significance of your interpretation.

- Slides 3: The hypergeometric probability distribution and

- Notebook 4: The Hypergeometric Probability Distribution and

Mathematica Code: Notebook_4.nb
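
As a small illustration of the hypergeometric probability distribution, the following Python sketch computes an enrichment P-value using only the standard library. It assumes the distribution is used in the common way, to ask how likely it is to see at least `k` annotated genes in a subset of `n` genes drawn without replacement from `N` genes, `K` of which carry the annotation. The function names and the example numbers are hypothetical and are not taken from the slides or from Notebook 4.

```python
from math import comb


def hypergeom_pmf(k, N, K, n):
    """Probability of exactly k annotated genes in a subset of n genes
    drawn without replacement from N genes, K of which are annotated."""
    if k < max(0, n - (N - K)) or k > min(n, K):
        return 0.0  # outside the support of the distribution
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def enrichment_p_value(k, N, K, n):
    """Probability of observing at least k annotated genes by chance."""
    return sum(hypergeom_pmf(i, N, K, n) for i in range(k, min(n, K) + 1))


# Hypothetical example: 6,000 genes, 100 annotated, and a subset of 200 genes
# that contains 12 of the annotated genes.
p = enrichment_p_value(12, 6000, 100, 200)
print(f"P-value of the enrichment: {p:.3g}")
```

A small P-value says that the observed overlap would be unlikely if the subset had been drawn at random, which is one way to assess the statistical significance of an interpretation.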

February 29:

- Gene H. Golub's Birthday!

Paper 16: Calculating the Singular Values and Pseudo-Inverse of a Matrix, Golub and Kahan,

March 1:

- Lab 2 "Data Clinic"

March 6:

- Happy Spring Break!

March 8:

- Happy Spring Break!

March 13:

- Slides 4: The SVD as a Transform

- Quantum Harmonic Oscillator from Wikipedia

- Image Compression via the SVD from Mathworld

- Image Compression via the Fourier Transform from Mathworld

March 15:

- From the SVD to PCA:

- Slides 5: The SVD vs. PCA

- Paper 17: Correspondence Analysis Applied to Microarray Data, Fellenberg et al.,

March 20:

- Lab 2 "Data Clinic"

- Selection of a cutoff of the singular values:

Paper 18: Component Retention in Principal Component Analysis with Application to cDNA Microarray Data, Cangelosi and Goriely,

Paper 19: The Optimal Hard Threshold for Singular Values is 4/√3, Gavish and Donoho,

- Robust PCA and removal of outliers:

Paper 20: Sparsity Control for Robust Principal Component Analysis, Mateos and Giannakis, in

Paper 21: Robust Principal Component Analysis? Candès et al.,

March 22:

- Mathematical variations on the SVD and PCA for blind source separation (BSS):

- Independent component analysis (ICA):

Paper 22: Emergence of Simple-Cell Receptive Field Properties by Learning a Sparse Code for Natural Images, Olshausen and Field,

Paper 23: The "Independent Components" of Natural Scenes are Edge Filters, Bell and Sejnowski,

Paper 24: Linear Modes of Gene Expression Determined by Independent Component Analysis, Liebermeister,

- Nonnegative matrix factorization (NMF):

Paper 25: Learning the Parts of Objects by Non-Negative Matrix Factorization, Lee and Seung,

Paper 26: Metagenes and Molecular Pattern Discovery Using Matrix Factorization, Brunet et al.,

March 27:

- Mathematics of the generalized SVD (GSVD):

- Paper 27: The GSVD: Where are the Ellipses?, Matrix Trigonometry, and More, Edelman and Wang,

March 29:

- Computation of the GSVD:

Notebook 5: The GSVD of Synthetic Data

Mathematica Code: Notebook_5.nb

April 1, 11:50am–1:10pm:

- Lab 2 "Data Clinic"

April 3:

- Slides 6: Examples of the GSVD of measured data

April 5:

- Lab 2 Due In-Class

- More examples of the GSVD of measured data:

Paper 28: Generalized Singular Value Decomposition for Comparative Analysis of Genome-Scale Expression Datasets of Two Different Organisms, Alter et al.,

Paper 29: Combining Transcriptional Datasets Using the Generalized Singular Value Decomposition, Schreiber et al.,

Paper 30: Exploring Metabolic Pathway Disruption in the Subchronic Phencyclidine Model of Schizophrenia with the Generalized Singular Value Decomposition, Xiao et al.,

April 10:

- Examples of canonical correlation analysis (CCA) of measured data:

Paper 31: A New Muscle Artifact Removal Technique to Improve the Interpretation of the Ictal Scalp Electroencephalogram, De Clercq et al.,

Paper 32: Integrating Multiple-Study Multiple-Subject fMRI Datasets Using Canonical Correlation Analysis, Rustandi et al., in

- Examples of generalizations of the GSVD of measured data:

Paper 33: A Higher-Order Generalized Singular Value Decomposition for Comparison of Global mRNA Expression from Multiple Organisms, Ponnapalli et al.,

Paper 34: Multi-Tissue Analysis of Co-expression Networks by Higher-Order Generalized Singular Value Decomposition Identifies Functionally Coherent Transcriptional Modules, Xiao et al.,

Paper 35: Tensor GSVD of Patient- and Platform-Matched Tumor and Normal DNA Copy-Number Profiles Uncovers Chromosome Arm-Wide Patterns of Tumor-Exclusive Platform-Consistent Alterations Encoding for Cell Transformation and Predicting Ovarian Cancer Survival, Sankaranarayanan et al.,

Paper 36: TNF-Insulin Crosstalk at the Transcription Factor GATA6 is Revealed by a Model that Links Signaling and Transcriptomic Data Tensors, Chitforoushzadeh et al.,

April 12:

- Example of verification:

Paper 37: Global Effects of DNA Replication and DNA Replication Origin Activity on Eukaryotic Gene Expression, Omberg et al.,

April 17:

- "State of the project" presentations

April 19:

- "State of the project" presentations

April 21, 11:45am–1:15pm, SMBB 2650:

April 24:

- End-of-class celebration!

- Example of validation:

Paper 38: Retrospective Clinical Trial Experimentally Validates Glioblastoma Genome-Wide Pattern of DNA Copy-Number Alterations Predictor of Survival, Ponnapalli et al.,

Happy Summer Break!

- DNA,

See you in Fall 2024 in BME 6780: Data Science for Bioengineers!