- For postdoctoral, graduate, and advanced undergraduate students in Engineering, Sciences, and Medicine, and professionals in industry.
- Fall 2023, Mondays and Wednesdays 11:50am–1:10pm, LCB 115 and Zoom; office hours by request or Wednesdays 11:00am, WEB 3803. Prerequisites: Some experience programming and instructor approval.
- Syllabus
- COVID-19
- Fall 2023 Calendar
- Safety
- Health, Wellness, and Counseling
- Student Code

100% grade = 30% labs, 30% class project, 30% presentation, 10% class participation; late assignments are not accepted; class attendance is required.

Topics:

We will cover concepts in data science and machine learning, and their applications to the discovery of principles from biomedical data:

- Databases, e.g., the Cancer Genome Atlas (TCGA) at the Genomic Data Commons (GDC).
- Data types, from, e.g., omics, imaging, and patient clinical information to, e.g., tissue samples and model organisms and systems.
- Algorithms, from the singular value decomposition (SVD) and principal component analysis (PCA) to multi-tensor decompositions, neural networks, and deep learning.
- Applications toward a better understanding of biology, e.g., the Luria-Delbrück experiment, and a better practice of medicine, e.g., personalized cancer diagnostics, prognostics, and therapeutics.

Skills:

- Proving mathematical theorems and programming symbolic computations.
- Designing algorithms and programming numerical computations.
- Working with databases and modeling biomedical data.

Activities:

- In-class presentations of scientific journal articles and patents.
- Participation in guest lectures and seminars on campus and discussions of conference reports.
- End-of-class celebration.

Readings on the SVD and deep learning:

- Book 1:

- Book 2:

August 21:

- Welcome!

August 23:

- Introduction:

How Bright Promise in Cancer Testing Fell Apart,

- The SVD in the news:

If You Liked This, You're Sure to Love That,

- PCA for face recognition:

Paper 1: Low-Dimensional Procedure for the Characterization of Human Faces, Sirovich and Kirby,

Paper 2: Eigenfaces for Recognition, Turk and Pentland,

- Lab 1:

Code the SVD or the tensor SVD of synthetic data and its visualization. Test and debug your code.
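Lab 1 asks you to code and test the SVD yourself. A minimal sketch of the testing side in Python with NumPy (the course notebooks use Mathematica) might look like the following; it uses `numpy.linalg.svd` as a reference decomposition rather than implementing one, and the data set, its sizes, and the noise level are hypothetical. Visualization (e.g., a heat map of `data`, `U`, and `Vt`) is left out.

```python
import numpy as np

# Hypothetical synthetic data set (an assumption for illustration):
# 20 "genes" x 6 "arrays" built from two known patterns plus small noise.
rng = np.random.default_rng(0)
genes, arrays = 20, 6
oscillation = np.sin(np.linspace(0, 2 * np.pi, arrays))
trend = np.linspace(-1.0, 1.0, arrays)
weights = rng.normal(size=(genes, 2))
data = weights @ np.vstack([oscillation, trend]) + 0.01 * rng.normal(size=(genes, arrays))

# Thin SVD: data = U @ diag(s) @ Vt, with s sorted in descending order.
U, s, Vt = np.linalg.svd(data, full_matrices=False)

# Debugging checks worth running on any SVD code.
assert np.allclose(U @ np.diag(s) @ Vt, data)   # exact reconstruction
assert np.allclose(U.T @ U, np.eye(arrays))     # orthonormal left singular vectors
assert np.allclose(Vt @ Vt.T, np.eye(arrays))   # orthonormal right singular vectors
assert np.all(np.diff(s) <= 0)                  # descending singular values

# Two patterns went in, so two singular values should dominate the rest.
print(np.round(s, 3))
```

The same checks apply unchanged to a hand-written SVD: swap your own function in for `np.linalg.svd` and rerun them.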

August 28:

- Mathematics of the SVD:

- Notebook 1: Computation and Visualization of the SVD

Mathematica Code: Notebook_1.nb

February 29:

- Gene H. Golub's Birthday!

Paper 3: Calculating the Singular Values and Pseudo-Inverse of a Matrix, Golub and Kahan,

August 30:

- Composition and decomposition of synthetic data:

- Notebook 2: The SVD of Synthetic Data

Mathematica Code: Notebook_2.nb

September 4:

- Happy Labor Day!

September 6:

- Testing and debugging your SVD code

September 11:

- Mathematics of a tensor SVD, the higher-order SVD (HOSVD):

- Paper 4: A Multilinear Singular Value Decomposition, De Lathauwer et al.,

September 13:

- Computation of the HOSVD:

- Notebook 3: The tensor SVD of Synthetic Data

Mathematica Code: Notebook_3.nb
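A minimal NumPy sketch of the HOSVD, assuming the textbook construction: each factor matrix holds the left singular vectors of one mode-n unfolding, and the core tensor is the data multiplied by the transposed factors along every mode. The helper names (`unfold`, `mode_product`, `hosvd`, `reconstruct`) are hypothetical. The column ordering of this unfolding differs from the convention of De Lathauwer et al., which does not change the left singular vectors.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n unfolding: the mode-n fibers become the columns of a matrix."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_product(tensor, matrix, mode):
    """n-mode product: multiply every mode-n fiber of `tensor` by `matrix`."""
    result = np.tensordot(matrix, np.moveaxis(tensor, mode, 0), axes=(1, 0))
    return np.moveaxis(result, 0, mode)

def hosvd(tensor):
    """Return the core tensor and one orthonormal factor matrix per mode."""
    factors = [np.linalg.svd(unfold(tensor, n), full_matrices=False)[0]
               for n in range(tensor.ndim)]
    core = tensor
    for n, U in enumerate(factors):
        core = mode_product(core, U.T, n)  # core = T x_1 U1^T x_2 U2^T ...
    return core, factors

def reconstruct(core, factors):
    """Invert the HOSVD: T = core x_1 U1 x_2 U2 x_3 U3 ..."""
    full = core
    for n, U in enumerate(factors):
        full = mode_product(full, U, n)
    return full

# Usage on a hypothetical random 4 x 5 x 3 tensor.
T = np.random.default_rng(1).normal(size=(4, 5, 3))
core, factors = hosvd(T)
assert np.allclose(reconstruct(core, factors), T)
```

A useful debugging check is "all-orthogonality": each unfolding of the core tensor has mutually orthogonal rows, whose squared norms are the n-mode singular values.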

September 18:

- Slides 1: Examples of the SVD of measured data

- More examples of the SVD of measured data:

Paper 5: Singular Value Decomposition for Genome-Wide Expression Data Processing and Modeling, Alter et al.,

Patent 1: Method for Node Ranking in a Linked Database, Page,

Paper 6: A Rapid Genome-Scale Response of the Transcriptional Oscillator to Perturbation Reveals a Period-Doubling Path to Phenotypic Change, Li and Klevecz,

Paper 7: Coordinated Metabolic Transitions During
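Paper 5 summarizes an SVD by the fraction of the overall expression that each singular value captures and by a normalized Shannon entropy of those fractions. As we read it, $p_k = \sigma_k^2 / \sum_l \sigma_l^2$ and $d = -\frac{1}{\log L}\sum_k p_k \log p_k$, where $L$ is the number of singular values, so $d = 0$ when one pattern captures all of the data and $d = 1$ when all patterns contribute equally. The sketch below assumes those definitions (check them against the paper); the function names are hypothetical.

```python
import numpy as np

def eigenexpression_fractions(data):
    """Fraction of the overall 'expression' captured by each singular value."""
    s = np.linalg.svd(data, compute_uv=False)
    return s**2 / np.sum(s**2)

def normalized_entropy(fractions):
    """Shannon entropy of the fractions, scaled to lie between 0 and 1."""
    p = fractions[fractions > 0]  # 0 * log 0 is taken to be 0
    return float(-np.sum(p * np.log(p)) / np.log(len(fractions)))

rank_one = np.outer([1.0, 2.0, 3.0], [1.0, 1.0, 2.0, 0.5])
print(normalized_entropy(eigenexpression_fractions(rank_one)))   # close to 0
print(normalized_entropy(eigenexpression_fractions(np.eye(4))))  # 1 up to rounding
```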

September 20:

- In-Class Work on Lab 1

September 25:

- Example of TCGA data:

Paper 8: Comprehensive, Integrative Genomic Analysis of Diffuse Lower-Grade Gliomas, TCGA Research Network,

- Example of interpretation of TCGA data:

Paper 9: Mathematically Universal and Biologically Consistent Astrocytoma Genotype Encodes for Transformation and Predicts Survival Phenotype, Aiello et al.,

September 27:

- Lab 1 Due In-Class

- Lab 2:

Compute and visualize the SVD or the tensor SVD of your data. Interpret your data based upon its SVD or its tensor SVD. Use at least two different approaches each for preprocessing and sorting your data and for assessing the statistical significance of your interpretation.

- From the SVD to PCA:

- Slides 2: The SVD vs. PCA

- Paper 10: Correspondence Analysis Applied to Microarray Data, Fellenberg et al.,

October 2:

- The SVD is used for the stable computation of PCA:

- Paper 11: Serum Proteomics Profiling — a Young Technology Begins to Mature, Coombes et al.,
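Why the SVD gives a stable PCA: forming the covariance matrix $X^T X$ squares the condition number of the data, so the smallest variances lose relative accuracy, whereas the SVD works on the mean-centered data directly. A minimal NumPy sketch on hypothetical data shows that both routes give the same principal variances, $\sigma_k^2/(N-1)$ for $N$ samples:

```python
import numpy as np

# Hypothetical data: 50 samples (rows) x 4 variables (columns) with unequal spreads.
rng = np.random.default_rng(2)
X = rng.normal(size=(50, 4)) @ np.diag([3.0, 2.0, 1.0, 0.5])

# PCA via the SVD of the mean-centered data, never forming the covariance matrix.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
principal_axes = Vt                   # rows are the principal directions
scores = U * s                        # equals Xc @ Vt.T
variances = s**2 / (X.shape[0] - 1)   # variance along each principal axis

# Same variances from the eigenvalues of the sample covariance matrix (ddof = 1).
cov_eigenvalues = np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1]
assert np.allclose(variances, cov_eigenvalues)
assert np.allclose(scores, Xc @ Vt.T)
```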

October 4:

- Examples of assessing the statistical significance of an interpretation:

Paper 12: Systematic Determination of Genetic Network Architecture, Tavazoie et al.,

Paper 13: Discovering Motifs in Ranked Lists of DNA Sequences, Eden et al.,

Paper 14: GOrilla: A Tool for Discovery and Visualization of Enriched GO Terms in Ranked Gene Lists, Eden et al.,

- Slides 3: The hypergeometric probability distribution and

- Notebook 4: The Hypergeometric Probability Distribution and

Mathematica Code: Notebook_4.nb
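The hypergeometric distribution gives the probability of seeing at least $k$ annotated items (e.g., genes with a given GO term) among $n$ items drawn without replacement from $N$ items, $K$ of which carry the annotation. That tail probability is the usual enrichment P-value. A minimal sketch using only the Python standard library; the function name and the numbers in the usage line are hypothetical:

```python
from math import comb

def hypergeometric_tail(N, K, n, k):
    """P(X >= k) when n items are drawn without replacement from N, K of them annotated."""
    if k <= 0:
        return 1.0
    upper = min(K, n)  # X can never exceed K or n
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

# Hypothetical numbers: 1,000 genes, 40 annotated with a term,
# 50 genes in a cluster, 8 of which carry the annotation (about 2 expected by chance).
p_value = hypergeometric_tail(N=1000, K=40, n=50, k=8)
print(f"{p_value:.3g}")
```

Exact integer arithmetic via `math.comb` avoids overflow for moderate $N$; for genome-scale $N$ a log-space or library implementation would be the safer choice.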

October 9:

- Happy Fall Break!

October 11:

- Happy Fall Break!

October 16:

- Selection of a cutoff of the singular values:

Paper 15: Component Retention in Principal Component Analysis with Application to cDNA Microarray Data, Cangelosi and Goriely,

Paper 16: The Optimal Hard Threshold for Singular Values is 4/√3, Gavish and Donoho,

- Robust PCA and removal of outliers:

Paper 17: Sparsity Control for Robust Principal Component Analysis, Mateos and Giannakis, in

Paper 18: Robust Principal Component Analysis? Candès et al.,
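For the simplest case in Paper 16, a square $n \times n$ matrix $Y = X + \sigma Z$ with white Gaussian noise of known level $\sigma$, the optimal hard threshold keeps only singular values above $\tau = \frac{4}{\sqrt{3}}\sqrt{n}\,\sigma$. The sketch below covers only that known-noise, square case (the paper also treats rectangular matrices and unknown noise); the function name and the rank-2 test signal are hypothetical.

```python
import numpy as np

def optimal_hard_threshold(Y, sigma):
    """Keep singular values above (4 / sqrt(3)) * sqrt(n) * sigma for a square n x n Y."""
    n, m = Y.shape
    if n != m:
        raise ValueError("this sketch covers only square matrices")
    tau = (4.0 / np.sqrt(3.0)) * np.sqrt(n) * sigma
    U, s, Vt = np.linalg.svd(Y)
    keep = s > tau
    estimate = U[:, keep] @ np.diag(s[keep]) @ Vt[keep, :]
    return estimate, int(keep.sum()), tau

# Hypothetical test case: a rank-2 signal plus Gaussian noise with known sigma = 1.
rng = np.random.default_rng(4)
n, sigma = 100, 1.0
left, _ = np.linalg.qr(rng.normal(size=(n, 2)))
right, _ = np.linalg.qr(rng.normal(size=(n, 2)))
X = left @ np.diag([100.0, 60.0]) @ right.T
Y = X + sigma * rng.normal(size=(n, n))
X_hat, rank, tau = optimal_hard_threshold(Y, sigma)
print(rank, round(tau, 2))  # rank should be 2 here; tau is about 23.09
```

The noise singular values of a $100 \times 100$ Gaussian matrix cluster below about $2\sqrt{n}\,\sigma = 20$, so the threshold sits just above the noise bulk.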

October 18:

- In-Class Work on Lab 2

October 23:

- Lab 2 "Data Clinic"

October 25:

- Slides 5: The SVD as a Transform

- Quantum Harmonic Oscillator from Wikipedia

- Image Compression via the SVD from MathWorld

- Image Compression via the Fourier Transform from MathWorld
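Image compression via the SVD rests on the Eckart–Young theorem: keeping the $k$ largest singular values gives the best rank-$k$ approximation in the Frobenius norm, with error $\sqrt{\sum_{i>k}\sigma_i^2}$, and it needs only $k(m+n+1)$ stored numbers instead of $mn$. Unlike the Fourier transform's fixed basis, the SVD basis adapts to each image. A minimal sketch; the random "image" is a hypothetical stand-in and compresses poorly, because a real photograph's singular values decay much faster:

```python
import numpy as np

def rank_k_approximation(image, k):
    """Best rank-k approximation of a grayscale image via the truncated SVD."""
    U, s, Vt = np.linalg.svd(image, full_matrices=False)
    approx = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
    return approx, s

rng = np.random.default_rng(3)
image = rng.random((64, 48))  # hypothetical 64 x 48 grayscale image
k = 10
approx, s = rank_k_approximation(image, k)

error = np.linalg.norm(image - approx)           # Frobenius norm of the residual
predicted = np.sqrt(np.sum(s[k:] ** 2))          # Eckart-Young prediction
stored = k * (image.shape[0] + image.shape[1] + 1)
print(error, predicted, stored, image.size)
```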

October 30:

- Mathematical variations on the SVD and PCA for blind source separation (BSS):

- Independent component analysis (ICA):

Paper 19: Emergence of Simple-Cell Receptive Field Properties by Learning a Sparse Code for Natural Images, Olshausen and Field,

Paper 20: The "Independent Components" of Natural Scenes are Edge Filters, Bell and Sejnowski,

Paper 21: Linear Modes of Gene Expression Determined by Independent Component Analysis, Liebermeister,

- Nonnegative matrix factorization (NMF):

Paper 22: Learning the Parts of Objects by Non-Negative Matrix Factorization, Lee and Seung,

Paper 23: Metagenes and Molecular Pattern Discovery Using Matrix Factorization, Brunet et al.,
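NMF approximates a nonnegative matrix $V$ by $WH$ with $W, H \ge 0$. A common way to fit it is the multiplicative update rules that Lee and Seung proposed for the squared-error objective, $H \leftarrow H \odot (W^T V) \oslash (W^T W H)$ and $W \leftarrow W \odot (V H^T) \oslash (W H H^T)$, which never make an entry negative. A minimal sketch; the function name, iteration count, and small `eps` guard are assumptions for illustration:

```python
import numpy as np

def nmf(V, rank, iterations=500, seed=0, eps=1e-12):
    """Approximate a nonnegative V by W @ H with W, H >= 0 (squared-error objective)."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], rank))
    H = rng.random((rank, V.shape[1]))
    for _ in range(iterations):
        # Multiplicative updates keep every entry nonnegative; eps avoids division by zero.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Hypothetical nonnegative data of rank 2 (e.g., 30 genes x 12 samples).
rng = np.random.default_rng(5)
V = rng.random((30, 2)) @ rng.random((2, 12))
W0, H0 = nmf(V, rank=2, iterations=0)  # the random starting point
W, H = nmf(V, rank=2)
print(np.linalg.norm(V - W0 @ H0), np.linalg.norm(V - W @ H))  # error before vs. after
```

Unlike the SVD, the factors are not orthogonal and not unique, so results can depend on the random start; Paper 23 addresses this with repeated runs.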

November 1:

- Notebook 5: The tensor SVD of Measured Data

Mathematica Code: Notebook_5.nb

November 6:

- In-Class Work on Lab 2

November 8:

- In-Class Work on Lab 2

November 11:

November 13:

- In-Class Work on Lab 2

November 15:

- Slides 4: Examples of the HOSVD of measured data

- More examples of the HOSVD of measured data:

Paper 24: A Tensor Higher-Order Singular Value Decomposition for Integrative Analysis of DNA Microarray Data from Different Studies, Omberg et al.,

Paper 25: Characterizing the Evolution of Genetic Variance Using Genetic Covariance Tensors, Hines et al.,

Paper 26: Integrative Analysis of Many Weighted Co-Expression Networks Using Tensor Computation, Li et al.,

Paper 27: MultiFacTV: Module Detection from Higher-Order Time Series Biological Data, Li et al.,

Paper 28: Subgraph Augmented Nonnegative Tensor Factorization (SANTF) for Modeling Clinical Narrative Text, Luo et al.,

November 20:

- Lab 2 Due In-Class

November 22:

- Verification and validation

- Example of verification:

Paper 29: Global Effects of DNA Replication and DNA Replication Origin Activity on Eukaryotic Gene Expression, Omberg et al.,

- Example of validation:

Paper 30: Retrospective Clinical Trial Experimentally Validates Glioblastoma Genome-Wide Pattern of DNA Copy-Number Alterations Predictor of Survival, Ponnapalli et al.,

November 23:

- Happy Thanksgiving!

November 27:

- The "perceptron," i.e., a single-layer neural network, as a mathematical variation on the SVD:

- Neural networks:

Paper 31: Predicting Human Brain Activity Associated with the Meanings of Nouns, Mitchell et al.,

Paper 32: Integrating Multiple-Study Multiple-Subject fMRI Datasets Using Canonical Correlation Analysis, Rustandi et al., in
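For reference, a minimal sketch of Rosenblatt's classical perceptron learning rule for a single-layer network with labels in $\{-1, +1\}$: each misclassified point nudges the weights toward its correct side, and on linearly separable data the perceptron convergence theorem bounds the number of such updates. How the course relates the perceptron to the SVD is not reproduced here; the function name and the data are hypothetical.

```python
import numpy as np

def train_perceptron(X, y, max_epochs=1000):
    """Rosenblatt's perceptron rule for labels y in {-1, +1}; returns (w, b)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            if yi * (xi @ w + b) <= 0:  # misclassified (or on the boundary)
                w += yi * xi            # move the boundary toward the correct side
                b += yi
                mistakes += 1
        if mistakes == 0:               # a full pass without errors: done
            break
    return w, b

# Hypothetical linearly separable data with a margin around the line x1 - 2 x2 + 0.5 = 0.
rng = np.random.default_rng(6)
points = rng.normal(size=(200, 2))
margin = points @ np.array([1.0, -2.0]) + 0.5
X = points[np.abs(margin) > 0.5]
y = np.where(margin[np.abs(margin) > 0.5] > 0, 1, -1)
w, b = train_perceptron(X, y)
predictions = np.where(X @ w + b > 0, 1, -1)
print((predictions == y).mean())  # training accuracy
```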

November 29:

- "State of the project" presentations

December 4:

- "State of the project" presentations

December 6:

- "State of the project" presentations

- End-of-class celebration!

Happy Winter Break!

- See you in Spring 2024 in BME 6770: Genomic Signal Processing!

May be pivotal to your career:

- Helping AWS Customers Accelerate Success via Machine Learning,