- For postdoctoral, graduate, and advanced undergraduate students in Engineering, Sciences, and Medicine, and professionals in industry.
- Fall 2021, Mondays and Wednesdays 11:50am–1:10pm, Zoom. Prerequisites: Some programming experience and instructor approval.
- Syllabus
- COVID-19
- Fall 2021 Calendar
- Safety
- Health, Wellness, and Counseling
- Student Code

Grading: 30% labs, 30% presentation, 30% class project, 10% class participation. Class attendance is required; late assignments are not accepted.

Topics:

We will cover concepts in data science and machine learning, and their applications to discovery of principles from biomedical data.

- Databases, from the Cancer Genome Atlas (TCGA) at the Genomic Data Commons (GDC) to the Surveillance, Epidemiology, and End Results (SEER) database.
- Data types, from omics, imaging, and patient clinical information to biomedical samples and model organisms and systems.
- Algorithms, from the singular value decomposition (SVD) and principal component analysis (PCA) to multi-tensor decompositions, neural networks, and deep learning.
- Applications, from the Luria-Delbrück experiment to personalized cancer diagnostics, prognostics, and therapeutics.

Skills:

- Proving mathematical theorems and programming symbolic computations.
- Designing algorithms and programming numerical computations.
- Working with databases and modeling biomedical data.

Activities:

- In-class presentations of scientific journal articles and patents.
- Participation in guest lectures and seminars on campus and discussions of conference reports.
- End-of-class celebration.

August 23:

- Welcome!

- How Bright Promise in Cancer Testing Fell Apart

- The SVD in the news:

If You Liked This, You're Sure to Love That

- PCA for face recognition:

Paper 1: Low-Dimensional Procedure for the Characterization of Human Faces, Sirovich and Kirby

Paper 2: Eigenfaces for Recognition, Turk and Pentland

- Mathematics of the SVD:

- Notebook 1: Computation and Visualization of the SVD

Mathematica Code: Notebook_1.nb

February 29:

- Gene H. Golub's Birthday!

Paper 3: Calculating the Singular Values and Pseudo-Inverse of a Matrix, Golub and Kahan

August 25:

- Mathematical properties of the SVD

- Lab 1:

Code the SVD or the tensor SVD of synthetic data, and visualize the result. Test and debug your code.

August 29:

- SVD of synthetic data:

- Composition and decomposition of synthetic data:

- Notebook 2: SVD of Synthetic Data

Mathematica Code: Notebook_2.nb
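
The composition and decomposition of synthetic data can be sketched in a few lines. The example below is only an illustration in Python with NumPy, not the contents of Notebook 2: the two patterns, their weights, and all variable names are assumptions chosen so that the data matrix has rank 2 by construction.

```python
import numpy as np

# Compose a synthetic 8 x 6 matrix from two known rank-1 patterns (assumed values).
rows = np.arange(8)
cols = np.arange(6)
pattern_1 = np.outer(np.sin(2 * np.pi * rows / 8), np.ones(6))
pattern_2 = np.outer(np.ones(8), np.cos(2 * np.pi * cols / 6))
data = 3.0 * pattern_1 + 1.0 * pattern_2

# Decompose: data = U @ diag(s) @ Vt, with singular values in decreasing order.
U, s, Vt = np.linalg.svd(data, full_matrices=False)

# Fraction of the sum of squared singular values carried by each component.
fractions = s**2 / np.sum(s**2)

# Recompose the data from the two leading components only.
rank_2 = (U[:, :2] * s[:2]) @ Vt[:2, :]

print(np.round(s, 6))
print(np.round(fractions, 6))
print(np.allclose(data, rank_2))
```

Because the data are built from two independent patterns, only two singular values are numerically nonzero, and the two leading components recompose the matrix.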

August 31:

- Computation of the higher-order SVD (HOSVD), a tensor SVD:

- Tensor SVD of synthetic data:

- Notebook 3: Tensor SVD of Synthetic Data

Mathematica Code: Notebook_3.nb
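
One common way to compute the HOSVD is to take the SVD of each mode unfolding of the tensor and then project the tensor onto the resulting bases. The sketch below assumes this construction and uses Python with NumPy. The helper names `unfold`, `mode_multiply`, `hosvd`, and `reconstruct` and the synthetic tensor are hypothetical, and the code is not taken from Notebook 3.

```python
import numpy as np

def unfold(tensor, mode):
    """Matricize: bring `mode` to the front and flatten the remaining modes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_multiply(tensor, matrix, mode):
    """Multiply `matrix` into `tensor` along axis `mode`."""
    moved = np.moveaxis(tensor, mode, 0)
    result = np.tensordot(matrix, moved, axes=(1, 0))
    return np.moveaxis(result, 0, mode)

def hosvd(tensor):
    """Return (core, factors): one orthonormal basis per mode plus the core tensor."""
    factors = []
    for mode in range(tensor.ndim):
        U, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(U)
    core = tensor
    for mode, U in enumerate(factors):
        core = mode_multiply(core, U.T, mode)
    return core, factors

def reconstruct(core, factors):
    """Multiply the core by each factor matrix along its own mode."""
    out = core
    for mode, U in enumerate(factors):
        out = mode_multiply(out, U, mode)
    return out

# Synthetic 4 x 3 x 5 tensor: one rank-1 pattern plus small noise (assumed values).
rng = np.random.default_rng(0)
a, b, c = rng.normal(size=4), rng.normal(size=3), rng.normal(size=5)
tensor = np.einsum("i,j,k->ijk", a, b, c) + 0.1 * rng.normal(size=(4, 3, 5))

core, factors = hosvd(tensor)
print(np.allclose(reconstruct(core, factors), tensor))
```

The core tensor is generally full rather than diagonal, but the rows of each of its unfoldings are mutually orthogonal.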

September 6:

- Happy Labor Day!

September 8:

- In-Class Work on Lab 1

September 13, 10:30am:

- Revisiting the mathematical properties of the SVD

September 13:

- Slides 1: Examples of SVD of measured data

- More examples of SVD of measured data:

Paper 4: Singular Value Decomposition for Genome-Wide Expression Data Processing and Modeling, Alter et al.

Patent 1: Method for Node Ranking in a Linked Database, Page

Paper 5: A Rapid Genome-Scale Response of the Transcriptional Oscillator to Perturbation Reveals a Period-Doubling Path to Phenotypic Change, Li and Klevecz

Paper 6: Coordinated Metabolic Transitions During

September 15:

- Slides 2: Examples of the HOSVD of measured data

September 20:

- Example of TCGA data:

Paper 7: Comprehensive, Integrative Genomic Analysis of Diffuse Lower-Grade Gliomas, TCGA Research Network

September 22:

- Example of interpretation of TCGA data:

Paper 8: Mathematically Universal and Biologically Consistent Astrocytoma Genotype Encodes for Transformation and Predicts Survival Phenotype, Aiello et al.

September 27:

- Lab 1 Due In-Class

- Lab 2:

Compute and visualize the SVD or tensor SVD of your data. Interpret your data based upon its SVD or tensor SVD. Use at least two different approaches each for preprocessing and sorting your data and for assessing the statistical significance of your interpretation.

- From the SVD to PCA:

- Slides 3: The SVD vs. PCA

Paper 21: Correspondence Analysis Applied to Microarray Data, Fellenberg et al.
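
One standard way to relate the two methods: if PCA is taken to mean the eigendecomposition of the sample covariance of column-centered data, then its eigenvalues equal the squared singular values of the centered data divided by the number of samples minus one. The sketch below checks this numerically in Python with NumPy on made-up data. It illustrates this relation only and is not a summary of Slides 3.

```python
import numpy as np

# Made-up data: 50 samples (rows) by 4 correlated features (columns).
rng = np.random.default_rng(0)
samples = rng.normal(size=(50, 4)) @ rng.normal(size=(4, 4))

# PCA route: center each feature, then diagonalize the sample covariance.
centered = samples - samples.mean(axis=0)
covariance = centered.T @ centered / (len(samples) - 1)
eigenvalues = np.linalg.eigvalsh(covariance)[::-1]  # reorder to decreasing

# SVD route: singular values of the same centered matrix.
_, s, Vt = np.linalg.svd(centered, full_matrices=False)
variances_from_svd = s**2 / (len(samples) - 1)

print(np.allclose(eigenvalues, variances_from_svd))
```

The SVD of the uncentered matrix generally differs from this result, which is one reason the choice of preprocessing matters when interpreting the components.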

September 29:

- Examples of enrichment analyses:

Paper 9: Systematic Determination of Genetic Network Architecture, Tavazoie et al.

Paper 10: GOrilla: A Tool for Discovery and Visualization of Enriched GO Terms in Ranked Gene Lists, Eden et al.

- Notebook 4: The Hypergeometric Probability Distribution and P-Value

Mathematica Code: Notebook_4.nb
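
An enrichment P-value of this kind is often computed as the upper tail of the hypergeometric distribution: the probability of seeing at least the observed number of annotated items in a sample drawn without replacement. The sketch below uses only the Python standard library. The function name `hypergeom_pvalue` and its parameter names are hypothetical, and the code is not taken from Notebook 4.

```python
from math import comb

def hypergeom_pvalue(population, annotated, drawn, hits):
    """P(X >= hits) when `drawn` items are sampled without replacement from
    `population` items, `annotated` of which carry the annotation of interest."""
    upper = min(annotated, drawn)
    total = comb(population, drawn)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, drawn - k)
        for k in range(hits, upper + 1)
    )
    return tail / total

# Example with made-up counts: 20 genes, 5 annotated, 5 selected, all 5 annotated.
print(hypergeom_pvalue(20, 5, 5, 5))
```

In this example only one of the 15504 possible selections of 5 genes out of 20 consists entirely of annotated genes, so the P-value is 1/15504.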

October 6:

- Lab 2 "Data Clinic"

- Selection of a cutoff of the singular values:

Paper 17: Component Retention in Principal Component Analysis with Application to cDNA Microarray Data, Cangelosi and Goriely

Paper 18: The Optimal Hard Threshold for Singular Values is 4/√3, Gavish and Donoho

- Robust PCA and removal of outliers:

Paper 19: Sparsity Control for Robust Principal Component Analysis, Mateos and Giannakis

Paper 20: Robust Principal Component Analysis? Candès et al.
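
For the singular-value cutoff in Paper 18, the title refers to a square n-by-n matrix with white noise of known level σ, where the hard threshold is (4/√3)·√n·σ. The sketch below applies that threshold in Python with NumPy. The matrix size, the two signal singular values, the noise level, and the random seed are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma = 100, 1.0

# Synthetic rank-2 signal with singular values 100 and 60 (assumed), plus white noise.
U, _ = np.linalg.qr(rng.normal(size=(n, 2)))
V, _ = np.linalg.qr(rng.normal(size=(n, 2)))
signal = U @ np.diag([100.0, 60.0]) @ V.T
noisy = signal + sigma * rng.normal(size=(n, n))

# Keep only singular values above the hard threshold for the known-noise, square case.
s = np.linalg.svd(noisy, compute_uv=False)
threshold = (4 / np.sqrt(3)) * np.sqrt(n) * sigma
kept = int(np.sum(s > threshold))

print(round(threshold, 3), kept)
```

Here the threshold is about 23.094. The noise-only singular values of a 100-by-100 matrix with unit noise cluster below roughly 20, so the two signal components are the ones expected to survive the cutoff.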

October 8:

- Mathematics of a tensor SVD, the higher-order SVD (HOSVD):

Paper 11: A Multilinear Singular Value Decomposition, De Lathauwer et al.

October 11:

- Happy Fall Break!

October 13:

- Happy Fall Break!

October 18:

- More examples of HOSVD of measured data:

Paper 12: A Tensor Higher-Order Singular Value Decomposition for Integrative Analysis of DNA Microarray Data from Different Studies, Omberg et al.

Paper 13: Characterizing the Evolution of Genetic Variance Using Genetic Covariance Tensors, Hines et al.

Paper 14: Integrative Analysis of Many Weighted Co-Expression Networks Using Tensor Computation, Li et al.

Paper 15: MultiFacTV: Module Detection from Higher-Order Time Series Biological Data, Li et al.

Paper 16: Subgraph Augmented Nonnegative Tensor Factorization (SANTF) for Modeling Clinical Narrative Text, Luo et al.

October 25:

- Lab 2 "Data Clinic"

October 27:

- Slides 4: The SVD as a Transform

- Quantum Harmonic Oscillator from Wikipedia

- Image Compression via the SVD from Mathworld

- Image Compression via the Fourier Transform from Mathworld

- A Hard Day's Night Opening Chord from Wikipedia

November 1:

- Mathematical variations on the SVD and PCA:

- Independent component analysis (ICA):

Paper 22: Emergence of Simple-Cell Receptive Field Properties by Learning a Sparse Code for Natural Images, Olshausen and Field

Paper 23: The "Independent Components" of Natural Scenes are Edge Filters, Bell and Sejnowski

Paper 24: Linear Modes of Gene Expression Determined by Independent Component Analysis, Liebermeister

- Nonnegative matrix factorization (NMF):

Paper 25: Learning the Parts of Objects by Non-Negative Matrix Factorization, Lee and Seung

Paper 26: Metagenes and Molecular Pattern Discovery Using Matrix Factorization, Brunet et al.
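
NMF is commonly computed with multiplicative updates for the squared-error objective, a scheme associated with Lee and Seung. The sketch below is a minimal version in Python with NumPy. The function name `nmf`, the small constant `eps` that guards against division by zero, the iteration count, and the test matrix are assumptions, and the code does not reproduce any specific paper's implementation.

```python
import numpy as np

def nmf(V, rank, iterations=200, seed=0, eps=1e-12):
    """Approximate a nonnegative matrix V as W @ H with W, H >= 0."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], rank)) + eps
    H = rng.random((rank, V.shape[1])) + eps
    errors = []
    for _ in range(iterations):
        # Multiplicative updates keep every entry nonnegative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
        errors.append(np.linalg.norm(V - W @ H))
    return W, H, errors

# Made-up nonnegative 6 x 5 matrix of rank 2.
rng = np.random.default_rng(42)
V = rng.random((6, 2)) @ rng.random((2, 5))
W, H, errors = nmf(V, rank=2)
print(errors[0], errors[-1])
```

Unlike the SVD, the factors are not orthogonal and depend on the random initialization, so repeated runs can give different but comparably good factorizations.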

November 3:

- Tensor SVD of measured data:

November 8:

- Notebook 5: Tensor SVD of Measured Data

Mathematica Code: Notebook_5.nb

November 10:

- Kaplan-Meier survival analysis
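
The Kaplan-Meier estimate multiplies, at each time where a death is observed, the running survival probability by one minus the fraction of patients at risk who died at that time. Censored patients leave the at-risk set without lowering the estimate. The sketch below uses only the Python standard library. The function name `kaplan_meier` and the five-patient example are made up for illustration.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with an observed death.

    times: follow-up durations; events: 1 if death was observed, 0 if censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients whose follow-up ends at the same time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve

# Five hypothetical patients; the third and fifth are censored.
curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
print([(t, round(s, 3)) for t, s in curve])
```

For this example the estimate steps down to about 0.8, 0.6, and 0.3 at times 1, 2, and 3. A patient censored at the same time as a death is counted as still at risk at that time, which is the usual convention.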

November 15:

- Lab 2 Due In-Class

- The "perceptron," i.e., single-layer neural network, as a mathematical variation on the SVD:
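
For reference, the classic perceptron learning rule can be written in a few lines: whenever a sample is misclassified, the weights move toward it (or away from it) according to its label. The sketch below uses only the Python standard library. The function names, the learning rate, and the toy data are assumptions, and the sketch does not attempt to show the relation to the SVD discussed in class.

```python
def train_perceptron(samples, labels, epochs=20, rate=1.0):
    """Learn weights and bias for labels in {-1, +1} with the perceptron rule."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            activation = sum(w * xi for w, xi in zip(weights, x)) + bias
            predicted = 1 if activation > 0 else -1
            if predicted != y:
                # Update only on mistakes.
                weights = [w + rate * y * xi for w, xi in zip(weights, x)]
                bias += rate * y
    return weights, bias

def predict(weights, bias, x):
    """Classify x as +1 or -1 using the learned linear boundary."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else -1

# Toy linearly separable data: the logical AND of two binary inputs.
samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [-1, -1, -1, 1]
weights, bias = train_perceptron(samples, labels)
print([predict(weights, bias, x) for x in samples])
```

On linearly separable data such as this the rule stops changing the weights after a finite number of mistakes. On data that are not linearly separable it keeps updating, which is one motivation for multi-layer networks.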

November 17:

- In-Class Work on Class Project

November 22:

- Neural networks:

Paper 27: Predicting Human Brain Activity Associated with the Meanings of Nouns, Mitchell et al.

Paper 28: Integrating Multiple-Study Multiple-Subject fMRI Datasets Using Canonical Correlation Analysis, Rustandi et al.

- Readings on SVD and deep learning:

November 24:

- Verification and validation

- Example of verification:

Paper 29: Global Effects of DNA Replication and DNA Replication Origin Activity on Eukaryotic Gene Expression, Omberg et al.

- Example of validation:

Paper 30: Retrospective Clinical Trial Experimentally Validates Glioblastoma Genome-Wide Pattern of DNA Copy-Number Alterations Predictor of Survival, Ponnapalli et al.

November 25:

- Happy Thanksgiving!

November 29:

- "State of the project" presentations

December 1:

- "State of the project" presentations

December 6:

- "State of the project" presentations

December 8:

- "State of the project" presentations

- End-of-class celebration!

Happy Winter Break!

- DNA from xkcd

See you in Spring 2022 in BME 6770: Genomic Signal Processing