- For postdoctoral, graduate, and advanced undergraduate students in Engineering, Sciences, and Medicine, and professionals in industry.
- Spring 2020, Mondays and Wednesdays 11:50am–1:10pm, LCB 115.
- Technologies, for high-throughput acquisition of different types of molecular biological data, e.g., omics, imaging, and patient clinical information.
- Databases, from the Cancer Genome Atlas (TCGA) at the Genomic Data Commons (GDC) to the Cancer Imaging Archive (TCIA).
- Mathematical frameworks, from the singular value decomposition (SVD) and principal component analysis (PCA) to multi-matrix tensor decompositions, neural networks, and deep learning.
- Applications toward better understanding of biology and practice of medicine, e.g., personalized cancer diagnostics, prognostics, and therapeutics.
- Proving mathematical theorems and programming symbolic computations.
- Designing algorithms and programming numerical computations.
- Working with databases and modeling biomedical data.
- In-class presentations of scientific journal articles and patents.
- Participation in guest lectures and seminars on campus and discussions of conference reports.
- End-of-class celebration.
- Syllabus
- Spring 2020 Calendar
- Safety
- Health, Wellness, and Counseling
- Student Code

Prerequisites: Some experience programming and instructor approval.

100% grade = 30% labs, 30% presentation, 30% class project, 10% class participation; class attendance is required.

Topics:

Concepts in artificial intelligence, data science, and machine learning and their applications in the integration and comparison of different types of high-throughput omic data acquired by different technologies and from different studies toward discovery, verification, and validation of biomedical principles.

Skills:

Activities:

January 6:

- Welcome!

- So much more to discover:

Comparative Spectral Decompositions for Personalized Cancer Diagnostics, Prognostics, and Therapeutics, Alter,

- Technologies and databases: On the Utah origin of the human genome project

The Alta Summit,

- Mathematical frameworks: The singular value decomposition (SVD) in the news

If You Liked This, You're Sure to Love That,

- Applications and a note on ethics: Personalized medicine

How Bright Promise in Cancer Testing Fell Apart,

January 8:

- Slides 1: Technologies: Examples of high-throughput biotechnologies

- Genomics after the human genome project:

Paper 1: A Vision for the Future of Genomics Research, Collins et al.,

Acknowledgements.

- Databases: TCGA

Paper 2: Comprehensive Genomic Characterization Defines Human Glioblastoma Genes and Core Pathways, TCGA Research Network,

January 13:

- Mathematical frameworks: The SVD

- Notebook 1: Computation and Visualization of the SVD

Mathematica Code: Notebook_1.nb

- Lab 1:

Code the SVD of synthetic data and its visualization. Test and debug your code.
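For readers working outside the Mathematica notebook, the following is a minimal Python sketch of Lab 1. It assumes a small synthetic "genes × arrays" matrix built from two orthonormal patterns with known singular values 3 and 1. It recovers the leading singular triplets by power iteration on AᵀA with deflation, and it prints them in place of a plot. The function names and the synthetic patterns are illustrative choices, not the course's code. In practice a library routine such as NumPy's `numpy.linalg.svd` would normally be used.

```python
import math
import random

def matvec(A, x):
    """Return A x for a matrix A stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def rmatvec(A, y):
    """Return A^T y."""
    return [sum(A[i][j] * y[i] for i in range(len(A))) for j in range(len(A[0]))]

def normalize(v):
    """Return (v / ||v||, ||v||)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v], norm

def top_singular_triplets(A, k, iters=200, seed=0):
    """Approximate the k largest singular triplets (s, u, v) of A.

    Power iteration on A^T A converges to the dominant right singular vector v;
    then s = ||A v|| and u = A v / s. The rank-1 term s u v^T is subtracted
    (deflation) before the next triplet is sought.
    """
    rng = random.Random(seed)
    R = [row[:] for row in A]  # residual matrix
    triplets = []
    for _ in range(k):
        v, _ = normalize([rng.gauss(0.0, 1.0) for _ in range(len(A[0]))])
        for _ in range(iters):
            v, _ = normalize(rmatvec(R, matvec(R, v)))
        u, s = normalize(matvec(R, v))
        triplets.append((s, u, v))
        R = [[R[i][j] - s * u[i] * v[j] for j in range(len(v))]
             for i in range(len(u))]
    return triplets

# Synthetic data: 4 "genes" x 3 "arrays", rank 2, singular values 3 and 1.
u1, u2 = [0.5, 0.5, 0.5, 0.5], [0.5, -0.5, 0.5, -0.5]
v1, v2 = [1 / math.sqrt(2), 1 / math.sqrt(2), 0.0], [0.0, 0.0, 1.0]
A = [[3 * u1[i] * v1[j] + 1 * u2[i] * v2[j] for j in range(3)] for i in range(4)]

triplets = top_singular_triplets(A, k=2)
for s, u, v in triplets:
    print(f"s = {s:.6f}, u = {[round(x, 3) for x in u]}, v = {[round(x, 3) for x in v]}")
```

Singular vectors are determined only up to sign, so a test of such code should compare |u·u₁| with 1 rather than u with u₁.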

January 15:

January 20:

- Happy Dr. Martin Luther King, Jr. Day!

"Injustice anywhere is a threat to justice everywhere."

January 22:

- Slides 2: From data organization to analysis and interpretation

- Paper 3: Cluster Analysis and Display of Genome-Wide Expression Patterns, Eisen et al.,

- In-Class Project 1: Derive the hypergeometric distribution from first combinatorics principles.
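The counting argument behind In-Class Project 1 runs as follows. Suppose there are N genes in total, K of which carry an annotation. Choosing exactly k annotated genes and n − k unannotated ones can be done in C(K, k)·C(N − K, n − k) ways, out of C(N, n) equally likely draws of n genes. Hence P(X = k) = C(K, k)·C(N − K, n − k) / C(N, n). Summing the upper tail gives the one-sided p-value commonly used to assess enrichment of an annotation in a gene cluster. The Python sketch below uses exact rational arithmetic. Its function names are illustrative.

```python
from fractions import Fraction
from math import comb

def hypergeom_pmf(k, N, K, n):
    """P(X = k): exactly k annotated genes in a random draw of n out of N, K annotated.

    Choose k of the K annotated genes and n - k of the N - K unannotated ones,
    out of comb(N, n) equally likely draws.
    """
    return Fraction(comb(K, k) * comb(N - K, n - k), comb(N, n))

def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k), the one-sided enrichment p-value."""
    return sum(hypergeom_pmf(i, N, K, n) for i in range(k, min(K, n) + 1))

# Example: 10 genes, 4 annotated, a cluster of 3 genes containing 2 annotated ones.
print(hypergeom_pmf(2, 10, 4, 3))         # 3/10
print(hypergeom_upper_tail(2, 10, 4, 3))  # 1/3
```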

January 27:

- In-Class Project 2: Download two interrelated omic profiles from TCGA via GDC, e.g., (

January 30, Thursday, 10:00–11:00pm, in lieu of any one Lab:

- Amazon Web Services (AWS) Education Research Webinar

February 10:

- Mathematical properties of the SVD

February 12:

- Lab 1 Due In-Class

- Slides 3: Discovery of data patterns by using the SVD

- Paper 4: Molecular Characterisation of Soft Tissue Tumours: a Gene Expression Study, Nielsen et al.,

Supplement.

- Happy International Darwin Day!

Timeline of the human genome project:

From Darwin and Mendel to the human genome project,

February 17:

- Happy Presidents Day!

- Slides 4: SVD as a Transform

- Quantum Harmonic Oscillator from Wikipedia

- Image Compression via the SVD from MathWorld

- Image Compression via the Fourier Transform from MathWorld

- A Hard Day's Night Opening Chord from Wikipedia

- In-Class Work on Lab 2:

Compute and visualize the SVD of your data. Interpret your data based upon its SVD. Use at least two different approaches each for preprocessing your data, for sorting your data, and for assessing the statistical significance of your interpretation.
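A Lab 2 submission might compare two preprocessing choices and summarize an SVD in the ways sketched below. The choices here are assumptions for illustration, not required methods. The sketch offers row mean-centering and row z-scoring as two common ways to preprocess a genes × samples matrix. Given singular values s_l, the fraction p_l = s_l² / Σ_k s_k² gives each component's share of the overall signal. The normalized Shannon entropy d = −(1/log L) Σ_l p_l log p_l, which lies between 0 and 1, measures how evenly that signal is spread. All function names are hypothetical.

```python
import math
import statistics

def center_rows(X):
    """Subtract each row's mean, e.g., each gene's average across samples."""
    centered = []
    for row in X:
        mu = statistics.fmean(row)
        centered.append([x - mu for x in row])
    return centered

def zscore_rows(X):
    """Center each row and scale it to unit (population) standard deviation."""
    out = []
    for row in X:
        mu, sd = statistics.fmean(row), statistics.pstdev(row)
        out.append([(x - mu) / sd if sd > 0 else 0.0 for x in row])
    return out

def fractions_of_signal(singular_values):
    """p_l = s_l^2 / sum_k s_k^2 for each component l."""
    total = sum(s * s for s in singular_values)
    return [s * s / total for s in singular_values]

def normalized_entropy(singular_values):
    """d = -(1 / log L) * sum_l p_l log p_l, assuming L >= 2 components.

    d is 0 when one component carries all the signal and 1 when all are equal.
    """
    p = fractions_of_signal(singular_values)
    return -sum(x * math.log(x) for x in p if x > 0) / math.log(len(p))

print(center_rows([[1.0, 2.0, 3.0]]))            # [[-1.0, 0.0, 1.0]]
print(fractions_of_signal([3.0, 1.0]))           # [0.9, 0.1]
print(round(normalized_entropy([3.0, 1.0]), 3))  # 0.469
```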

February 19:

- Examples of interpretation of TCGA data:

Paper 5: Mathematically Universal and Biologically Consistent Astrocytoma Genotype Encodes for Transformation and Predicts Survival Phenotype, Aiello et al.,

Paper 6: GSVD Comparison of Patient-Matched Normal and Tumor aCGH Profiles Reveals Global Copy-Number Alterations Predicting Glioblastoma Multiforme Survival, Lee et al.,

- Examples of assessing the statistical significance of an interpretation:

Paper 7: Systematic Determination of Genetic Network Architecture, Tavazoie et al.,

Paper 8: GOrilla: A Tool for Discovery and Visualization of Enriched GO Terms in Ranked Gene Lists, Eden et al.,

Paper 9: Discovering Motifs in Ranked Lists of DNA Sequences, Eden et al.,

February 21, Friday, 8:00–11:00am, Utah State Capitol Rotunda, 350 North State Street, Salt Lake City, in lieu of any one Lab:

- 2020 Utah American Cancer Society (ACS) Cancer Action Network (CAN) Day at the Capitol

February 24:

- Slides 5: The SVD vs. PCA

Paper 10: Correspondence Analysis Applied to Microarray Data, Fellenberg et al.,
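For reference, the standard link between the two can be stated as follows. The convention here is an assumption: X has n rows (samples) and p columns (variables), its columns have been mean-centered, and X = UΣVᵀ is its thin SVD. Under these conditions PCA's principal axes are the right singular vectors, and the variances along them are the squared singular values divided by n − 1. Without centering, the SVD of X and PCA generally differ.

```latex
\begin{aligned}
X &= U \Sigma V^{T}, \qquad \Sigma = \operatorname{diag}(s_1, \dots, s_r), \\
C &= \frac{1}{n-1} X^{T} X = V \, \frac{\Sigma^{2}}{n-1} \, V^{T}, \\
\lambda_i &= \frac{s_i^{2}}{n-1}, \qquad \text{scores} = X V = U \Sigma .
\end{aligned}
```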

- Mathematical variations on the SVD and PCA:

Independent component analysis (ICA):

Paper 11: Linear Modes of Gene Expression Determined by Independent Component Analysis, Liebermeister,

- Nonnegative matrix factorization (NMF):

Paper 12: Metagenes and Molecular Pattern Discovery Using Matrix Factorization, Brunet et al.,
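The following is a minimal pure-Python sketch of NMF using the Lee–Seung multiplicative updates for the Frobenius-norm objective ‖V − WH‖². This update rule is an assumption chosen for illustration. Brunet et al. may use a different objective and additional steps, such as model selection across ranks, which this sketch does not reproduce. The names, the rank k, the iteration count, and the small `eps` guard against division by zero are all illustrative.

```python
import random

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def frob_error(V, W, H):
    """Squared Frobenius norm ||V - W H||^2."""
    WH = matmul(W, H)
    return sum((v - x) ** 2 for vr, wr in zip(V, WH) for v, x in zip(vr, wr))

def nmf(V, k, iters=500, eps=1e-12, seed=0):
    """Factor nonnegative V (n x m) as W (n x k) times H (k x m), with W, H >= 0.

    Multiplicative updates:
        H <- H * (W^T V) / (W^T W H),   W <- W * (V H^T) / (W H H^T)
    Entries stay nonnegative because every factor in the updates is nonnegative.
    """
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[h * a / (b + eps) for h, a, b in zip(hr, nr, dr)]
             for hr, nr, dr in zip(H, num, den)]
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[w * a / (b + eps) for w, a, b in zip(wr, nr, dr)]
             for wr, nr, dr in zip(W, num, den)]
    return W, H

# Toy "genes x samples" matrix built from two nonnegative patterns.
V = [[1.0, 1.0, 0.0, 0.0],
     [2.0, 2.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 1.0],
     [0.0, 0.0, 3.0, 3.0]]
W, H = nmf(V, k=2)
print(f"reconstruction error: {frob_error(V, W, H):.2e}")
```

Unlike the SVD, NMF depends on its random initialization. Runs with different seeds can therefore give different factors.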

February 29:

- Gene H. Golub's Birthday!

Paper 13: Calculating the Singular Values and Pseudo-Inverse of a Matrix, Golub and Kahan,

March 2:

- Selection of a cutoff of the singular values:

Paper 14: Component Retention in Principal Component Analysis with Application to cDNA Microarray Data, Cangelosi and Goriely,

Paper 15: The Optimal Hard Threshold for Singular Values is 4/√3, Gavish and Donoho,
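Two cutoff rules can be sketched in a few lines of Python. The first is the broken-stick rule, one of several criteria in the component-retention literature. It keeps component l while its fraction p_l exceeds b_l = (1/L) Σ_{i=l}^{L} 1/i, the expected fraction of the l-th longest piece when a unit stick is broken at random into L pieces. The second is the hard threshold of Paper 15, stated here for an m × n matrix (m ≤ n, β = m/n) with white noise of known level σ. It keeps singular values above τ = λ*(β)·√n·σ, where λ*(β) = √(2(β + 1) + 8β / ((β + 1) + √(β² + 14β + 1))). For a square matrix this reduces to 4/√3. The known-σ case is an assumption of this sketch, and the function names are illustrative.

```python
import math

def broken_stick(L):
    """Expected fractions b_l = (1/L) * sum_{i=l}^{L} 1/i, for l = 1..L."""
    return [sum(1.0 / i for i in range(l, L + 1)) / L for l in range(1, L + 1)]

def keep_broken_stick(singular_values):
    """Number of leading components whose fraction p_l exceeds b_l."""
    total = sum(s * s for s in singular_values)
    p = [s * s / total for s in singular_values]
    count = 0
    for pl, bl in zip(p, broken_stick(len(p))):
        if pl <= bl:
            break
        count += 1
    return count

def optimal_lambda(beta):
    """lambda*(beta) for aspect ratio beta = m / n, with 0 < beta <= 1."""
    return math.sqrt(2 * (beta + 1)
                     + 8 * beta / ((beta + 1) + math.sqrt(beta ** 2 + 14 * beta + 1)))

def keep_hard_threshold(singular_values, m, n, sigma):
    """Number of singular values above tau = lambda*(beta) * sqrt(n) * sigma (sigma known)."""
    m, n = min(m, n), max(m, n)
    tau = optimal_lambda(m / n) * math.sqrt(n) * sigma
    return sum(1 for s in singular_values if s > tau)

print(optimal_lambda(1.0), 4 / math.sqrt(3))  # equal up to rounding
```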

March 4:

- Lab 2 "Data Clinic"

March 9 and 11:

- Happy Spring Break!

March 16:

- Slides 6: Data integration by using the pseudoinverse

- Mathematics of the pseudoinverse:
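The standard facts below are a compact reference, assuming A is m × n of rank r with thin SVD A = UΣVᵀ. The last line is one generic way to write a pseudoinverse projection of a data matrix D onto the column space of a basis matrix B. It is not necessarily the exact formulation used in the slides or papers.

```latex
\begin{aligned}
A &= U \Sigma V^{T}, \quad \Sigma = \operatorname{diag}(s_1, \dots, s_r), \; s_1 \ge \dots \ge s_r > 0, \\
A^{+} &= V \Sigma^{-1} U^{T}, \\
A A^{+} A &= A, \quad A^{+} A A^{+} = A^{+}, \quad (A A^{+})^{T} = A A^{+}, \quad (A^{+} A)^{T} = A^{+} A, \\
\hat{x} &= A^{+} b \quad \text{is the minimum-norm minimizer of } \lVert A x - b \rVert_2, \\
\hat{D} &= B B^{+} D \quad \text{projects each column of } D \text{ onto the column space of } B .
\end{aligned}
```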

March 18:

- "State of the Project" Presentations

March 23:

- Paper 16: Integrative Analysis of Genome-Scale Data by Using Pseudoinverse Projection Predicts Novel Correlation between DNA Replication and RNA Transcription, Alter and Golub,

Paper 17: Reconstructing the Pathways of a Cellular System from Genome-Scale Signals by Using Matrix and Tensor Computations, Alter and Golub,

Paper 18: Distinct Physiological States of

Paper 19: Using Pre-Existing Microarray Datasets to Increase Experimental Power: Application to Insulin Resistance, Daigle et al.,

March 25:

- Slides 7: From correlation to causal coordination by using the pseudoinverse

- Computation of the pseudoinverse:

Notebook 2: Pseudoinverse Projection of Measured Data

Mathematica Code: Notebook_2.nb

April 1:

- Slides 8: Comparative Generalized SVD (GSVD)

- Computation of the GSVD:

Notebook 3: GSVD of Synthetic Data

Mathematica Code: Notebook_3.nb
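For reference, here is one common way to write the GSVD of two matrices D₁ and D₂ that share their column dimension, e.g., the same arrays or time points with different genes. The two matrices are factored with distinct column-orthonormal left bases and one shared, generally non-orthogonal right basis X. The angular distance θ_k compares the significance of component k in D₁ with its significance in D₂. Normalization conventions vary between references, so this form is a sketch to check against the slides.

```latex
\begin{aligned}
D_1 &= U_1 \Sigma_1 X^{T}, \qquad D_2 = U_2 \Sigma_2 X^{T}, \\
\Sigma_i &= \operatorname{diag}(\sigma_{i,1}, \dots, \sigma_{i,n}), \quad U_i^{T} U_i = I, \\
\theta_k &= \arctan\!\left(\frac{\sigma_{1,k}}{\sigma_{2,k}}\right) - \frac{\pi}{4}, \qquad -\frac{\pi}{4} \le \theta_k \le \frac{\pi}{4}.
\end{aligned}
```

Under this convention, θ_k near π/4 or −π/4 marks a component exclusive to D₁ or D₂, respectively, and θ_k near 0 marks one common to both.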

April 3:

- Paper 20: Generalized Singular Value Decomposition for Comparative Analysis of Genome-Scale Expression Datasets of Two Different Organisms, Alter et al.,

Paper 21: Combining Transcriptional Datasets Using the Generalized Singular Value Decomposition, Schreiber et al.,

Paper 22: Exploring Metabolic Pathway Disruption in the Subchronic Phencyclidine Model of Schizophrenia with the Generalized Singular Value Decomposition, Xiao et al.,

April 8:

- Paper 23: A Higher-Order Generalized Singular Value Decomposition for Comparison of Global mRNA Expression from Multiple Organisms, Ponnapalli et al.,

Paper 24: Multi-Tissue Analysis of Co-expression Networks by Higher-Order Generalized Singular Value Decomposition Identifies Functionally Coherent Transcriptional Modules, Xiao et al.,

- Paper 25: Tensor GSVD of Patient- and Platform-Matched Tumor and Normal DNA Copy-Number Profiles Uncovers Chromosome Arm-Wide Patterns of Tumor-Exclusive Platform-Consistent Alterations Encoding for Cell Transformation and Predicting Ovarian Cancer Survival, Sankaranarayanan et al.,

Paper 26: TNF-Insulin Crosstalk at the Transcription Factor GATA6 is Revealed by a Model that Links Signaling and Transcriptomic Data Tensors, Chitforoushzadeh et al.,

April 10 and 15:

- In-Class Work on Lab 3:

Select two or more datasets, and explain how you might compare or integrate them by using, e.g., pseudoinverse projection or the GSVD. Explain also what the mathematical variables, and if possible also the mathematical operations, of your integrative or comparative model might mean biologically.

April 16, Tuesday, 10:00am–12:00pm, WEB 3780:

- Class Project "Data Clinic"

Happy Summer Break!

- DNA from xkcd

See you in Fall 2019 in BIOEN 6900-003: Data Science for Bioengineers