- For postdoctoral, graduate, and advanced undergraduate students in Engineering, Sciences, and Medicine, and professionals in industry.
- Spring 2020, Mondays and Wednesdays 11:50am–1:10pm, LCB 115 and Webex.
- Syllabus
- Spring 2020 Calendar
- Safety
- Health, Wellness, and Counseling
- Student Code

Prerequisites: Some programming experience and instructor approval.

100% grade = 30% labs, 30% presentation, 30% class project, 10% class participation; class attendance is required.

Topics:

Concepts in artificial intelligence, data science, and machine learning and their applications in the integration and comparison of different types of high-throughput omic data acquired by different technologies and from different studies toward discovery, verification, and validation of biomedical principles.

- Technologies, for high-throughput acquisition of different types of molecular biological data, e.g., omics, imaging, and patient clinical information.
- Databases, from the Cancer Genome Atlas (TCGA) at the Genomic Data Commons (GDC) to the Cancer Image Archive (TCIA).
- Mathematical frameworks, from the singular value decomposition (SVD) and principal component analysis (PCA) to multi-matrix tensor decompositions, neural networks, and deep learning.
- Applications toward better understanding of biology and practice of medicine, e.g., personalized cancer diagnostics, prognostics, and therapeutics.

Skills:

- Proving mathematical theorems and programming symbolic computations.
- Designing algorithms and programming numerical computations.
- Working with databases and modeling biomedical data.

Activities:

- In-class presentations of scientific journal articles.
- Participation in guest lectures and seminars on campus and discussions of conference reports.
- End-of-class celebration.

January 6:

- Welcome!

- So much more to discover:

- Technologies and databases: On the Utah origin of the human genome project

The Alta Summit,

- Mathematical frameworks: The singular value decomposition (SVD) in the news

If You Liked This, You're Sure to Love That,

- Applications and a note on ethics: Personalized medicine

How Bright Promise in Cancer Testing Fell Apart,

January 8:

- Slides 1: Technologies: Examples of high-throughput biotechnologies

- Genomics after the human genome project:

Paper 1: A Vision for the Future of Genomics Research, Collins et al.,

Acknowledgements.

- Databases: TCGA

Paper 2: Comprehensive Genomic Characterization Defines Human Glioblastoma Genes and Core Pathways, TCGA Research Network,

January 13:

- Mathematical frameworks: The SVD

- Notebook 1: Computation and Visualization of the SVD

Mathematica Code: Notebook_1.nb

- Lab 1:

Code the SVD of synthetic data and its visualization. Test and debug your code.
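
As a starting point for this kind of exercise, here is a minimal sketch in Python with NumPy. The course notebook is in Mathematica, so the language, the matrix sizes, the two sinusoidal patterns, and the noise level are all assumptions made for illustration, not the contents of Notebook_1.nb. It builds a synthetic matrix from two known patterns, computes the SVD, and runs the checks that are useful when testing and debugging.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is reproducible

# Hypothetical synthetic data: 50 rows ("genes") x 8 columns ("arrays"),
# built from two known patterns plus a small amount of noise.
n_genes, n_arrays = 50, 8
t = np.linspace(0.0, 2.0 * np.pi, n_arrays, endpoint=False)
patterns = np.vstack([np.cos(t), np.sin(t)])        # 2 x n_arrays
weights = rng.normal(size=(n_genes, 2))             # n_genes x 2
data = weights @ patterns + 0.01 * rng.normal(size=(n_genes, n_arrays))

# Thin SVD: data = U @ diag(s) @ Vt, with s sorted in descending order.
U, s, Vt = np.linalg.svd(data, full_matrices=False)

# Fraction of the total squared signal captured by each singular value.
fractions = s**2 / np.sum(s**2)

# Debugging checks: exact reconstruction and orthonormal singular vectors.
reconstruction_error = np.linalg.norm(data - (U * s) @ Vt)
u_is_orthonormal = np.allclose(U.T @ U, np.eye(n_arrays))
v_is_orthonormal = np.allclose(Vt @ Vt.T, np.eye(n_arrays))

print("fractions:", np.round(fractions, 4))
print("reconstruction error:", reconstruction_error)
```

For the visualization, the `fractions` vector can be drawn as a bar chart and the rows of `Vt` as line plots across the columns, using any plotting library.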

January 15:

January 20:

- Happy Dr. Martin Luther King, Jr. Day!

"Injustice anywhere is a threat to justice everywhere."

January 22:

- Slides 2: From data organization to analysis and interpretation

- Paper 3: Cluster Analysis and Display of Genome-Wide Expression Patterns, Eisen et al.,

- In-Class Project 1: Derive the hypergeometric distribution from first combinatorics principles.
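
One way the derivation could go, as a sketch: the symbols N, K, n, and k are notation chosen here, not taken from the course materials. Suppose a population of N items contains K marked items, and n items are drawn without replacement, with every subset of size n equally likely. For illustration, this could be N genes, K of them annotated, and n of them in a cluster.

```latex
% Number of equally likely samples of size n:
%   \binom{N}{n}
% Samples with exactly k marked items: choose k of the K marked items
% and, independently, n-k of the N-K unmarked items:
%   \binom{K}{k}\binom{N-K}{n-k}
\[
P(X = k) \;=\; \frac{\binom{K}{k}\,\binom{N-K}{n-k}}{\binom{N}{n}},
\qquad \max\bigl(0,\; n-(N-K)\bigr) \;\le\; k \;\le\; \min(n,\, K).
\]
% Normalization follows from the Vandermonde identity:
\[
\sum_{k} \binom{K}{k}\binom{N-K}{n-k} \;=\; \binom{N}{n}
\quad\Longrightarrow\quad
\sum_{k} P(X = k) \;=\; 1 .
\]
```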

January 27:

- In-Class Project 2: Download two interrelated omic profiles from TCGA via GDC, e.g., (

January 30, Thursday, 10:00–11:00pm, in lieu of any one Lab:

- Amazon Web Services (AWS) Education Research Webinar

February 10:

- Mathematical properties of the SVD

February 12:

- Lab 1 Due In-Class

- Slides 3: Discovery of data patterns by using the SVD

- Paper 4: Molecular Characterisation of Soft Tissue Tumours: a Gene Expression Study, Nielsen et al.,

Supplement.

- Happy International Darwin Day!

Timeline of the human genome project:

From Darwin and Mendel to the human genome project,

February 17:

- Happy Presidents Day!

- Slides 4: SVD as a Transform

- Quantum Harmonic Oscillator from Wikipedia

- Image Compression via the SVD from Mathworld

- Image Compression via the Fourier Transform from Mathworld

- A Hard Day's Night Opening Chord from Wikipedia

- In-Class Work on Lab 2:

Compute and visualize the SVD of your data. Interpret your data based upon its SVD. Use at least two different approaches each for preprocessing and sorting your data and for assessing the statistical significance of your interpretation.

February 19:

- Examples of interpretation of TCGA data:

Paper 5: Mathematically Universal and Biologically Consistent Astrocytoma Genotype Encodes for Transformation and Predicts Survival Phenotype, Aiello et al.,

Paper 6: GSVD Comparison of Patient-Matched Normal and Tumor aCGH Profiles Reveals Global Copy-Number Alterations Predicting Glioblastoma Multiforme Survival, Lee et al.,

- Examples of assessing the statistical significance of an interpretation:

Paper 7: Systematic Determination of Genetic Network Architecture, Tavazoie et al.,

Paper 8: GOrilla: A Tool for Discovery and Visualization of Enriched GO Terms in Ranked Gene Lists, Eden et al.,

Paper 9: Discovering Motifs in Ranked Lists of DNA Sequences, Eden et al.,

February 21, Friday, 8:00–11:00am, Utah State Capitol Rotunda, 350 North State Street, Salt Lake City, in lieu of any one Lab:

- 2020 Utah American Cancer Society (ACS) Cancer Action Network (CAN) Day at the Capitol

February 24:

- Slides 5: The SVD vs. PCA

Paper 10: Correspondence Analysis Applied to Microarray Data, Fellenberg et al.,
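
One common way to state the relation between the two, sketched in Python with NumPy under assumptions made here (random data, rows as samples, a constant offset of 3): PCA by eigendecomposition of the sample covariance gives the same spectrum as the SVD of the column-centered data, while the SVD of the uncentered data differs. This is an illustration, not the content of Slides 5.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: 200 samples (rows) x 5 variables (columns) with a nonzero mean.
n_samples, n_vars = 200, 5
X = rng.normal(size=(n_samples, n_vars)) @ rng.normal(size=(n_vars, n_vars)) + 3.0

# PCA: eigenvalues of the sample covariance of the centered data, descending.
Xc = X - X.mean(axis=0)
cov = Xc.T @ Xc / (n_samples - 1)
pca_variances = np.linalg.eigvalsh(cov)[::-1]

# SVD of the centered data: squared singular values / (n - 1) are the same variances.
s_centered = np.linalg.svd(Xc, compute_uv=False)
variances_from_svd = s_centered**2 / (n_samples - 1)

# SVD of the raw, uncentered data: the leading singular value also absorbs the mean.
s_raw = np.linalg.svd(X, compute_uv=False)

print("PCA variances:        ", np.round(pca_variances, 3))
print("from centered SVD:    ", np.round(variances_from_svd, 3))
print("raw singular values:  ", np.round(s_raw, 3))
print("centered sing. values:", np.round(s_centered, 3))
```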

February 29:

- Gene H. Golub's Birthday!

Paper 11: Calculating the Singular Values and Pseudo-Inverse of a Matrix, Golub and Kahan,

March 2:

- Selection of a cutoff of the singular values:

Paper 12: Component Retention in Principal Component Analysis with Application to cDNA Microarray Data, Cangelosi and Goriely,

Paper 13: The Optimal Hard Threshold for Singular Values is 4/√3, Gavish and Donoho,
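
To make the threshold in the title of Paper 13 concrete: for a square n-by-n matrix observed with white noise of known level σ, the hard threshold is (4/√3)·√n·σ, and singular values below it are set to zero. The following Python sketch applies it to a hypothetical rank-3 signal; the matrix size, the signal singular values, and the noise level are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical setup: n x n rank-3 signal plus white noise of known level sigma.
n, sigma = 100, 1.0
U0, _ = np.linalg.qr(rng.normal(size=(n, 3)))   # orthonormal left factors
V0, _ = np.linalg.qr(rng.normal(size=(n, 3)))   # orthonormal right factors
signal = U0 @ np.diag([200.0, 150.0, 100.0]) @ V0.T
noisy = signal + sigma * rng.normal(size=(n, n))

# Hard threshold for a square matrix with known noise level.
tau = (4.0 / np.sqrt(3.0)) * np.sqrt(n) * sigma

# Keep only the singular values above the threshold.
U, s, Vt = np.linalg.svd(noisy, full_matrices=False)
keep = s > tau
estimated_rank = int(np.sum(keep))
denoised = (U[:, keep] * s[keep]) @ Vt[keep, :]

print("threshold:", tau)
print("estimated rank:", estimated_rank)
print("error before:", np.linalg.norm(noisy - signal))
print("error after: ", np.linalg.norm(denoised - signal))
```

For rectangular matrices or an unknown noise level the coefficient changes, so this sketch covers only the square, known-noise case.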

March 4:

- Lab 2 "Data Clinic"

March 9 and 11:

- Happy Spring Break!

March 16:

- Class Project "Data Clinic"

March 23:

- "State of the project" presentations

March 25:

- "State of the project" presentations

March 30:

- "State of the project" presentations

April 1:

- Survival analyses

April 6:

- Mathematical variations on the SVD and PCA:

Independent component analysis (ICA):

Paper 14: Emergence of Simple-Cell Receptive Field Properties by Learning a Sparse Code for Natural Images, Olshausen and Field,

Paper 15: Linear Modes of Gene Expression Determined by Independent Component Analysis, Liebermeister,

- Nonnegative matrix factorization (NMF):

Paper 16: Learning the Parts of Objects by Non-Negative Matrix Factorization, Lee and Seung,

Paper 17: Metagenes and Molecular Pattern Discovery Using Matrix Factorization, Brunet et al.,
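
As an illustration of the idea behind NMF, here is a minimal Python sketch that factors a nonnegative matrix V into nonnegative factors W and H by multiplicative updates for the squared-error objective. This is one widely used variant and is an assumption made here; it is not necessarily the objective or update rule of Paper 16 or Paper 17. The matrix sizes, the rank k, and the iteration count are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical nonnegative data of rank 4: 30 rows x 20 columns.
V = rng.random((30, 4)) @ rng.random((4, 20))

k = 4          # number of components ("metagenes"), chosen for illustration
eps = 1e-12    # guards against division by zero

# Random nonnegative initialization.
W = rng.random((30, k))
H = rng.random((k, 20))

errors = []
for _ in range(200):
    # Multiplicative updates keep W and H nonnegative because every
    # factor in the update is nonnegative.
    H *= (W.T @ V) / (W.T @ W @ H + eps)
    W *= (V @ H.T) / (W @ H @ H.T + eps)
    errors.append(np.linalg.norm(V - W @ H))

print("initial error:", errors[0])
print("final error:  ", errors[-1])
```

Unlike the SVD, the result depends on the initialization, so it is common to repeat the factorization from several random starts.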

April 8:

- Slides 6: Comparative Generalized SVD (GSVD)

- Computation of the GSVD:

Notebook 2: GSVD of Synthetic Data

Mathematica Code: Notebook_2.nb

April 15:

- Examples of GSVD of measured data:

Paper 18: Generalized Singular Value Decomposition for Comparative Analysis of Genome-Scale Expression Datasets of Two Different Organisms, Alter et al.,

Paper 19: Combining Transcriptional Datasets Using the Generalized Singular Value Decomposition, Schreiber et al.,

Paper 20: Exploring Metabolic Pathway Disruption in the Subchronic Phencyclidine Model of Schizophrenia with the Generalized Singular Value Decomposition, Xiao et al.,

April 21:

- Official end-of-class celebration!

Project update presentations

April 23:

- Examples of generalizations of the GSVD of measured data:

Paper 21: A Higher-Order Generalized Singular Value Decomposition for Comparison of Global mRNA Expression from Multiple Organisms, Ponnapalli et al.,

Paper 22: Multi-Tissue Analysis of Co-expression Networks by Higher-Order Generalized Singular Value Decomposition Identifies Functionally Coherent Transcriptional Modules, Xiao et al.,

- Paper 23: Tensor GSVD of Patient- and Platform-Matched Tumor and Normal DNA Copy-Number Profiles Uncovers Chromosome Arm-Wide Patterns of Tumor-Exclusive Platform-Consistent Alterations Encoding for Cell Transformation and Predicting Ovarian Cancer Survival, Sankaranarayanan et al.,

Paper 24: TNF-Insulin Crosstalk at the Transcription Factor GATA6 is Revealed by a Model that Links Signaling and Transcriptomic Data Tensors, Chitforoushzadeh et al.,

April 23:

- Slides 7: Data integration by using the pseudoinverse

Mathematics of the pseudoinverse

April 27:

- Computation of the pseudoinverse:

Notebook 3: Pseudoinverse Projection of Measured Data

Mathematica Code: Notebook_3.nb
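
A minimal sketch of the computation in Python with NumPy, assuming a hypothetical basis of three profiles and synthetic data; the course notebook is in Mathematica and uses measured data, so the names and sizes below are illustrative only. The pseudoinverse of the basis gives the least-squares coefficients of the data in that basis, and multiplying back gives the projection.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical basis: 100 rows ("genes") x 3 basis profiles.
basis = rng.normal(size=(100, 3))

# Hypothetical data: 6 columns that are noisy combinations of the basis profiles.
true_coeffs = rng.normal(size=(3, 6))
data = basis @ true_coeffs + 0.05 * rng.normal(size=(100, 6))

# Moore-Penrose pseudoinverse of the basis (3 x 100).
basis_pinv = np.linalg.pinv(basis)

# Pseudoinverse projection: least-squares coefficients of the data in the basis.
coeffs = basis_pinv @ data          # 3 x 6
projected = basis @ coeffs          # part of the data explained by the basis
residual = data - projected         # part orthogonal to the basis

print("max coefficient error:", np.abs(coeffs - true_coeffs).max())
print("residual norm:", np.linalg.norm(residual))
```

The residual is orthogonal to every basis profile, which is a convenient check when debugging.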

April 29:

- Examples of the pseudoinverse projection of measured data:

Paper 25: Integrative Analysis of Genome-Scale Data by Using Pseudoinverse Projection Predicts Novel Correlation between DNA Replication and RNA Transcription, Alter and Golub,

Paper 26: Reconstructing the Pathways of a Cellular System from Genome-Scale Signals by Using Matrix and Tensor Computations, Alter and Golub,

Paper 27: Distinct Physiological States of

Paper 28: Using Pre-Existing Microarray Datasets to Increase Experimental Power: Application to Insulin Resistance, Daigle et al.,

May 6:

- Beyond discovery: verification and validation

- Slides 8: From correlation to causal coordination by using the pseudoinverse

May 11:

- e-Guest Lecture:

Careers in Data Science: Making the World a Better Place

Sri Priya Ponnapalli, Ph.D.

Principal Scientist and Senior Manager, Amazon AI (Palo Alto, CA),

Faculty, Rutgers Business School (Newark, NJ), and

CEO and Co-Founder, Eigengene, Inc. (Palo Alto, CA)

May 18:

- Examples of verification and validation

Paper 29: Retrospective Clinical Trial Experimentally Validates Glioblastoma Genome-Wide Pattern of DNA Copy-Number Alterations Predictor of Survival, Ponnapalli et al.,

Paper 30: Global Effects of DNA Replication and DNA Replication Origin Activity on Eukaryotic Gene Expression, Omberg et al.,

May 27:

- e-Guest Lecture:

Careers in Data Science: Making the World a Better Place

Joel R. Meyerson, Ph.D.

Assistant Professor, Physiology and Biophysics,

Weill Cornell Medical College (New York, NY)

Happy Summer Break!

- Stay safe and healthy!

See you in Fall 2020 in BIOEN 6900-003: Data Science for Bioengineers