BIOEN 6900-003: Data Science for Bioengineers
- For postdoctoral, graduate, and advanced undergraduate students in Engineering, Sciences, and Medicine, and professionals in industry.
- Fall 2019, Mondays and Wednesdays 11:50am–1:10pm, LCB 115.
Prerequisites: Some programming experience and instructor approval.
Grading: 30% labs, 30% presentation, 30% class project, 10% class participation. Class attendance is required. Late assignments are not accepted.
We will cover concepts in data science and machine learning, and their applications to the discovery of principles from biomedical data.
- Databases, from the Cancer Genome Atlas (TCGA) at the Genomic Data Commons (GDC) to the Utah Population Database (UPDB).
- Data types, from omics, imaging, and patient clinical information to biomedical samples and model organisms and systems.
- Algorithms, from the singular value decomposition (SVD) and principal component analysis (PCA) to multi-tensor decompositions, neural networks, and deep learning.
- Applications, from the Luria-Delbrück experiment to personalized cancer diagnostics, prognostics, and therapeutics.
- Proving mathematical theorems and programming symbolic computations.
- Designing algorithms and programming numerical computations.
- Working with databases and modeling biomedical data.
- In-class presentations of scientific journal articles and patents.
- Participation in guest lectures and seminars on campus and discussions of conference reports.
- End-of-class celebration.
- Fall 2019 Calendar
- Health, Wellness, and Counseling
- Student Code
Numerical Linear Algebra, Trefethen and Bau, III (1997).
Mathematical properties of the SVD
In-Class Work on Lab 1:
Code the SVD of synthetic data and its visualization. Test and debug your code.
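As a starting point for this lab, the following is a minimal sketch in Python with NumPy, not the course's reference solution. The matrix size, the sine and cosine patterns, the noise level, and the names `data`, `fractions`, and `text_bar_chart` are all illustrative assumptions. It composes a synthetic matrix from two known patterns, computes its SVD, and prints a simple text bar chart of the fraction of the overall signal that each component captures.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed (arbitrary) so reruns are repeatable

# Compose a synthetic 100 x 12 matrix from two known column patterns plus noise.
# All sizes and patterns here are illustrative assumptions.
n_rows, n_cols = 100, 12
t = np.linspace(0.0, 2.0 * np.pi, n_cols, endpoint=False)
patterns = np.vstack([np.sin(t), np.cos(t)])        # shape (2, n_cols)
weights = rng.normal(size=(n_rows, 2))              # shape (n_rows, 2)
noise = 0.01 * rng.normal(size=(n_rows, n_cols))
data = weights @ patterns + noise

# Thin SVD: data = U @ diag(s) @ Vt, singular values in decreasing order.
U, s, Vt = np.linalg.svd(data, full_matrices=False)

# Fraction of the sum of squared singular values captured by each component.
fractions = s**2 / np.sum(s**2)

# Debugging checks: exact reconstruction and a rank-2 approximation.
reconstruction = U @ np.diag(s) @ Vt
rank2 = U[:, :2] @ np.diag(s[:2]) @ Vt[:2, :]


def text_bar_chart(values, width=50):
    """Return one line per value, with a bar proportional to the value."""
    lines = []
    for k, v in enumerate(values):
        bar = "#" * int(round(width * v))
        lines.append(f"component {k + 1:2d}  {v:6.3f}  {bar}")
    return "\n".join(lines)


print(text_bar_chart(fractions))
```

Because the data were composed from two patterns, nearly all of the signal should fall in the first two components. Comparing `reconstruction` and `rank2` against `data` is one way to test the code before moving to measured data.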
August 29, Thursday, 10:00–11:00am, WEB 3780, in lieu of any one Lab:
Composition and decomposition of synthetic data:
Los Alamos National Laboratory, Sandia National Laboratories, NSF, and University of California San Diego Workshop on Artificial Intelligence and Tensor Factorizations for Physical, Chemical, and Biological Systems (Santa Fe, NM, September 17–20, 2019).
National Cancer Institute (NCI) Physical Sciences in Oncology Symposium (Minneapolis, MN, September 18–20, 2019).
In-Class Work on Lab 2:
Compute and visualize the SVD of your data. Test and debug your code. Interpret your data based upon its SVD. Use at least two different approaches each for preprocessing and sorting your data and for assessing the statistical significance of your interpretation.
October 7 and 9:
More examples of HOSVD of measured data:
Paper 8: A Tensor Higher-Order Singular Value Decomposition for Integrative Analysis of DNA Microarray Data from Different Studies, Omberg et al., Proc Natl Acad Sci USA (2007).
Paper 9: Characterizing the Evolution of Genetic Variance Using Genetic Covariance Tensors, Hines et al., Philos Trans R Soc Lond B Biol Sci (2009).
Paper 10: Integrative Analysis of Many Weighted Co-Expression Networks Using Tensor Computation, Li et al., PLoS Comp Bio (2011).
Paper 11: MultiFacTV: Module Detection from Higher-Order Time Series Biological Data, Li et al., BMC Genomics (2013).
Paper 12: Subgraph Augmented Nonnegative Tensor Factorization (SANTF) for Modeling Clinical Narrative Text, Luo et al., J Am Med Inform Assoc (2015).
Computation of the HOSVD:
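One common way to compute the HOSVD is to take the SVD of each mode unfolding of the tensor and then project the tensor onto the resulting orthonormal bases to obtain the core tensor. The sketch below illustrates that approach under stated assumptions. The helper names `unfold`, `mode_multiply`, `hosvd`, and `reconstruct` are hypothetical, and the random 4 x 5 x 6 tensor stands in for measured data.

```python
import numpy as np


def unfold(tensor, mode):
    """Matricize a tensor: rows index `mode`, columns index all other modes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def mode_multiply(tensor, matrix, mode):
    """Multiply `matrix` into axis `mode` of `tensor` (the mode-n product)."""
    product = np.tensordot(matrix, tensor, axes=(1, mode))
    return np.moveaxis(product, 0, mode)


def hosvd(tensor):
    """Return (core, factors) with one orthonormal factor matrix per mode."""
    factors = []
    for mode in range(tensor.ndim):
        # Left singular vectors of the mode unfolding span that mode's space.
        U, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(U)
    core = tensor
    for mode, U in enumerate(factors):
        core = mode_multiply(core, U.T, mode)
    return core, factors


def reconstruct(core, factors):
    """Rebuild the tensor from its core and factor matrices."""
    tensor = core
    for mode, U in enumerate(factors):
        tensor = mode_multiply(tensor, U, mode)
    return tensor


rng = np.random.default_rng(0)
tensor = rng.normal(size=(4, 5, 6))   # illustrative stand-in for measured data
core, factors = hosvd(tensor)
error = np.linalg.norm(reconstruct(core, factors) - tensor)
print("reconstruction error:", error)
```

A useful debugging check is that the rows of each unfolding of `core` are mutually orthogonal, which is the all-orthogonality property of the HOSVD core.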
"State of the project" presentations
"State of the project" presentations
Tensor SVD of measured data:
Tensor SVD of synthetic data:
From the SVD to PCA:
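The link between the two can be checked numerically: PCA of a data matrix is the eigendecomposition of its covariance matrix, which can be read off the SVD of the column-centered data. The sketch below is a minimal illustration under stated assumptions. Samples are taken to be the rows, the synthetic `samples` matrix is made up, and the variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data: 200 samples (rows) of 5 correlated variables (columns).
n_samples, n_vars = 200, 5
samples = rng.normal(size=(n_samples, n_vars)) @ rng.normal(size=(n_vars, n_vars))

# Center each variable; PCA is defined on deviations from the mean.
centered = samples - samples.mean(axis=0)

# Route 1: SVD of the centered data.
U, s, Vt = np.linalg.svd(centered, full_matrices=False)
variances_from_svd = s**2 / (n_samples - 1)

# Route 2: eigendecomposition of the sample covariance matrix.
covariance = centered.T @ centered / (n_samples - 1)
eigenvalues, eigenvectors = np.linalg.eigh(covariance)
# eigh returns ascending eigenvalues; reverse to match the SVD ordering.
eigenvalues = eigenvalues[::-1]
eigenvectors = eigenvectors[:, ::-1]

# Principal component scores: projections of the samples onto the axes.
scores = centered @ Vt.T

print("variances agree:", np.allclose(eigenvalues, variances_from_svd))
print("scores equal U * s:", np.allclose(scores, U * s))
```

The right singular vectors in `Vt` match the covariance eigenvectors only up to sign, so comparisons between the two routes should use absolute values or align the signs first.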
Mathematical variations on the SVD and PCA:
The "perceptron," i.e., single-layer neural network, as a mathematical variation on the SVD:
Project update presentations
Happy Winter Break!