



BME 6780: Data Science for Bioengineers
- For postdoctoral, graduate, and advanced undergraduate students in Engineering, Sciences, and Medicine, and professionals in industry.
- Fall 2022, Mondays and Wednesdays 11:50am–1:10pm, Zoom.
Prerequisites: Some programming experience and instructor approval.
Grading: 30% labs, 30% class project, 30% presentation, and 10% class participation. Late assignments are not accepted. Class attendance is required.
Topics:
We will cover concepts in data science and machine learning, and their applications to discovery of principles from biomedical data.
- Databases, e.g., the Cancer Genome Atlas (TCGA) at the Genomic Data Commons (GDC).
- Data types, from, e.g., omics, imaging, and patient clinical information to, e.g., tissue samples and model organisms and systems.
- Algorithms, from the singular value decomposition (SVD) and principal component analysis (PCA) to multi-tensor decompositions, neural networks, and deep learning.
- Applications toward a better understanding of biology, e.g., the Luria-Delbrück experiment, and a better practice of medicine, e.g., personalized cancer diagnostics, prognostics, and therapeutics.
Skills:
- Proving mathematical theorems and programming symbolic computations.
- Designing algorithms and programming numerical computations.
- Working with databases and modeling biomedical data.
Activities:
- In-class presentations of scientific journal articles and patents.
- Participation in guest lectures and seminars on campus and discussions of conference reports.
- End-of-class celebration.
Readings on the SVD and deep learning:
- Syllabus
- COVID-19
- Fall 2022 Calendar
- Safety
- Health, Wellness, and Counseling
- Student Code
August 22:
August 24:
Lab 1:
Code the SVD or the tensor SVD of synthetic data and its visualization. Test and debug your code.
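As a possible starting point for Lab 1, the following is a minimal sketch, assuming NumPy is available. The matrix size, the two sine and cosine patterns, the noise level, and all variable names are hypothetical choices for illustration and are not part of the assignment. The sketch composes a synthetic matrix, decomposes it with the SVD, tests the reconstruction, and prints a simple text bar chart of the fraction of the signal captured by each component.

```python
import numpy as np

rng = np.random.default_rng(0)

# Compose synthetic data (hypothetical sizes): two rank-1 patterns plus small noise.
n_rows, n_cols = 50, 8
t = np.linspace(0.0, 2.0 * np.pi, n_cols)
pattern_1 = np.outer(rng.normal(size=n_rows), np.sin(t))
pattern_2 = np.outer(rng.normal(size=n_rows), np.cos(t))
noise = 0.01 * rng.normal(size=(n_rows, n_cols))
data = 5.0 * pattern_1 + 2.0 * pattern_2 + noise

# Decompose: data = U @ diag(s) @ Vt, with singular values in decreasing order.
U, s, Vt = np.linalg.svd(data, full_matrices=False)

# Test: the decomposition should reconstruct the data to machine precision.
reconstruction = (U * s) @ Vt
max_error = np.max(np.abs(data - reconstruction))

# Fraction of the sum of squared singular values captured by each component.
fractions = s**2 / np.sum(s**2)

# Visualize as a text bar chart (a plotting library could be used instead).
for k, f in enumerate(fractions):
    print(f"component {k + 1}: {'#' * int(round(40 * f))} {f:.3f}")
```

Because the synthetic data are built from two patterns, the first two components should carry almost all of the signal, which is one way to debug the code against a known answer.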
August 29:
Numerical Linear Algebra, Trefethen and Bau, III (1997).
Matrix Computations, Golub and Van Loan (1996).
August 31:
Composition and decomposition of synthetic data:


September 5:
September 7:
Mathematics of a tensor SVD, the higher-order SVD (HOSVD):

September 12:
Computation of the HOSVD:
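One common way to compute the HOSVD is to take the SVD of each mode unfolding of the tensor and then project the tensor onto the resulting bases to obtain the core tensor. The following is a minimal sketch of that approach, assuming NumPy; the function names `unfold`, `mode_product`, `hosvd`, and `reconstruct` and the 4 x 5 x 6 random tensor are hypothetical and are not taken from the course materials.

```python
import numpy as np

def unfold(tensor, mode):
    """Matricize: rows index the chosen mode, columns index all other modes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_product(tensor, matrix, mode):
    """Multiply the tensor by a matrix along a single mode."""
    moved = np.moveaxis(tensor, mode, 0)
    result = np.tensordot(matrix, moved, axes=(1, 0))
    return np.moveaxis(result, 0, mode)

def hosvd(tensor):
    """Return the core tensor and one orthonormal factor matrix per mode."""
    factors = []
    for mode in range(tensor.ndim):
        U, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(U)
    core = tensor
    for mode, U in enumerate(factors):
        core = mode_product(core, U.T, mode)
    return core, factors

def reconstruct(core, factors):
    """Rebuild the tensor from its core and factor matrices."""
    tensor = core
    for mode, U in enumerate(factors):
        tensor = mode_product(tensor, U, mode)
    return tensor

# Example on a small random tensor (hypothetical data).
rng = np.random.default_rng(0)
tensor = rng.normal(size=(4, 5, 6))
core, factors = hosvd(tensor)
rebuilt = reconstruct(core, factors)
```

A useful check when debugging is that `rebuilt` matches `tensor` and that each factor matrix has orthonormal columns.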


September 14:
More examples of the SVD of measured data:
Paper 5: Singular Value Decomposition for Genome-Wide Expression Data Processing and Modeling, Alter et al., Proceedings of the National Academy of Sciences (PNAS) USA (2000).
Patent 1: Method for Node Ranking in a Linked Database, Page, United States Patent (2001).
Paper 6: A Rapid Genome-Scale Response of the Transcriptional Oscillator to Perturbation Reveals a Period-Doubling Path to Phenotypic Change, Li and Klevecz, Proceedings of the National Academy of Sciences (PNAS) USA (2006).
Paper 7: Coordinated Metabolic Transitions During Drosophila Embryogenesis and the Onset of Aerobic Glycolysis, Tennessen, Bertagnolli et al., G3: Genes, Genomes, Genetics (2014).
September 19:
September 21:
September 26:
September 28:
In-Class Work on Lab 2:
Compute and visualize the SVD or the tensor SVD of your data. Interpret your data based upon its SVD or its tensor SVD. Use at least two different approaches each for preprocessing and sorting your data and for assessing the statistical significance of your interpretation.
From the SVD to PCA:
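The relation between the two can be sketched numerically: PCA of a data matrix is the SVD of the same matrix after centering each column, and the squared singular values divided by the number of samples minus one are the eigenvalues of the sample covariance matrix. The following is a minimal illustration, assuming NumPy; the synthetic data, the `mixing` matrix, and the variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data (hypothetical): 200 samples of 3 correlated features.
n_samples, n_features = 200, 3
mixing = np.array([[2.0, 0.0, 0.0],
                   [1.0, 0.5, 0.0],
                   [0.0, 0.0, 0.1]])
X = rng.normal(size=(n_samples, n_features)) @ mixing

# Center each feature (column), then take the SVD of the centered matrix.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# Rows of Vt are the principal axes; U * s are the principal component scores.
scores = U * s
explained_variance = s**2 / (n_samples - 1)

# Cross-check against the eigenvalues of the sample covariance matrix.
covariance = np.cov(Xc, rowvar=False)
eigenvalues = np.linalg.eigvalsh(covariance)[::-1]  # sorted in decreasing order
```

Here `explained_variance` and `eigenvalues` should agree to numerical precision, and `scores` should equal the centered data projected onto the principal axes, `Xc @ Vt.T`.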

October 3:
October 5:
October 10:
October 12:
October 17:
More examples of the HOSVD of measured data:
Paper 18: A Tensor Higher-Order Singular Value Decomposition for Integrative Analysis of DNA Microarray Data from Different Studies, Omberg et al., Proceedings of the National Academy of Sciences (PNAS) USA (2007).
Paper 19: Characterizing the Evolution of Genetic Variance Using Genetic Covariance Tensors, Hines et al., Philosophical Transactions of the Royal Society B Biological Sciences (2009).
Paper 20: Integrative Analysis of Many Weighted Co-Expression Networks Using Tensor Computation, Li et al., Public Library of Science (PLoS) Computational Biology (2011).
Paper 21: MultiFacTV: Module Detection from Higher-Order Time Series Biological Data, Li et al., BMC Genomics (2013).
Paper 22: Subgraph Augmented Nonnegative Tensor Factorization (SANTF) for Modeling Clinical Narrative Text, Luo et al., Journal of the American Medical Informatics Association (2015).
October 19:
October 24:
October 26:
Mathematical variations on the SVD and PCA for blind source separation (BSS):

October 31:
The tensor SVD of measured data:

November 2:
November 7:
Kaplan-Meier survival analysis
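As an illustration, the Kaplan-Meier estimate multiplies, at each observed event time, the running survival probability by one minus the number of events divided by the number of patients still at risk. The following is a minimal sketch in plain Python; the function name `kaplan_meier` and the follow-up times and event indicators are hypothetical and are not patient data. It assumes the usual convention that a patient censored at a given time is still counted as at risk at that time.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    times:  follow-up durations.
    events: 1 if the event was observed, 0 if the observation was censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all observations that share the same time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Both events and censored observations leave the risk set.
        at_risk -= removed
    return curve

# Hypothetical follow-up times (e.g., in months) and event indicators.
times = [2, 3, 3, 5, 8, 8, 9, 12]
events = [1, 1, 0, 1, 1, 1, 0, 1]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"t = {t:>2}: S(t) = {s:.3f}")
```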
November 9:
The "perceptron," i.e., single-layer neural network, as a mathematical variation on the SVD:
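For reference, the classic perceptron learning rule can be sketched in a few lines of plain Python. This sketch shows only the single-layer network and its training rule, not the connection to the SVD; the function names, the learning rate, the number of epochs, and the logical AND training set are hypothetical choices for illustration.

```python
def train_perceptron(inputs, labels, epochs=20, learning_rate=1.0):
    """Train a single-layer perceptron with a step activation."""
    weights = [0.0] * len(inputs[0])
    bias = 0.0
    for _ in range(epochs):
        errors = 0
        for x, y in zip(inputs, labels):
            activation = sum(w * xi for w, xi in zip(weights, x)) + bias
            prediction = 1 if activation > 0 else 0
            update = learning_rate * (y - prediction)
            if update != 0:
                # Move the decision boundary toward the misclassified point.
                weights = [w + update * xi for w, xi in zip(weights, x)]
                bias += update
                errors += 1
        if errors == 0:
            break  # every training example is classified correctly
    return weights, bias

def predict(weights, bias, x):
    """Classify one input vector as 0 or 1."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0

# Hypothetical, linearly separable training set: logical AND.
inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 0, 0, 1]
weights, bias = train_perceptron(inputs, labels)
predictions = [predict(weights, bias, x) for x in inputs]
```

The rule is guaranteed to converge only when the classes are linearly separable, as they are in this example.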

November 14:
In-Class Work on Class Project
November 16:
November 21:
Verification and validation
November 23:
November 28:
"State of the project" presentations
November 30:
"State of the project" presentations
December 5:
"State of the project" presentations
December 7:
"State of the project" presentations
End-of-class celebration!
Happy Winter Break!