1 Content


1.1 Schedule and due dates

1.2 Topical outline

1.2.1 Introduction, big picture, review, technical setup

  1. Inspiration, introduction, syllabus, etc.
    Content/00-Inspiration.html

  2. The second scientific revolution (e-science)
    Content/01-eScience.html

  3. Tech setup, ipython, git-classes, code notebooks
    Content/02-PlatformTools.html

  4. Python review
    Content/03-PythonReview.html

  5. Genetics review
    Content/04-BioReview.html

  6. Bioinformatics basics
    Content/05-BioInfoBasics.html

1.2.1.1 Programming assignment 00 (pa00-platform)

Topic: Class virtual machine, jupyter, numpy, pandas, matplotlib.
Log into https://git-classes.mst.edu to see this and all future assignments.
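
Before starting pa00, it can help to confirm that the packages named in the topic line are importable in your environment. The snippet below is a minimal sketch, not part of the assignment: the function name `check_packages` is made up for illustration, and the package list is simply the three libraries mentioned above.

```python
import importlib


def check_packages(names=("numpy", "pandas", "matplotlib")):
    """Return {package name: version string, or None if the import fails}."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            # Package is not installed in the current environment.
            versions[name] = None
        else:
            # Most scientific packages expose __version__; fall back to "unknown".
            versions[name] = getattr(module, "__version__", "unknown")
    return versions
```

Calling `check_packages()` in a notebook cell returns a dictionary; any entry whose value is `None` names a package that still needs to be installed.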

1.2.2 Biological sequence processing

To get the relevant files for this section,
start up the class VM,
navigate to a directory you want to store the lecture notes in,
and then:
git clone https://gitlab.com/bio-data/sequence-informatics.git
As I add lectures, or improve old ones,
they’ll be updated in the repo,
so you can just do this in the repo to get the latest notebook scripts:
git pull

  1. Introduction and bioinformatics software (read as part of pa00-platform, not lecture)
    00-introduction.py, 01-biological-information.py

  2. Pairwise sequence alignment
    02-pairwise-alignment.py
    Recordings:
    Part 1: https://vimeo.com/674616839
    Part 2: https://vimeo.com/677007265
    Part 3: https://vimeo.com/677007892
    Part 4: https://vimeo.com/680727824
    pa01: https://vimeo.com/680728953

  3. Database searching and sequence homology
    03-database-searching.py
    Recordings:
    Part 1: https://vimeo.com/680730089
    Part 2: 2022-02-23 (distracted in class and forgot to click start recording…sorry)
    Part 3: 2022-02-25 https://vimeo.com/682719141
    Part 4: 2022-02-28 (distracted in class and forgot to click start recording…sorry)

  4. Multiple sequence alignment
    04-multiple-sequence-alignment.py
    Recordings:
    Part 1: 2022-03-02 https://vimeo.com/685957644
    Part 2: 2022-03-04 https://vimeo.com/685957698
    Part 3: 2022-03-07 https://vimeo.com/685957751
    Part 4: 2022-03-09 https://vimeo.com/687727294
    pa02: 2022-03-11 https://vimeo.com/687728220

  5. Phylogenetic reconstruction
    05-phylogeny-reconstruction.py
    Recordings:
    Part 1: 2022-03-14 https://vimeo.com/692413491
    Part 2: 2022-03-16 https://vimeo.com/692413573
    pa02 working day: 2022-03-21 (no recording)

  6. Sequence mapping and clustering, SOM
    06-sequence-mapping-and-clustering.py
    Recordings:
    Part 1: 2022-03-23 https://vimeo.com/692413680
    Part 2: 2022-03-25 https://vimeo.com/692413878
    Part 3: 2022-04-04 https://vimeo.com/696706898
    Part 4: 2022-04-06 https://vimeo.com/696707009

  7. Diversity informatics
    07-biological-diversity.py
    Recordings:
    Part 1: 2022-04-08 https://vimeo.com/700484079
    Part 2: 2022-04-11 https://vimeo.com/700484242

  8. Machine learning in bioinformatics
    08-machine-learning.py
    Skipped this one for pacing

A fun recent example of a large genomics project applying some neat machine learning analyses to its data:
https://zoonomiaproject.org/ 
A pop news article about it:
https://www.vice.com/en/article/4a3wwg/scientists-sequenced-dna-of-nearly-every-mammal-on-earth-in-unprecedented-project
One paper using these methods to understand the genetics of brain size expansion throughout evolution:
https://www.science.org/doi/10.1126/science.abm7993

1.2.2.1 Programming assignment 01

Pairwise alignment and automated PCR primer selection
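
As a refresher on the pairwise alignment idea, here is a minimal sketch of a global (Needleman-Wunsch style) alignment score with a linear gap penalty. It is not the assignment's required interface or solution: the function name and the scoring values (`match=1`, `mismatch=-1`, `gap=-2`) are assumptions chosen only for illustration.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment score of strings a and b with a linear gap penalty."""
    # previous[j] holds the best score for aligning a[:i-1] with b[:j].
    previous = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Aligning a[:i] with the empty prefix of b costs i gaps.
        current = [i * gap]
        for j in range(1, len(b) + 1):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            current.append(max(
                previous[j - 1] + substitution,  # align a[i-1] with b[j-1]
                previous[j] + gap,               # a[i-1] aligned to a gap
                current[j - 1] + gap,            # b[j-1] aligned to a gap
            ))
        previous = current
    return previous[-1]
```

Only two rows of the dynamic programming table are kept, which is enough for the score; recovering the alignment itself would require keeping the full table (or traceback pointers).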

1.2.2.2 Programming assignment 02

Multiple sequence alignment (MSA) and OTU clustering

1.2.2.3 Programming assignment 03

K-means vs. self-organizing map (SOM) for cancer clustering and dimensionality reduction

1.2.3 Classification of bio-data

git clone https://gitlab.com/bio-data/machine-learning.git

  1. sklearn, Breast Cancer dataset intro
    Code: 00_BC_basics/
    Recording: 2022-04-15: https://vimeo.com/700484472
    Read more:
    http://www.scipy-lectures.org/packages/scikit-learn/index.html
    https://scikit-learn.org/stable/tutorial/index.html
    http://scikit-learn.org/stable/tutorial/basic/tutorial.html
    http://scikit-learn.org/stable/tutorial/statistical_inference/index.html
    https://scikit-learn.org/stable/user_guide.html

  2. Intro to supervised learning, k-NN on pre-extracted Breast Cancer image features
    Code: 01_KNN/
    Recordings:
    Part 1: 2022-04-18 https://vimeo.com/700484612
    Part 2: 2022-04-20 (below)

  3. Regression, Bayes models, and Decision trees on pre-extracted Breast Cancer image features
    Code: 02_LogisticRegression/
    Recordings: 2022-04-20

  4. Random forest, Neural networks, SVM on pre-extracted Breast Cancer image features
    Content/17-NeuralNetworks.html
    [ ] Zim this: Content/spiking_pres_aggregate.pdf
    Code:
    03_DecisionTrees/
    04_RandomForest/
    05_NeuralNetworks/
    05b_NN-from-scratch/
    06_SVM/
    Recordings: 2022-04-21

  5. Classification of Leukemia sub-type by gene expression data
    Code: 07_leukemia_classes/
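
For orientation, the k-NN idea from item 2 can be sketched in a few lines of plain Python. This is an illustrative sketch only, not the code in 01_KNN/ and not scikit-learn's implementation; the function name `knn_predict` and the default `k=3` are assumptions.

```python
from collections import Counter
import math


def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of query by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Sort by distance only, so labels never need to be comparable.
    distances.sort(key=lambda pair: pair[0])
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]
```

Each training point is a sequence of numeric features, and `math.dist` requires Python 3.8 or newer. In practice the features are usually rescaled first, since raw Euclidean distance is dominated by whichever feature has the largest range.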

1.2.3.1 Programming assignment 04

pa04_supervised
Topic: (Human breast cancer diagnosis via gene expression data categorization)

1.2.4 Network/Graph theory in biological and neurological sciences

git clone https://gitlab.com/bio-data/graph-network.git

  1. Formalizing biological networks
    Slides/Reading: Content/22-BioNetworks.html
    Video: http://barabasi.com/video/connected-the-power-of-six-degrees-en

  2. Network and Graph theory primer
    Slides:
    Review graph lectures here briefly first:
    ../DataStructures/Content.html
    Content/23-GraphTheory.html
    Reading:
    http://barabasi.com/f/147.pdf
    http://networksciencebook.com/ chapter 1, 2

  3. Python tools for networks and graphs
    Slides/Code: networkx_intro/

  4. Network datasets
    Meta-databases and aggregations:
    http://networkrepository.com (general networks, many bio)
    https://thebiogrid.org/ (central aggregation for bio networks, also below)
    https://neuinfo.org/ (brain super-meta-database under which all connectivity sets are housed)
    https://en.wikipedia.org/wiki/List_of_neuroscience_databases
    https://en.wikipedia.org/wiki/List_of_biological_databases
    https://en.wikipedia.org/wiki/List_of_biodiversity_database

  5. Demonstration of graph processing for human brain connectivity
    Slides and code: brain-conn_human/ (human brain!)

  6. Human diseaseome
    Slides, code, and data: diseaseome/ (human diseaseome!)
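
Two of the basic quantities behind the topics above, node degree and shortest path length, can be sketched without any library. This is a minimal, dependency-free illustration on a made-up toy graph, not the course code; the helper names are hypothetical, and networkx offers equivalent functionality for real datasets.

```python
from collections import deque


def build_adjacency(edges):
    """Build an undirected adjacency mapping {node: set of neighbors} from an edge list."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def shortest_path_length(adjacency, source, target):
    """Number of edges on a shortest path from source to target, or None if unreachable."""
    seen = {source}
    queue = deque([(source, 0)])
    # Breadth-first search visits nodes in order of increasing distance.
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for neighbor in adjacency.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


# Toy, made-up network (not real biological data).
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "E")]
graph = build_adjacency(edges)
degrees = {node: len(neighbors) for node, neighbors in graph.items()}
```

Here `degrees` maps each node to its number of neighbors, and `shortest_path_length(graph, "E", "D")` counts the edges along the path E, A, B, C, D.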

1.2.4.1 Programming assignment 07

Option 1: https://en.wikipedia.org/wiki/Caenorhabditis_elegans connectome
C. elegans has the first sequenced genome of a multicellular organism, the first mapped connectome, the first whole-animal simulation, etc.
https://en.wikipedia.org/wiki/History_of_research_on_Caenorhabditis_elegans

Connectome:
Meta-data for the worm set2 (includes organ data):
http://wormwiring.org/
http://wormwiring.org/sex/male.php
http://wormwiring.org/sex/hermaphrodite.php
Jupyter notebook with more information released on git-classes

Genome:
https://www.science.org/doi/10.1126/science.282.5396.2012

Whole animal simulation:
https://openworm.org/
https://github.com/openworm

Brain gene expression map:
https://www.cengen.org/

Option 2: Human https://en.wikipedia.org/wiki/Interactome - BioGRID
Documentation:
https://thebiogrid.org/
https://en.wikipedia.org/wiki/BioGRID
Jupyter notebook with more information released on git-classes

1.2.5 Vision in bioinformatics / Bioimage informatics

git clone https://gitlab.com/bio-data/computer-vision.git

  1. Basic image encoding and processing:
    Slides: Content/20-ImageBasics.html
    Reading: listed in the links under the section “Image processing in python”
    Code: See repo above.

  2. Bioimage informatics
    Slides: Content/21-BioImage.html
    Code: See repo above.
    Reading:
    Below is an extra collection of PDF readings on bioimage processing:
    Content/20-21_biovision_reading.tar.gz

1.2.5.1 Programming assignment 06

Topic: (Cell image nucleus identification and labeling)
https://datasciencebowl.com/competitions/spot-nuclei-speed-cures/
https://www.kaggle.com/c/data-science-bowl-2018
https://www.nature.com/articles/s41592-019-0612-7
https://www.youtube.com/watch?v=Dbiq6l50zO8
https://www.youtube.com/watch?v=eHwkfhmJexs

1.2.6 Computational epidemiology

Maybe, if we get this far:
Content/24-CompEpi.html

1.3 (maybe below, if there’s ever time)

1.3.1 Human whole genome analysis

Content/25-WGS.html
Practical approach to human genome analysis

1.3.1.1 Programming assignment 06

Topic: script to analyze a sample human genome