mycode.tar.gz
is a zip-like archive file. To extract it:
$ tar -xf mycode.tar.gz
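If you would rather unpack an archive from inside Python (for example, from a notebook cell), the standard-library tarfile module can do roughly what the tar command above does. This is a minimal sketch; the function name extract_archive and the dest_dir parameter are illustrative assumptions, not part of the course code.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, dest_dir="."):
    """Extract a tar archive into dest_dir and return the member names.

    extract_archive and dest_dir are hypothetical names for this sketch.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # "r:*" lets tarfile detect the compression (gzip, bz2, xz, or none).
    with tarfile.open(archive_path, "r:*") as archive:
        names = archive.getnames()
        archive.extractall(dest)
    return names
```

Calling extract_archive("mycode.tar.gz") unpacks into the current directory. Only extract archives you trust, since a malicious archive can contain unexpected paths.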
notebook.ipynb
or notebook.md
is an
example name for a Jupyter notebook, which can be opened on your
computer with the Python tool suite detailed here: Content/02-PlatformTools.html
Inspiration, introduction, syllabus, etc
Content/00-Inspiration.html
The second scientific revolution
(e-science)
Content/01-eScience.html
Tech setup, ipython, git-classes, code
notebooks
Content/02-PlatformTools.html
Python review
Content/03-PythonReview.html
Genetics review
Content/04-BioReview.html
Bioinformatics basics
Content/05-BioInfoBasics.html
Topic: Class virtual machine, jupyter, numpy, pandas,
matplotlib.
Log into https://git-classes.mst.edu to see this and all future
assignments.
To get the relevant files for this section,
start up the class VM,
navigate to a directory you want to store the lecture notes in,
and then:
git clone https://gitlab.com/bio-data/sequence-informatics.git
As I add lectures or improve old ones,
they’ll be updated in the repo,
so you can just run this in the repo to get the latest notebook
scripts:
git pull
Introduction and bioinformatics software (read as part of
pa00-platform, not lecture)
00-introduction.py, 01-biological-information.py
Pairwise sequence alignment
02-pairwise-alignment.py
Recordings:
Part 1: https://vimeo.com/674616839
Part 2: https://vimeo.com/677007265
Part 3: https://vimeo.com/677007892
Part 4: https://vimeo.com/680727824
pa01: https://vimeo.com/680728953
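As a quick preview of the pairwise sequence alignment topic above, here is a minimal sketch of global alignment scoring by dynamic programming (the Needleman-Wunsch recurrence). It is not taken from 02-pairwise-alignment.py; the function name and the scoring values (match=1, mismatch=-1, gap=-1) are assumptions chosen for illustration.

```python
def needleman_wunsch_score(seq_a, seq_b, match=1, mismatch=-1, gap=-1):
    """Return the optimal global alignment score of two sequences.

    Scoring values are illustrative defaults, with a linear gap penalty.
    """
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against the empty string costs one gap per character.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            diagonal = score[i - 1][j - 1] + pair  # align the two characters
            up = score[i - 1][j] + gap             # gap in seq_b
            left = score[i][j - 1] + gap           # gap in seq_a
            score[i][j] = max(diagonal, up, left)
    return score[-1][-1]


print(needleman_wunsch_score("ACGT", "AGT"))  # prints 2
```

The sketch returns only the score. Recovering the alignment itself would need a traceback through the same table.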
Database searching and sequence homology
03-database-searching.py
Recordings:
Part 1: https://vimeo.com/680730089
Part 2: 2022-02-23 (distracted in class and forgot to click start
recording…sorry)
Part 3: 2022-02-25 https://vimeo.com/682719141
Part 4: 2022-02-28 (distracted in class and forgot to click start
recording…sorry)
Multiple sequence alignment
04-multiple-sequence-alignment.py
Recordings:
Part 1: 2022-03-02 https://vimeo.com/685957644
Part 2: 2022-03-04 https://vimeo.com/685957698
Part 3: 2022-03-07 https://vimeo.com/685957751
Part 4: 2022-03-09 https://vimeo.com/687727294
pa02: 2022-03-11 https://vimeo.com/687728220
Phylogenetic reconstruction
05-phylogeny-reconstruction.py
Recordings:
Part 1: 2022-03-14 https://vimeo.com/692413491
Part 2: 2022-03-16 https://vimeo.com/692413573
pa02 working day: 2022-03-21 (no recording)
Sequence mapping and clustering, SOM
06-sequence-mapping-and-clustering.py
Recordings:
Part 1: 2022-03-23 https://vimeo.com/692413680
Part 2: 2022-03-25 https://vimeo.com/692413878
Part 3: 2022-04-04 https://vimeo.com/696706898
Part 4: 2022-04-06 https://vimeo.com/696707009
Diversity informatics
07-biological-diversity.py
Recordings:
Part 1: 2022-04-08 https://vimeo.com/700484079
Part 2: 2022-04-11 https://vimeo.com/700484242
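One common measure in diversity work is the Shannon index, H = -sum(p_i * ln p_i), where p_i is the fraction of observations belonging to group i. The sketch below is an illustration under that assumption and is not taken from 07-biological-diversity.py; the function name and the sample labels are made up.

```python
import math
from collections import Counter


def shannon_index(observations):
    """Shannon diversity (natural log) of an iterable of group labels."""
    counts = Counter(observations)
    total = sum(counts.values())
    # Each group contributes -p * ln(p), where p is its relative abundance.
    return -sum((n / total) * math.log(n / total) for n in counts.values())


# Hypothetical sample: two groups observed equally often.
sample = ["otu_a", "otu_b", "otu_a", "otu_b"]
print(shannon_index(sample))
```

A sample with a single group has index 0, and the index grows as groups become more numerous and more evenly represented.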
Machine learning in bioinformatics
08-machine-learning.py
Skipped this one for pacing
A fun recent example of a large genomics project applying some neat
machine learning analyses to its data:
https://zoonomiaproject.org/
A pop news article about it:
https://www.vice.com/en/article/4a3wwg/scientists-sequenced-dna-of-nearly-every-mammal-on-earth-in-unprecedented-project
One paper using these methods to understand the genetics of brain size
expansion throughout evolution:
https://www.science.org/doi/10.1126/science.abm7993
Pairwise alignment and automated PCR primer selection
Multiple sequence alignment (MSA) and OTU clustering
K-means vs. self-organizing map (SOM) for cancer clustering and dimensionality reduction
git clone https://gitlab.com/bio-data/machine-learning.git
sklearn, Breast Cancer dataset intro
Code: 00_BC_basics/
Recording: 2022-04-15: https://vimeo.com/700484472
Read more:
http://www.scipy-lectures.org/packages/scikit-learn/index.html
https://scikit-learn.org/stable/tutorial/index.html
http://scikit-learn.org/stable/tutorial/basic/tutorial.html
http://scikit-learn.org/stable/tutorial/statistical_inference/index.html
https://scikit-learn.org/stable/user_guide.html
Intro to supervised learning, k-NN on pre-extracted
Breast Cancer image features
Code: 01_KNN/
Recordings:
Part 1: 2022-04-18 https://vimeo.com/700484612
Part 2: 2022-04-20 (below)
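To make the k-NN idea concrete before opening 01_KNN/, here is a minimal from-scratch sketch: classify a query point by majority vote among its k nearest training points. The course code presumably uses sklearn; this sketch does not, and the two-feature toy points and labels below are invented for illustration rather than taken from the Breast Cancer dataset.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Predict a label for query by majority vote of its k nearest neighbors."""
    # Pair each training point's Euclidean distance to the query with its label.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical toy data: two well-separated clusters.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = ["benign", "benign", "benign", "malignant", "malignant", "malignant"]
print(knn_predict(points, labels, (1.1, 1.0)))  # prints benign
```

With real features, scaling matters: a feature with a large numeric range would dominate the Euclidean distance unless the features are standardized first.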
Regression, Bayes models, and Decision trees on
pre-extracted Breast Cancer image features
Code: 02_LogisticRegression/
Recordings: 2022-04-20
Random forest, Neural networks, SVM on pre-extracted
Breast Cancer image features
Content/17-NeuralNetworks.html
[ ] Zim this: Content/spiking_pres_aggregate.pdf
Code:
03_DecisionTrees/
04_RandomForest/
05_NeuralNetworks/
05b_NN-from-scratch/
06_SVM/
Recordings: 2022-04-21
Classification of Leukemia sub-type by gene expression
data
Code: 07_leukemia_classes/
pa04_supervised
Topic: (Human breast cancer diagnosis via gene expression data
categorization)
git clone https://gitlab.com/bio-data/graph-network.git
Formalizing biological networks
Slides/Reading: Content/22-BioNetworks.html
Video: http://barabasi.com/video/connected-the-power-of-six-degrees-en
Network and Graph theory primer
Slides:
Review graph lectures here briefly first:
../DataStructures/Content.html
Content/23-GraphTheory.html
Reading:
http://barabasi.com/f/147.pdf
http://networksciencebook.com/ chapter 1, 2
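As a small companion to the graph theory primer above, the sketch below stores an undirected graph as an adjacency dictionary and uses breadth-first search to count the hops between two nodes, which is the quantity behind the "six degrees" idea. The node names and the function name are made up for illustration; the course itself uses networkx for this kind of work.

```python
from collections import deque


def shortest_path_length(adjacency, source, target):
    """Return the number of edges on a shortest path, or None if unreachable."""
    if source == target:
        return 0
    visited = {source}
    queue = deque([(source, 0)])
    while queue:
        node, distance = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor == target:
                return distance + 1
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, distance + 1))
    return None


# Hypothetical undirected graph: each edge is listed in both directions.
graph = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A"],
    "D": ["B", "E"],
    "E": ["D"],
    "F": [],
}
print(shortest_path_length(graph, "A", "E"))  # prints 3
```

Breadth-first search visits nodes in order of increasing distance, so the first time the target is reached the hop count is minimal.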
Python tools for networks and graphs
Slides/Code: networkx_intro/
Network datasets
Meta-databases and aggregations:
http://networkrepository.com (general networks, many bio)
https://thebiogrid.org/ (central aggregation for bio networks, also below)
https://neuinfo.org/ (brain super-meta-database under which all connectivity sets are housed)
https://en.wikipedia.org/wiki/List_of_neuroscience_databases
https://en.wikipedia.org/wiki/List_of_biological_databases
https://en.wikipedia.org/wiki/List_of_biodiversity_database
Demonstration of graph processing for human brain
connectivity
Slides and code: brain-conn_human/ (human brain!)
Human diseaseome
Slides, code, and data: diseaseome/ (human diseaseome!)
Option 1: https://en.wikipedia.org/wiki/Caenorhabditis_elegans
connectome
C. elegans offers the first sequenced multicellular genome, the first
connectome, the first whole-animal simulation, etc.
https://en.wikipedia.org/wiki/History_of_research_on_Caenorhabditis_elegans
Connectome:
Meta-data for the worm set2 (includes organ data):
http://wormwiring.org/
http://wormwiring.org/sex/male.php
http://wormwiring.org/sex/hermaphrodite.php
Jupyter notebook with more information released on git-classes
Genome:
https://www.science.org/doi/10.1126/science.282.5396.2012
Whole animal simulation:
https://openworm.org/
https://github.com/openworm
Brain gene expression map:
https://www.cengen.org/
Option 2: Human https://en.wikipedia.org/wiki/Interactome -
BioGrid
Documentation:
https://thebiogrid.org/
https://en.wikipedia.org/wiki/BioGRID
Jupyter notebook with more information released on git-classes
git clone https://gitlab.com/bio-data/computer-vision.git
Basic image encoding and processing:
Slides: Content/20-ImageBasics.html
Reading: see the links listed under the section “Image processing in
python”
Code: See repo above.
Bioimage informatics
Slides: Content/21-BioImage.html
Code: See repo above.
Reading:
Below is an extra collection of PDF readings on bioimage
processing:
Content/20-21_biovision_reading.tar.gz
Topic: (Cell image nucleus identification and labeling)
https://datasciencebowl.com/competitions/spot-nuclei-speed-cures/
https://www.kaggle.com/c/data-science-bowl-2018
https://www.nature.com/articles/s41592-019-0612-7
https://www.youtube.com/watch?v=Dbiq6l50zO8
https://www.youtube.com/watch?v=eHwkfhmJexs
Maybe, if we get this far:
Content/24-CompEpi.html
Content/25-WGS.html
Practical approach to human genome analysis
Topic: script to analyze a sample human genome