1 01-eScience

1.1 Audio-recording

Password for the Vimeo videos is in Zulip chat.
https://vimeo.com/669286000
Tip: If anyone wants to speed up the lecture videos a little, inspect the page, go to the browser console, and paste this in: document.querySelector('video').playbackRate = 1.2

1.2 Uses of statistics

1.3 Meta-science

https://en.wikipedia.org/wiki/Metascience
https://en.wikipedia.org/wiki/Replication_crisis
https://en.wikipedia.org/wiki/Reproducibility

1.4 Most important theme of the whole semester

Dr. Taylor’s Tao of data analysis:
Follow the data, and abstract as little as possible!

Occasionally, thoughtful abstraction and summary statistics will be needed and helpful,
but much more rarely, and usually only in end-stage analysis or automation,
not in initial exploration (initial bushwhacking science).

1.5 Admin notes

This class has a lot of background,
so we won’t get to actual bioinformatics methods right away,
but they will come!
Today, we will illustrate the need for a computational approach,
using neuroscience as an example.

1.6 Improving the scientific process

1.6.1 How have functions of brain regions been studied?

Early studies:

Lesions (accidental and otherwise).
Neurosurgery - lesion and direct stimulation.
PET studies.
fMEG, fMRI.

Do small brain regions perform modular functions?

What is “representation”?

Function/Representation in cortex (task fMRI)

Classic fMRI
01-eScience/fMRI1.png

Define: Activation

1.7 Problem

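Estimating the reproducibility of psychological science: roughly 36% of replication attempts gave a statistically significant result (versus 97% of the original studies).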
01-eScience/repro.png
(Open Science Collaboration, 2015; Science)
Open Science Collaboration, “Estimating the reproducibility of psychological science,” Science, vol. 349, p. aac4716, Aug. 2015.

Single study fMRI: What stinks?
Does anyone know what kind of animal this is with “significant” activations in its brain?
01-eScience/salmon.png

1.7.1 Single study fMRI: What stinks?

C. M. Bennett, M. B. Miller, and G. L. Wolford,
“Neural correlates of inter-species perspective taking in the post-mortem Atlantic salmon:
An argument for multiple comparisons correction,”
NeuroImage, vol. 47, no. 1, p. S125, 2009.
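
The salmon result is a multiple-comparisons problem: test enough voxels and some will cross p < 0.05 by chance alone. A minimal simulation sketch, assuming pure-noise data and a simple z-test per voxel (the voxel count, number of scans, and function name are illustrative assumptions, not taken from the paper):

```python
import random
from statistics import NormalDist

def null_voxel_p_values(n_voxels=2000, n_scans=20, seed=0):
    """Two-sided z-test p-value per voxel when the data are pure noise."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    p_values = []
    for _ in range(n_voxels):
        noise = [rng.gauss(0.0, 1.0) for _ in range(n_scans)]
        # z = mean / (sigma / sqrt(n)), with sigma = 1 known by construction
        z = (sum(noise) / n_scans) * n_scans ** 0.5
        p_values.append(2.0 * (1.0 - std_normal.cdf(abs(z))))
    return p_values

alpha = 0.05
p_values = null_voxel_p_values()
# Count voxels called "active" with and without correction
uncorrected = sum(p < alpha for p in p_values)
bonferroni = sum(p < alpha / len(p_values) for p in p_values)
print(f"'active' voxels, uncorrected: {uncorrected}")
print(f"'active' voxels, Bonferroni:  {bonferroni}")
```

With no true signal anywhere, roughly 5% of voxels (about 100 of 2000) should still pass the uncorrected threshold, while the Bonferroni threshold should leave almost none. Bonferroni is used here only because it is the simplest correction; it is not necessarily the one used in the paper.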

01-eScience/false.png
“There is increasing concern that most current published research findings are false. In this framework, a research finding is less likely to be true when the studies conducted in a field are smaller; when effect sizes are smaller; when there is a greater number and lesser pre-selection of tested relationships; where there is greater flexibility in designs, definitions, outcomes, and analytical modes; when there is greater financial and other interest and prejudice; and when more teams are involved in a scientific field in chase of statistical significance. Simulations show that for most study designs and settings, it is more likely for a research claim to be false than true. Moreover, for many current scientific fields, claimed research findings may often be simply accurate measures of the prevailing bias.”
J. P. A. Ioannidis, “Why most published research findings are false,” PLoS Med, vol. 2, no. 8, p. e124, 2005.
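
The core of this argument is Bayes' rule applied to significance testing. A minimal sketch of the bias-free case, assuming prior odds R of a tested relationship being true, power 1 - beta, and false-positive rate alpha (the function name and the two scenarios are illustrative assumptions):

```python
def positive_predictive_value(prior_odds, power, alpha=0.05):
    """Probability that a 'significant' finding is true (no bias term).

    prior_odds: ratio of true to false relationships among those tested (R).
    power: probability of detecting a true relationship (1 - beta).
    alpha: probability of a significant result for a false relationship.
    """
    true_positives = power * prior_odds   # proportional to true findings
    false_positives = alpha               # proportional to false findings
    return true_positives / (true_positives + false_positives)

# Hypothetical scenarios, chosen only for illustration.
well_powered = positive_predictive_value(prior_odds=1.0, power=0.8)
exploratory = positive_predictive_value(prior_odds=0.1, power=0.2)
print(f"well-powered, 1:1 prior odds:  {well_powered:.2f}")
print(f"underpowered, 1:10 prior odds: {exploratory:.2f}")
```

The first scenario gives about 0.94 and the second about 0.29, so in the underpowered, long-shot setting a significant result is more likely false than true, before any bias is even considered.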

e.g. discuss irradiating infant thymuses, statins, social psychology, nutrition research, Alzheimer’s drugs, etc.

1.7.2 Academic pressure

01-eScience/edwards-2017.pdf
Regrettably, at this point in time, the proportion of legitimate, high-quality science drops substantially each year.

1.7.3 Academic fashion

Academics (people) are not so rational about distributing research efforts or money to problems.

1.7.3.1 What is the most interesting area of the brain?

Red = popularly studied (Behrens, 2012)
01-eScience/bias1.png

1.7.3.2 What is the most interesting area of the brain?

Impact factor correlations (Behrens, 2012)
01-eScience/bias2.png
Picking the wrong brain region is a bad career move…

1.8 Publication bias, careerism, tenure incentives, politics in science

https://en.wikipedia.org/wiki/Publication_bias

https://www.science.org/content/blog-post/just-bribe-everyone-it-s-only-scientific-record
https://www.science.org/content/article/paper-mills-bribing-editors-scholarly-journals-science-investigation-finds

https://cacm.acm.org/magazines/2019/9/238959-an-inability-to-reproduce/fulltext
http://blogs.nature.com/news/2012/12/is-the-scientific-literature-self-correcting.html

Can’t access a taxpayer-funded publication behind a for-profit paywall? Just read these articles from the journal Science:
http://www.sciencemag.org/news/2016/04/whos-downloading-pirated-papers-everyone
http://www.sciencemag.org/news/2016/04/alexandra-elbakyan-founded-sci-hub-thwart-journal-paywalls?IntCmp=scihub-1-11
Hint: Can you find the .onion?

D. Butler, “Biologists join physics preprint club,” Nature, vol. 425, pp. 548–548, Oct. 2003.
Delamothe, R. Smith, M. A. Keller, J. Sack, and B. Witscher, “Netprints: the next phase in the evolution of biomedical publishing,” BMJ, vol. 319, pp. 1515–1516, Dec. 1999.
Van Noorden, “Mathematicians aim to take publishers out of publishing,” Nature, Jan. 2013.
http://genomesunzipped.org/2011/07/why-publish-science-in-peer-reviewed-journals.php

1.9 Solution 1: Meta-analysis

Manual meta-analysis of function performs ok…
01-eScience/function.png

What enabled the industrial revolution?
Craftsperson to Assembly line.
01-eScience/shoemaker.png => 01-eScience/indust.jpg
Large scale cooperation requires standardization, and precision.

Standard measurements needed for cooperation: 1800: screw size standardized; 1900: spread of manufacturing standardization and technology
What was enabled by standardizing screws and other technologies?

(GDP)
01-eScience/GDP.png

Can standardization speed scientific progress as well?

Craftsperson scientist: Single study versus ?
01-eScience/shoemaker.png => 01-eScience/indust.jpg

01-eScience/fMRI1.png => ??

1.10 Solution 2: Systematic scale

1.10.1 Human Connectome Project

Solution 2: Very large projects
01-eScience/HCP.png
WU-Minn-Oxford consortium (the better of the two consortia doing this project)

The first group alone was 1200 healthy adults, including 300 twin pairs and their siblings.
Extensive demographic and behavioral data, heritability, blood for genotyping, GWAS (genome wide association study).
7T MR scanners.
Resting-state fMRI.
Task-evoked fMRI (T-fMRI).
Diffusion MRI with tractography analysis.
MEG/EEG imaging on a subset of 100 subjects, including both resting-state and task-evoked MEG/EEG, with the same tasks and timing as will be used in T-fMRI.

1.10.1.1 Solution 2: Systematic data collection

Structural data were used for connectivity (above).
Functional data used for meta-optimization (upcoming).

1.10.2 Blue Brain Project

Blue Brain Project: digital reconstruction of the brain by reverse-engineering mammalian brain circuitry
01-eScience/bluebrain0.png

1.10.3 Blue Brain Project

Blue Brain Project
01-eScience/bluebrain1.png

1.10.4 Allen Brain Atlas

01-eScience/aba0.png 01-eScience/aba1.png
01-eScience/aba2.png 01-eScience/aba3.png
Maps the expression of EVERY gene in the entire brain

1.11 Solution 3: Databasing existing data

01-eScience/method.png
Requires data sharing, centralized repositories

1.12 Solution 4: Ontologies and computability

Formal ontology for neuroscience studies
01-eScience/taxonomy.jpg

01-eScience/taxonomy_sub.jpg
Computability!

1.12.1 Computability and modeling the brain

Neuroscience has a vast literature with data in multiple levels/subfields.
Difficult to integrate these as a single researcher.
Computational models and multi-level complexity.

1.12.2 BrainMap

BrainMap: 20 years of formally coded fMRI studies in one database

Database of manually entered fMRI publications including activation coordinates associated with experiments.
At the time of writing this, 19,921 experiments, 95,195 subjects.
Entered manually and high quality.
Research Imaging Institute of the University of Texas Health Science Center San Antonio.
http://www.brainmap.org
Practical to deal with database (fMRI anywhere).

Reminder: activations

Representation in human cortex:
BrainMap hierarchical clustering of behavioral labels by activation locations alone:

Functional networks in human cortex:
Functional activations (1000s of studies) versus functional connectivity (1 study):

01-eScience/crossly.png
01-eScience/cross.png

Side note:
diseases show increased prevalence at cortical network hubs,
including Alzheimer’s dementia, Asperger’s syndrome, schizophrenia, frontotemporal dementia, juvenile myoclonic epilepsy, progressive supranuclear palsy, left and right temporal lobe epilepsy, and post-traumatic stress disorder.

1.12.3 Neurosynth

Automated fMRI databasing
Neurosynth platform (backend in Python)
auto-extracts tabular fMRI activation coordinates and word frequencies from published studies:

Set of features (words) for each study, labeling a linked set of activation coordinates.
E.g., article may use the word “faces” at a greater frequency than others
14,371 studies + their activations
University of Texas at Austin hosts at http://neurosynth.org/

Representation in human cortex:

Forward inference maps show the probability of activation,
given the presence of the term, P(act.|term)

Reverse inference maps show the probability of the term,
given observed activation, P(term|act.)
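
The two maps are linked by Bayes' rule. A minimal sketch for one term at one voxel, assuming simple study counts (the counts and the function name are hypothetical, and Neurosynth's own estimation details may differ):

```python
def inference_maps(n_term_active, n_term, n_active, n_studies):
    """Forward and reverse inference for one term at one voxel, from study counts."""
    p_act_given_term = n_term_active / n_term   # forward inference, P(act. | term)
    p_term = n_term / n_studies                 # how common the term is
    p_act = n_active / n_studies                # how often the voxel is active at all
    # Bayes' rule: P(term | act.) = P(act. | term) * P(term) / P(act.)
    p_term_given_act = p_act_given_term * p_term / p_act
    return p_act_given_term, p_term_given_act

# Hypothetical counts: 1000 studies, 100 use the term, 240 activate the voxel,
# and 60 studies do both.
forward, reverse = inference_maps(
    n_term_active=60, n_term=100, n_active=240, n_studies=1000
)
print(f"P(act. | term) = {forward:.2f}")
print(f"P(term | act.) = {reverse:.2f}")
```

Here the forward probability is 0.60 but the reverse probability is only 0.25, because the voxel is also active in many studies that never use the term. A region that activates for everything tells you little about any one term.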

Data analysis task:
Which word-activation associations in the Neurosynth database best spatially match your current brain state?
a.k.a. Mind-reading

Neurosynth: Mind reading task
01-eScience/yark3.png

Classifiers:
General method we will cover in class – many very cool types!
This one is just a naive Bayes model.
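
As a rough illustration of what a naive Bayes classifier does, here is a minimal Bernoulli naive Bayes sketch over binary "region active / not active" features. The three regions, the toy studies, and all function names are made up for illustration; this is not the Neurosynth implementation.

```python
import math
from collections import defaultdict

def train_bernoulli_nb(samples, labels, smoothing=1.0):
    """Fit per-class priors and per-feature activation probabilities."""
    n_features = len(samples[0])
    class_counts = defaultdict(int)
    active_counts = defaultdict(lambda: [0] * n_features)
    for x, y in zip(samples, labels):
        class_counts[y] += 1
        for j, v in enumerate(x):
            active_counts[y][j] += v
    model = {}
    for y, n_y in class_counts.items():
        prior = n_y / len(samples)
        # Laplace smoothing keeps probabilities away from exactly 0 or 1
        probs = [(active_counts[y][j] + smoothing) / (n_y + 2 * smoothing)
                 for j in range(n_features)]
        model[y] = (prior, probs)
    return model

def predict(model, x):
    """Return the class with the highest log posterior (up to a constant)."""
    best_label, best_log_p = None, -math.inf
    for y, (prior, probs) in model.items():
        log_p = math.log(prior)
        # "Naive" step: features are treated as independent given the class
        for v, p in zip(x, probs):
            log_p += math.log(p if v else 1.0 - p)
        if log_p > best_log_p:
            best_label, best_log_p = y, log_p
    return best_label

# Toy data: columns are three hypothetical regions (1 = activation reported).
studies = [[1, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 1]]
terms = ["faces", "faces", "faces", "motor", "motor", "motor"]
model = train_bernoulli_nb(studies, terms)
print(predict(model, [1, 0, 0]))
print(predict(model, [0, 1, 0]))
```

On this toy data the first pattern is classified as "faces" and the second as "motor". The independence assumption is rarely true for neighboring brain regions, but the model is fast and often works surprisingly well as a baseline.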

Above chance (diagonals) for every category (i.e., success!)
01-eScience/yark6.png

Validated by manual meta-analysis
01-eScience/yark4.png

Manual meta-analysis (b,c) Neurosynth

How has the function of regions been studied?

Early studies

Lesions (accidental and otherwise).
Neurosurgery - lesion and direct stimulation.
PET studies.
fMEG, fMRI.

Modern neuroinformatics and computational neuroscience

Large databases of studies:
fMRI, MEG, DTI, rfMRI, gene expression, neuronal structure, cellular connectivity
Increases in power (n)
More robust to bias (not entirely)
Computable ontologies
Functional models as hypotheses and publications/literature
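
To make "increases in power (n)" concrete, here is a minimal sketch of how statistical power grows with sample size, assuming a two-sided one-sample z-test and a standardized effect size of 0.3 (both are illustrative assumptions, as is the function name):

```python
from statistics import NormalDist

def z_test_power(effect_size, n, alpha=0.05):
    """Power of a two-sided one-sample z-test for a standardized effect size."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)
    # Under the alternative, the test statistic is centered at effect_size * sqrt(n)
    shift = effect_size * n ** 0.5
    # Probability of landing in either rejection tail
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)

for n in (20, 200, 2000):
    print(n, round(z_test_power(0.3, n), 3))
```

A single small study (n = 20) would detect this effect only about a quarter of the time, while pooled data at n = 200 or more would detect it almost always. This is the main statistical argument for large databases over single studies.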

1.13 Grand goal: Scientific standardization

Improve the pace and reliability of science?
Traditional verbal hypothesis testing works, but it is slow,
and is somewhat limited to describing simpler systems.

Goal is to make this model-building process much more systematic:

Not 1 hypothesis, but at least 2, or better yet,
systematically refining a computational model
(e.g., bi-weekly model refinement based on empirical data).

The model is the knowledgebase,
and should be the unit of publication,
at least in many domains.