1 01-eScience


1.1 Audio-recording

1.2 Uses of statistics

eScience/stats.jpeg

1.3 Meta-science

https://en.wikipedia.org/wiki/Metascience
https://en.wikipedia.org/wiki/Replication_crisis
https://en.wikipedia.org/wiki/Reproducibility

1.4 Most important theme of the whole semester

Dr. Taylor’s Tao of data analysis:
Follow the data, and abstract as little as possible!

Occasionally, thoughtful abstraction and summary statistics will be needed and helpful,
but much more rarely, and usually only in end-stage analysis or automation,
not in initial exploration (initial bushwhacking science).

1.5 Admin notes

This class has a lot of background,
so we won’t get to actual bioinformatics methods right away,
but they will come!
Today, we will illustrate the need for a computational approach,
using neuroscience as an example.

1.6 Improving the scientific process

1.6.1 How have functions of brain regions been studied?

Early studies:

Lesions (accidental and otherwise).
Neurosurgery - lesion and direct stimulation.
PET studies.
fMEG, fMRI.

Do small brain regions perform modular functions?

What is “representation”?

Function/Representation in cortex (task fMRI)

Classic fMRI
eScience/fMRI1.png

Define: Activation

1.7 Problem

Estimating the reproducibility of psychological science = 35%
eScience/repro.png
(Open Science Collaboration, 2015; Science)
Open Science Collaboration, “Estimating the reproducibility of psychological science,” Science, vol. 349, p. aac4716, Aug. 2015.

Single study fMRI: What stinks?
Does anyone know what kind of animal this is with “significant” activations in its brain?
eScience/salmon.png

1.7.1 Single study fMRI: What stinks?

eScience/salmon2.png

C. M. Bennett, M. B. Miller, and G. L. Wolford,
“Neural correlates of inter-species perspective taking in the post-mortem Atlantic salmon:
An argument for multiple comparisons correction,”
NeuroImage, vol. 47, no. 1, p. S125, 2009.
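The salmon result is a multiple comparisons problem: test enough voxels at p < 0.05 with no correction, and some will look “significant” by chance alone. The following is a minimal sketch of that effect, assuming every voxel is pure noise so that p-values are uniform on [0, 1]. The voxel count, seed, and function name are illustrative assumptions, not values from the study, and Bonferroni is only one of several possible corrections.

```python
import random

def count_false_positives(n_voxels, alpha, seed=0):
    """Simulate p-values for voxels with no true signal and count 'significant' ones."""
    rng = random.Random(seed)
    # Under the null hypothesis, a p-value is uniformly distributed on [0, 1].
    p_values = [rng.random() for _ in range(n_voxels)]
    # No correction: every voxel is tested at the nominal threshold.
    uncorrected = sum(p < alpha for p in p_values)
    # Bonferroni correction: divide the threshold by the number of tests.
    bonferroni = sum(p < alpha / n_voxels for p in p_values)
    return uncorrected, bonferroni

n_voxels = 10_000  # hypothetical voxel count, for illustration only
alpha = 0.05
uncorrected, bonferroni = count_false_positives(n_voxels, alpha)
print("Expected false positives without correction:", alpha * n_voxels)
print("Simulated, uncorrected:", uncorrected)
print("Simulated, Bonferroni-corrected:", bonferroni)
```

Without correction, roughly alpha times the number of voxels pass the threshold even though there is no signal anywhere.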

eScience/false.png
“There is increasing concern that most current published research findings are false. In this framework, a research finding is less likely to be true when the studies conducted in a field are smaller; when effect sizes are smaller; when there is a greater number and lesser pre-selection of tested relationships; where there is greater flexibility in designs, definitions, outcomes, and analytical modes; when there is greater financial and other interest and prejudice; and when more teams are involved in a scientific field in chase of statistical significance. Simulations show that for most study designs and settings, it is more likely for a research claim to be false than true. Moreover, for many current scientific fields, claimed research findings may often be simply accurate measures of the prevailing bias.”
J. P. A. Ioannidis, “Why most published research findings are false,” PLoS Med, vol. 2, no. 8, p. e124, 2005.
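The framework in that paper can be made concrete with its positive predictive value (PPV): the probability that a claimed finding is true, given the pre-study odds R that a tested relationship is real, the type I error rate alpha, the type II error rate beta, and a bias term u (the fraction of analyses that would not otherwise have been findings but get reported as such anyway). A minimal sketch of that formula follows; the example values of R, beta, and u are illustrative assumptions, not estimates for any real field.

```python
def ppv(R, alpha=0.05, beta=0.2, u=0.0):
    """Positive predictive value of a claimed finding.

    R     : pre-study odds that a tested relationship is true
    alpha : type I error rate
    beta  : type II error rate (power = 1 - beta)
    u     : bias, the fraction of would-be non-findings reported as findings
    """
    # Claimed findings that are true: detected by power, plus bias on misses.
    true_pos = (1 - beta) * R + u * beta * R
    # Claimed findings that are false: type I errors, plus bias on true negatives.
    false_pos = alpha + u * (1 - alpha)
    return true_pos / (true_pos + false_pos)

# Hypothetical scenarios, for illustration only.
well_powered = ppv(R=1.0, beta=0.2)           # even odds, 80% power
exploratory = ppv(R=0.1, beta=0.8)            # long-shot hypotheses, 20% power
exploratory_biased = ppv(R=0.1, beta=0.8, u=0.2)
print(round(well_powered, 3), round(exploratory, 3), round(exploratory_biased, 3))
```

With low pre-study odds and low power, PPV falls below 0.5, which is the sense in which a claim is “more likely to be false than true”; adding bias lowers it further.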

e.g. discuss irradiating infant thymuses, statins, social psychology, nutrition research, Alzheimer’s drugs, etc.

1.7.2 Academic pressure

eScience/edwards-2017.pdf
Regrettably, at this point in time, the proportion of legitimate, high-quality science drops substantially each year.

1.7.3 Academic fashion

Academics (like all people) are not entirely rational about how they distribute research effort or money across problems.

1.7.3.1 What is the most interesting area of the brain?

Red = popularly studied (Behrens, 2012)
eScience/bias1.png

1.7.3.2 What is the most interesting area of the brain?

Impact factor correlations (Behrens, 2012)
eScience/bias2.png
Picking the wrong brain region is a bad career move…

1.8 Publication bias, careerism, tenure incentives, politics in science

https://en.wikipedia.org/wiki/Publication_bias

https://www.science.org/content/blog-post/just-bribe-everyone-it-s-only-scientific-record
https://www.science.org/content/article/paper-mills-bribing-editors-scholarly-journals-science-investigation-finds

https://cacm.acm.org/magazines/2019/9/238959-an-inability-to-reproduce/fulltext
http://blogs.nature.com/news/2012/12/is-the-scientific-literature-self-correcting.html

Can’t access a taxpayer-funded publication behind a for-profit paywall? Just read these articles from the journal Science:
http://www.sciencemag.org/news/2016/04/whos-downloading-pirated-papers-everyone
http://www.sciencemag.org/news/2016/04/alexandra-elbakyan-founded-sci-hub-thwart-journal-paywalls?IntCmp=scihub-1-11
Hint: Can you find the .onion?

D. Butler, “Biologists join physics preprint club,” Nature, vol. 425, pp. 548–548, Oct. 2003.
T. Delamothe, R. Smith, M. A. Keller, J. Sack, and B. Witscher, “Netprints: the next phase in the evolution of biomedical publishing,” BMJ, vol. 319, pp. 1515–1516, Dec. 1999.
R. Van Noorden, “Mathematicians aim to take publishers out of publishing,” Nature, Jan. 2013.
http://genomesunzipped.org/2011/07/why-publish-science-in-peer-reviewed-journals.php

1.9 Solution 1: Meta-analysis

eScience/meta.jpg

Manual meta-analysis of function performs ok…
eScience/function.png
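For readers new to the idea, the simplest form of meta-analysis pools effect estimates from several studies, weighting each by its precision. The sketch below shows generic fixed-effect, inverse-variance pooling. It is not the coordinate-based method behind the figures above, and the effect sizes and standard errors are made-up numbers for illustration.

```python
import math

def fixed_effect_pool(effects, std_errors):
    """Inverse-variance weighted average of study effects (fixed-effect model)."""
    # Each study is weighted by 1 / variance, so precise studies count more.
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total_weight
    # The pooled standard error shrinks as studies are added.
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se

# Hypothetical effect sizes and standard errors from three small studies.
effects = [0.30, 0.10, 0.25]
std_errors = [0.20, 0.10, 0.15]
pooled, pooled_se = fixed_effect_pool(effects, std_errors)
print("Pooled effect:", round(pooled, 3), "SE:", round(pooled_se, 3))
```

The pooled standard error is smaller than that of any single study, which is the basic statistical argument for combining studies rather than trusting one.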

What enabled the industrial revolution?
Craftsperson to Assembly line.
eScience/shoemaker.png => eScience/indust.jpg
Large scale cooperation requires standardization, and precision.

A cool, related video series:
https://www.youtube.com/watch?v=gNRnrn5DE58&list=PLyvS3q4dHDYZZvBYOdy9vyNbkMyEH-eRg

Standard measurements needed for cooperation
1800: screw size standardized
1900: spread of manufacturing standardization and technology
What was enabled by standardizing screws and other technologies?

(GDP)
eScience/GDP.png

Can standardization speed scientific progress as well?

Craftsperson scientist: Single study versus ?
eScience/shoemaker.png => eScience/indust.jpg

eScience/fMRI1.png => ??

1.10 Solution 2: Systematic scale

1.10.1 Human Connectome Project

Solution 2: Very large projects
eScience/HCP.png
WU-Minn-Oxford consortium (the good one of the two groups doing this project)

First group alone was 1200 healthy adults, 300 twin pairs and their siblings.
Extensive demographic and behavioral data, heritability, blood for genotyping, GWAS (genome wide association study).
7T MR scanners.
Resting-state fMRI.
Task-evoked fMRI (T-fMRI).
Diffusion MRI with tractography analysis.
MEG/EEG imaging on a subset of 100 subjects, including both resting-state MEG/EEG and task-evoked MEG/EEG, with the same tasks and timing as used in T-fMRI.

1.10.1.1 Solution 2: Systematic data collection

eScience/hcp.png

Structural data were used for connectivity (above).
Functional data used for meta-optimization (upcoming).

1.10.2 Blue Brain Project

Blue Brain Project: digital reconstruction of the brain by reverse-engineering mammalian brain circuitry
eScience/bluebrain0.png

1.10.3 Blue Brain Project

Blue Brain Project
eScience/bluebrain1.png

1.10.4 Allen Brain Atlas

eScience/aba0.png eScience/aba1.png
eScience/aba2.png eScience/aba3.png
Maps the expression of EVERY gene in the entire brain

1.11 Solution 3: Databasing existing data

eScience/method.png
Requires data sharing, centralized repositories

1.12 Solution 4: Ontologies and computability

Formal ontology for neuroscience studies
eScience/taxonomy.jpg

eScience/taxonomy_sub.jpg
Computability!

1.12.1 Computability and modeling the brain

Neuroscience has a vast literature with data in multiple levels/subfields.
Difficult to integrate these as a single researcher.
Computational models and multi-level complexity.

eScience/hbp2.jpg

1.12.2 BrainMap

BrainMap: 20 years of formally coded fMRI studies in one database

Database of manually entered fMRI publications including activation coordinates associated with experiments.
At the time of writing this, 19,921 experiments, 95,195 subjects.
Entered manually and high quality.
Research Imaging Institute of the University of Texas Health Science Center San Antonio.
http://www.brainmap.org
Practical to deal with database (fMRI anywhere).

Reminder: activations

Representation in human cortex:
BrainMap hierarchical clustering of behavioral labels by activation locations alone:

eScience/laird1.jpg

Functional networks in human cortex:
Functional activations (1000s of studies) versus functional connectivity (1 study):

eScience/crossly.png
eScience/cross.png

Side note:
diseases show increased prevalence at cortical network hubs,
including Alzheimer’s dementia, Asperger’s syndrome, schizophrenia, frontotemporal dementia, juvenile myoclonic epilepsy, progressive supranuclear palsy, left and right temporal lobe epilepsy, and post-traumatic stress disorder.

1.12.3 Neurosynth

Automated fMRI databasing
Neurosynth platform (backend in Python)
auto-extracts tabular fMRI activation coordinates and word frequencies from published studies:

eScience/yark1.png

Set of features (words) for each study, labeling a linked set of activation coordinates.
E.g., an article may use the word “faces” at a greater frequency than other articles do.
14,371 studies + their activations
University of Texas at Austin hosts at http://neurosynth.org/

Representation in human cortex:

Forward inference maps show the probability of activation,
given the presence of the term, P(act.|term)

Reverse inference maps show the probability of the term,
given observed activation, P(term|act.)

eScience/yark2.png
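The two kinds of map above differ only in which way the conditional probability points, and Bayes’ rule connects them. Here is a minimal sketch for a single term and a single voxel, estimating both probabilities from raw study counts. The counts are invented for illustration, and this is not a description of how Neurosynth itself computes or thresholds its maps.

```python
def inference_probabilities(studies):
    """studies: list of (has_term, is_active) booleans for one term and one voxel."""
    n_term = sum(1 for has_term, _ in studies if has_term)
    n_active = sum(1 for _, is_active in studies if is_active)
    n_both = sum(1 for has_term, is_active in studies if has_term and is_active)
    forward = n_both / n_term    # P(act. | term)
    reverse = n_both / n_active  # P(term | act.)
    return forward, reverse

# Hypothetical database of 100 studies: 20 use the term, 40 activate the voxel.
studies = (
    [(True, True)] * 15      # term used, voxel active
    + [(True, False)] * 5    # term used, voxel inactive
    + [(False, True)] * 25   # term absent, voxel active
    + [(False, False)] * 55  # term absent, voxel inactive
)
forward, reverse = inference_probabilities(studies)

# Bayes' rule: P(term | act.) = P(act. | term) * P(term) / P(act.)
p_term = 20 / 100
p_active = 40 / 100
reverse_via_bayes = forward * p_term / p_active
print(forward, reverse, reverse_via_bayes)
```

In this toy example the voxel is active in most studies that use the term, yet seeing the voxel active is only weak evidence for the term, because many other studies activate it too.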

Data analysis task:
Which word-activation associations in the Neurosynth database best spatially match your current brain state?
a.k.a. Mind-reading

Neurosynth: Mind reading task
eScience/yark3.png

eScience/yark5.png

Classifiers:
General method we will cover in class – many very cool types!
This one is just a naive Bayes model.
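To make “naive Bayes” concrete: the model treats each feature (here, whether a brain region is active) as independent given the class (here, a term), and picks the class with the highest posterior score. The following is a minimal Bernoulli naive Bayes sketch with Laplace smoothing. The four-region “brain”, the labels, and the training examples are all hypothetical, and this is not Neurosynth’s actual implementation.

```python
import math
from collections import defaultdict

def train_naive_bayes(examples, n_features):
    """examples: list of (label, set of active feature indices)."""
    class_counts = defaultdict(int)
    feature_counts = defaultdict(lambda: [0] * n_features)
    for label, active in examples:
        class_counts[label] += 1
        for i in active:
            feature_counts[label][i] += 1
    model = {}
    for label, count in class_counts.items():
        log_prior = math.log(count / len(examples))
        # Laplace smoothing keeps every probability strictly between 0 and 1.
        probs = [(feature_counts[label][i] + 1) / (count + 2)
                 for i in range(n_features)]
        model[label] = (log_prior, probs)
    return model

def predict(model, active):
    """Return the label with the highest log posterior for an activation pattern."""
    best_label, best_score = None, -math.inf
    for label, (log_prior, probs) in model.items():
        score = log_prior
        for i, p in enumerate(probs):
            # "Naive" step: add per-feature log likelihoods as if independent.
            score += math.log(p) if i in active else math.log(1 - p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical training data: which of 4 regions were active in each study.
examples = [
    ("faces", {0, 1}), ("faces", {0}), ("faces", {0, 1}),
    ("pain", {2, 3}), ("pain", {3}), ("pain", {2, 3}),
]
model = train_naive_bayes(examples, n_features=4)
print(predict(model, {0, 1}), predict(model, {2}))
```

Given a new activation pattern, the classifier returns the term whose studies it most resembles, which is the essence of the “mind-reading” task.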

Above chance (diagonals) for every category (i.e., success!)
eScience/yark6.png

Validated by manual meta-analysis
eScience/yark4.png

(a) Manual meta-analysis; (b, c) Neurosynth

How has the function of regions been studied?

Early studies

Lesions (accidental and otherwise).
Neurosurgery - lesion and direct stimulation.
PET studies.
fMEG, fMRI.

Modern neuroinformatics and computational neuroscience

Large databases of studies:
fMRI, MEG, DTI, rfMRI, gene expression, neuronal structure, cellular connectivity
Increases in power (n)
More robust to bias (not entirely)
Computable ontologies
Functional models as hypotheses and publications/literature

1.13 Grand goal: Scientific standardization

Improve the pace and reliability of science?
Traditional verbal hypothesis testing works, but it is slow,
and is somewhat limited to describing simpler systems.

Goal is to make this model-building process much more systematic:

eScience/Slide3.jpg

Not 1 hypothesis, but at least 2, or better yet,
systematically refining a computational model
(e.g., bi-weekly model refinement based on empirical data).

The model is the knowledgebase,
and should be the unit of publication,
at least in many domains.