document.querySelector('video').playbackRate = 1.2
* Why should you care about crosswords?
* Why should you care about matrices?
* What did the first pre-computers process?
* What is a List?
* Just a special case of a matrix
* THE universal user interface (UI)
* What is an image?
* ../../Bioinformatics/Content/20-ImageBasics.html
(in particular, review the top of that page)
* 22-DataVis/data_00a_matplotlib.py
(pudb3)
* How do you detect a face?
* How do you detect a simple shape?
* How do you detect a line?
* How do you detect an edge?
* 22-DataVis/data_00b_images.py
(spyder)
* How do I do computer vision, or machine learning face
recognition?
* How does one store/model a 3D environment, like a realistic game
map?
* How does one image a brain, a brain over time?
* How does one keep abstract time series data?
* How does one keep abstract experimental data?
* How does one simulate a game or real-world conflict over a space?
Matrices are deeply intertwined with computation!
Welcome to the MATRIX!
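The questions above keep returning to one idea: lists, images, and maps are all matrices. Here is a minimal sketch, assuming NumPy is installed and using a made-up 6x6 image. A grayscale image is a 2-D array of intensities, a list is a 1-D slice of that array, and a crude edge detector simply looks for large differences between neighboring pixels. Real detectors such as the Sobel or Canny filters refine this idea. This is an illustration only, not the code in 22-DataVis/data_00b_images.py.

```python
import numpy as np

# Hypothetical 6x6 grayscale "image": dark left half (0), bright right half (255).
# An image is just a matrix of pixel intensities.
image = np.zeros((6, 6), dtype=float)
image[:, 3:] = 255.0

# A list is the 1-D special case: one row of the matrix.
row = image[0]                        # shape (6,)

# Crude edge detector: absolute difference between horizontally adjacent pixels.
# dx[:, j] compares column j with column j + 1.
dx = np.abs(np.diff(image, axis=1))   # shape (6, 5)

# Mark "edges" where intensity jumps sharply.
edges = dx > 100

print(image.shape, row.shape, dx.shape)
print(np.argwhere(edges[0]))          # column position of the edge in row 0
```

The same pattern of "compare neighbors in a matrix" underlies line and shape detection. Those methods use larger neighborhoods and smarter filters.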
+++++++++++ Cahoot-22a.1
https://mst.instructure.com/courses/58101/quizzes/57373
Step the code!
* 22-DataVis/data_01_numpy.py
Show these links, but don’t go over them in detail:
* https://scipy-lectures.org/intro/numpy/index.html
* https://numpy.org/doc/stable/user/index.html
* https://numpy.org/doc/stable/user/absolute_beginners.html
* https://numpy.org/doc/stable/user/quickstart.html
* https://numpy.org/doc/stable/user/basics.html
If you are interested in computational math, modeling, physics, AI, or
machine learning, I highly recommend reading the above tutorials in
full.
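For a taste of what those tutorials cover, here is a small sketch, assuming only NumPy. It shows array creation, indexing, element-wise arithmetic, broadcasting, axis reductions, and matrix multiplication. It is not the contents of 22-DataVis/data_01_numpy.py.

```python
import numpy as np

# Build a 3x4 matrix from a nested Python list.
a = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8],
              [9, 10, 11, 12]])

print(a.shape)     # (3, 4)
print(a[1, 2])     # 7: row 1, column 2 (zero-based)
print(a[:, 0])     # first column: [1 5 9]

# Element-wise arithmetic: no Python loop needed.
doubled = a * 2

# Broadcasting: a length-4 row vector is applied to every row.
offsets = np.array([0, 10, 20, 30])
shifted = a + offsets

# Reductions along an axis (axis=0 collapses rows, axis=1 collapses columns).
col_sums = a.sum(axis=0)     # [15 18 21 24]
row_means = a.mean(axis=1)   # [ 2.5  6.5 10.5]

# Matrix product: (3x4) @ (4x3) -> (3x3)
gram = a @ a.T
```

Vectorized expressions like `a * 2` or `a + offsets` run in compiled code. This is why NumPy is much faster than looping over nested Python lists.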
++++++ Cahoot-22b.1
https://mst.instructure.com/courses/58101/quizzes/57426
"In data science, 85 percent of time spent is preparing data, 10 percent of time is spent complaining about the need to prepare data, and 5 percent of the time is actually analyzing or modeling data..."
**"Datasets are like people... interrogate them enough, and they will tell you whatever you want to hear... whether or not it is true."**
The state of data analysis in many domains of science really is this dark, sometimes in exactly this way:
If you can’t see the pattern with simple descriptive statistics and
graphs, the pattern is probably not real!
https://cacm.acm.org/magazines/2019/9/238959-an-inability-to-reproduce/fulltext
http://blogs.nature.com/news/2012/12/is-the-scientific-literature-self-correcting.html
Can’t find a taxpayer-funded publication behind a for-profit paywall? Just read this article from the journal Science:
J. P. A. Ioannidis, “Why most published research findings are false,” PLoS Med., vol. 2, no. 8, p. e124, 2005.
Open Science Collaboration, “Estimating the reproducibility of psychological science,” Science, vol. 349, p. aac4716, Aug. 2015.
D. Butler, “Biologists join physics preprint club,” Nature, vol. 425, p. 548, Oct. 2003.
T. Delamothe, R. Smith, M. A. Keller, J. Sack, and B. Witscher, “Netprints: the next phase in the evolution of biomedical publishing,” BMJ, vol. 319, pp. 1515–1516, Dec. 1999.
R. Van Noorden, “Mathematicians aim to take publishers out of publishing,” Nature, Jan. 2013.
C. M. Bennett, M. B. Miller, and G. L. Wolford, “Neural correlates of interspecies perspective taking in the post-mortem Atlantic salmon: An argument for multiple comparisons correction,” NeuroImage, vol. 47, no. 1, p. S125, 2009.
http://genomesunzipped.org/2011/07/why-publish-science-in-peer-reviewed-journals.php
Publication bias
Actual bias
Some domains of science are vulnerable to such problems, while others are not.
In psychology or biology, data mining is often an accusation, while in computer science, it may be something we say with pride. The difference is, in part, one of transparency.
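The Bennett et al. dead-salmon paper cited above is about multiple comparisons: if you run enough tests on noise, some of them will look significant. The toy simulation below makes some assumptions. It uses NumPy, normally distributed noise, 30 observations, and 1,000 unrelated variables. Under those conditions, roughly 5% of the pure-noise correlations cross the usual p < 0.05 line. This is an illustration, not the analysis from any of the cited papers.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is reproducible

n_obs = 30        # observations per variable
n_tests = 1000    # number of unrelated "predictors" we try

outcome = rng.normal(size=n_obs)                   # pure noise
predictors = rng.normal(size=(n_tests, n_obs))     # also pure noise


def zscore(x, axis=-1):
    """Standardize along an axis (population std, ddof=0)."""
    return (x - x.mean(axis=axis, keepdims=True)) / x.std(axis=axis, keepdims=True)


# Pearson r of every predictor row with the outcome, all at once:
# with ddof=0 z-scores, r is the mean of the elementwise products.
r = zscore(predictors) @ zscore(outcome) / n_obs

# Critical |r| for p < 0.05 (two-tailed) with n = 30 (df = 28) is about 0.361.
R_CRIT = 0.361
false_positives = int(np.sum(np.abs(r) > R_CRIT))
print(f"{false_positives} of {n_tests} noise variables look 'significant'")
```

A Bonferroni correction would instead test each of the 1,000 correlations at 0.05 / 1000, which makes chance hits like these rare. Reporting how many tests you ran is the transparency that separates honest data mining from the accusation.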
What to do about it??
**Dr. Taylor's Tao of data analysis: Follow the data, and abstract as little as possible!**
Occasionally, thoughtful abstraction and summary statistics will be needed and helpful, but much more rarely, and usually only in the end-stage analysis or automation, not in initial exploration (initial bushwhacking science).
For the one-off little summary, not really for large-scale data
analysis:
* https://docs.python.org/3/library/statistics.html
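For that kind of one-off summary, the standard-library `statistics` module is enough. The data below are a small made-up sample:

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # small made-up sample

summary = {
    "mean": statistics.mean(data),       # 5
    "median": statistics.median(data),   # 4.5 (average of the two middle values)
    "mode": statistics.mode(data),       # 4
    "pstdev": statistics.pstdev(data),   # 2.0 (population standard deviation)
    "stdev": statistics.stdev(data),     # about 2.14 (sample standard deviation, n - 1)
}
print(summary)
```

For large tables, the vectorized equivalents in NumPy or pandas are far faster.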
If we are doing science, how do we organize our data correctly the
first time, so as not to have to spend all that time wrangling it?
* Wide, narrow, columns, rows?
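To make "wide" versus "narrow" concrete, here is a sketch assuming pandas and a made-up two-subject table. A wide table has one row per subject and one column per time point. A narrow (long, or "tidy") table has one row per observation. Each layout converts to the other with `melt` and `pivot`:

```python
import pandas as pd

# Hypothetical "wide" table: one row per subject, one column per week.
wide = pd.DataFrame({
    "subject": ["s1", "s2"],
    "week1": [5.0, 7.0],
    "week2": [6.0, 9.0],
})

# Narrow/long form: one row per (subject, week, score) observation.
long = wide.melt(id_vars="subject", var_name="week", value_name="score")
print(long)

# And back again: pivot long -> wide (subjects become the index).
back = long.pivot(index="subject", columns="week", values="score")
print(back)
```

Long format is usually easier to filter, group, and plot, because adding another week adds rows rather than new columns.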
If you are doing data analysis, what language do you use?
* Python
* R
* Matlab
* Julia
Provide some history and context on these and the dataframe.
To learn more:
https://learnxinyminutes.com/docs/pythonstatcomp/
Q: How did the panda interpret the data wrong?
A: He was “Bamboozled”!
pandas has created pandamonium in the data science
world!
A great way to pander to the needs of your data… as you ponder the dataset’s deeper meaning.
The pandas dataframe allows you to arbitrarily retrieve complex subsets of your data!!!
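As a minimal sketch of that kind of retrieval, here are three common ways to select a subset: a boolean mask, `.loc` with a condition and column list, and `.query`. The table is made-up data for illustration only:

```python
import pandas as pd

# Hypothetical measurements, made up for illustration.
df = pd.DataFrame({
    "sample": ["a", "b", "c", "d", "e"],
    "group": ["control", "treated", "treated", "control", "treated"],
    "mass_g": [12.1, 15.4, 9.8, 11.0, 16.2],
    "day": [1, 1, 2, 2, 3],
})

# Boolean mask: treated samples heavier than 10 g (combine conditions with &, wrap each in parentheses).
heavy_treated = df[(df["group"] == "treated") & (df["mass_g"] > 10)]

# .loc: rows chosen by a condition, columns chosen by name.
late = df.loc[df["day"] >= 2, ["sample", "mass_g"]]

# .query: the same filter as heavy_treated, written as a string expression.
same = df.query("group == 'treated' and mass_g > 10")

print(heavy_treated)
print(late)
```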
Note: pandas was/is a rapidly evolving package, and they have ruthlessly broken backwards compatibility for new optimizations over the years, so these (or any) cheatsheets may not be current.
In the past, you may have pulled data you wanted to analyze into Excel, whereas pandas can do all that and more!
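For example, the spreadsheet staples of pivot tables and "average per category" formulas each take one line in pandas. This is a sketch using made-up sales data:

```python
import pandas as pd

# Hypothetical sales records, made up for illustration.
sales = pd.DataFrame({
    "region": ["north", "north", "south", "south", "south"],
    "quarter": ["Q1", "Q2", "Q1", "Q1", "Q2"],
    "units": [10, 20, 5, 15, 30],
})

# Like a spreadsheet pivot table: total units per region (rows) per quarter (columns).
table = sales.pivot_table(index="region", columns="quarter", values="units", aggfunc="sum")

# Like an AVERAGEIF per category: mean units per region.
per_region_mean = sales.groupby("region")["units"].mean()

print(table)
print(per_region_mean)
```

Existing spreadsheets can be loaded with `pd.read_csv`, or with `pd.read_excel`, which needs an extra engine such as openpyxl for .xlsx files.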
* https://www.tomasbeuzen.com/python-programming-for-data-science/README.html (good interactive ipynb book).
* https://jakevdp.github.io/PythonDataScienceHandbook/ (good book in Jupyter notebooks)
* https://pythonprogramming.net/data-analysis-tutorials/
* http://data-analysis-in-python.org/
* https://pandas.pydata.org/pandas-docs/stable/tutorials.html
* http://shop.oreilly.com/product/0636920023784.do
* Pandas cheat sheet: https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf
+++++++++++++ Cahoot-22c.1
https://mst.instructure.com/courses/58101/quizzes/57516
Data scientists love beautiful data pictures!
* http://www.scipy-lectures.org/intro/matplotlib/index.html
* https://matplotlib.org/tutorials/
* https://matplotlib.org/tutorials/introductory/usage.html (go over in
lecture)
* https://matplotlib.org/tutorials/introductory/pyplot.html
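The usage tutorial above contrasts the pyplot state machine with the explicit Figure/Axes interface. The sketch below uses the explicit style and assumes matplotlib and NumPy are installed. It selects the non-interactive "Agg" backend so the plot is written to a file without opening a window. The filename is arbitrary.

```python
import matplotlib
matplotlib.use("Agg")          # non-interactive backend: render to files, no window
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 200)

# Explicit (object-oriented) interface: make a Figure and an Axes, then call Axes methods.
fig, ax = plt.subplots(figsize=(6, 3))
ax.plot(x, np.sin(x), label="sin(x)")
ax.plot(x, np.cos(x), label="cos(x)", linestyle="--")
ax.set_xlabel("x (radians)")
ax.set_ylabel("value")
ax.set_title("Two curves on one Axes")
ax.legend()
fig.tight_layout()
fig.savefig("curves.png", dpi=100)   # arbitrary output filename
plt.close(fig)                       # free the figure's memory
```

In an interactive session (Spyder, Jupyter, or a plain terminal with a GUI backend), drop the `matplotlib.use("Agg")` line and call `plt.show()` to display the figure instead.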
See: ../../Bioinformatics/Content/02-PlatformTools.html
These are some resources to actually learn data analysis and science
in a focused, sequential way:
* https://jakevdp.github.io/PythonDataScienceHandbook/ (looks like a
quite good book, built from Jupyter notebooks)
* http://data-analysis-in-python.org/
+++++++++++ Cahoot-22d.1
https://mst.instructure.com/courses/58101/quizzes/57573
What is jupyter notebook?
An IDE
A lab notebook
A format for tutorials
A python interpreter
Sci-fy love story