Sungkyu Jung
Topics in Applied Statistics
“Basics of Data Science”
Meant to be “STAT 1261: Principles of Data Science”
Bookmark your course webpage: http://www.stat.pitt.edu/sungkyu/course/pds/
Engineering and Computer Science played a key role
(https://dataorigami.net/blogs/napkin-folding/17543555-datas-use-in-the-21st-century)
Does fertilizer increase crop yields? Answer: Collect and analyze agricultural experimental data
Does streptomycin cure tuberculosis? Answer: Collect and analyze randomized trial data
Does smoking cause lung cancer? Answer: Collect and analyze observational study data
That’s what statisticians are already doing.
“I keep saying that the sexy job in the next 10 years will be statisticians,” said Hal Varian, chief economist at Google. “And I’m not kidding.”
“The ability to take data - to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it - that’s going to be a hugely important skill in the next decades, not only at the professional level but even at the educational level for elementary school kids, for high school kids, for college kids. Because now we really do have essentially free and ubiquitous data.”
Schematic of the modern statistical analysis process
We focus on the bubbles to the left and right
I chose a research project (in sociology)
There are also industrial and business projects; see, e.g., https://www.datascienceweekly.org/articles/aspiring-data-scientist-here-are-some-at-work-project-ideas or https://www.analyticsvidhya.com/blog/2014/11/data-science-projects-learn/
Here is a recent example of data science.
Rojas and his colleagues pose the question:
Is social media a valid indicator of political behavior?
The data being analyzed were scraped from the Internet
The research question was addressed by combining domain knowledge (sociology) with data science
A large amount of data
Need experts in sociology and in data science
Read the draft of the paper
Pair up
Critically review the paper
(https://ssrn.com/abstract=2235423)
(http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0079449)
How would you reproduce this study?
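One possible starting point for the discussion is sketched below in Python. It is not the authors' method. It assumes that a reproduction would relate a candidate's share of social media mentions to that candidate's vote share, district by district. All numbers are made-up toy values, and the names `tweet_share` and `fit_line` are hypothetical helpers introduced only for illustration.

```python
import numpy as np

# Hypothetical toy data for five districts (invented for illustration,
# NOT taken from the paper): counts of scraped posts mentioning each candidate.
mentions_a = np.array([120, 300, 80, 450, 210], dtype=float)
mentions_b = np.array([200, 150, 160, 150, 210], dtype=float)

# Hypothetical vote share of candidate A in the same districts.
vote_share_a = np.array([0.41, 0.62, 0.35, 0.70, 0.50])


def tweet_share(a, b):
    """Fraction of mentions naming candidate A among mentions of A or B."""
    return a / (a + b)


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    slope, intercept = np.polyfit(x, y, deg=1)
    return intercept, slope


share_a = tweet_share(mentions_a, mentions_b)
intercept, slope = fit_line(share_a, vote_share_a)
r = np.corrcoef(share_a, vote_share_a)[0, 1]

print(f"slope={slope:.3f}, intercept={intercept:.3f}, correlation={r:.3f}")
```

A real reproduction would also need to decide how the posts are collected, how a mention is matched to a candidate, and which control variables (domain knowledge from sociology) belong in the model.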