- The article's author also recommends these online materials: https://www.stat.berkeley.edu/~stark/SticiGui/Text/toc.htm.
Wholeheartedly agree. It took me a year to slowly go through all chapters and this experience really influenced my way of thinking.
I wish other topics had books of that level of quality.
> The book by Freedman, Pisani, and Purves is the one I would have liked to teach from, and it was the book I drew upon the most in prepping my own lectures, as an antidote to the overwrought and confused style of my assigned text. The authors maintain the underlying attitude that statistics is a useful tool for understanding certain questions about the world, but in this way it augments human judgement, rather than supplanting it. To quote from the preface:
> > Why does the book include so many exercises that cannot be solved by plugging into a formula? The reason is that few real-life statistical problems can be solved that way. Blindly plugging into statistical formulas has caused a lot of confusion. So this book takes a different approach: thinking.
Math fans tend to discount and dismiss applied statistics as not being math, in a way they don't for physics, for reasons I don't fully grasp.
I think it's because statistics gets a bad reputation from the legions of terrible social scientists in the wild, who can easily publish false but socially interesting results that get applied to our real lives. Mathematically fraudulent physics, on the other hand, usually dies immediately in the engineering phase, leaving just a few rambling cranks whom nearly everyone ignores.
Also (and relatedly), perhaps, just as dry mathematical statistics ignores real-world empirical experimentation, "wet" applied statistics goes too far in the other direction and ignores the math completely, because too few empirical scientists are able to understand the math when they encounter it.
It's because we're secretly afraid that the physicists are smarter than us.
Less facetiously, physicists keep discovering things that lead to new mathematics we would never have dreamed of ourselves, so we have a healthy respect for how insightful they can be.
>The book is not without its weak moments, although they are few. One in particular which I recall is the treatment of A/B testing. Essential to any hypothesis testing is the matter of how to reduce the sampling mechanism to a simple probabilistic model, so that a quantitative test may be derived. The book emphasizes one such model: simple random sampling from a population, which then involves the standard probabilistic ideas of binomial and multinomial distributions, along with the normal approximation to these. Thus, one obtains the z-test.
>In the context of randomized controlled experiments, where a set of subjects is randomly assigned to either a control or treatment group, the simple random sampling model is inapplicable. Nonetheless, when asking whether the treatment has an effect there is a suitable (two-sample) z-test. The mathematical ideas behind it are necessarily different from those of the previously mentioned z-test, because the sampling mechanism here is different, but the end result looks the same. Why this works out as it does is explained rather opaquely in the book, since the authors never developed the probabilistic tools necessary to make sense of it (here one would find at least a mention of hypergeometric distributions). Given the emphasis placed in the beginning of the book on the importance of randomized, controlled experiments in statistics, it feels like this topic is getting short shrift.
Can anyone recommend good resources to fill this alleged gap?
The standard error of the difference assumes that a) samples are drawn independently, i.e., with replacement; and b) the two groups are independent of each other. By "samples being drawn" I mean, in this context, a subject being assigned to a group in an RCT.
If you derive the standard error of the difference, there are two covariance terms that are zero when these assumptions are true. When they're violated, like in RCTs, the covariances are non-zero and should in theory be accounted for. However, Freedman implies that it doesn't actually matter because they effectively cancel each other out, as one inflates the standard error and the other deflates it.
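A quick way to see this numerically (just a sketch, not from the book, using made-up subject responses and the sharp null of no treatment effect): randomly split a fixed pool of subjects into treatment and control many times, and compare the actual spread of the difference in means to the naive two-sample standard error that ignores the covariance terms.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up responses for a fixed pool of 200 subjects, assuming the sharp
# null: the treatment has no effect, so each subject's response is the
# same whichever group they land in.
pool = rng.normal(loc=50, scale=10, size=200)
n_treat = 100

# Repeatedly randomize the pool into treatment/control and record the
# difference in group means.
diffs = []
for _ in range(20_000):
    shuffled = rng.permutation(pool)
    treat, control = shuffled[:n_treat], shuffled[n_treat:]
    diffs.append(treat.mean() - control.mean())

# Naive two-sample SE: treats the groups as independent draws with
# replacement and ignores the covariance terms.
s2 = pool.var(ddof=1)
naive_se = np.sqrt(s2 / n_treat + s2 / (len(pool) - n_treat))

print("SD of the difference under randomization:", np.std(diffs))
print("Naive two-sample standard error:         ", naive_se)
# The two numbers come out essentially equal, which is Freedman's point.
```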
E. L. Lehmann, 'Nonparametrics: Statistical Methods Based on Ranks', ISBN 0-8162-4994-6, Holden-Day, San Francisco, 1975.
Sidney Siegel, 'Nonparametric Statistics for the Behavioral Sciences', McGraw-Hill, New York, 1956.
Bradley Efron, 'The Jackknife, the Bootstrap, and Other Resampling Plans', ISBN 0-89871-179-7, SIAM, Philadelphia, 1982.
Hypothesis testing?? Somewhere maybe I still have my little paper I wrote on using the Hahn decomposition and the Radon-Nikodym theorem to give a relatively general proof of the Neyman-Pearson theorem about the most powerful hypothesis test.
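For anyone who hasn't seen the result being referenced, here is an informal statement of the Neyman-Pearson lemma (my summary, not the commenter's proof): for a simple null against a simple alternative, the likelihood-ratio test is the most powerful test at its level.

```latex
% Neyman-Pearson lemma, informal statement.
% For simple hypotheses H_0: X ~ p_0 versus H_1: X ~ p_1, the test that
% rejects H_0 when the likelihood ratio exceeds a threshold k, with k
% chosen so the test has level alpha, is the most powerful level-alpha test.
\[
  \text{reject } H_0 \iff \frac{p_1(x)}{p_0(x)} > k,
  \qquad \Pr\nolimits_{H_0}(\text{reject } H_0) = \alpha .
\]
```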
'Causal Inference: What If' is a nice intro and freely available: https://www.hsph.harvard.edu/miguel-hernan/causal-inference-...
I guess it wouldn't be a problem if the techniques being taught in STATS101 were actually usable in the real world. A bit like driving a car: you don't need to know how internal combustion engines work, you just need to press the pedals (and not endanger others on the road). The problem is that z-tests, t-tests, and ANOVA have very limited use cases. Most real-world data analysis requires more advanced models, so the STATS education is doubly problematic: it neither teaches you useful skills nor general principles.
I spent a lot of time researching and thinking about the STATS curriculum and choosing which topics are actually worth covering. I wrote a blog post about this[1]. In the end I settled on a computation-heavy approach, which allows me to do lots of hands-on simulations and demonstrations of concepts, something that will be helpful for tech-literate readers, but I think also for non-tech people, since it will be easier to learn Python+STATS than to try to learn STATS alone. Here is a detailed argument about how Python is useful for learning statistics[2].
If you're interested in seeing the book outline, you can check this Google Doc[3]. Comments welcome. I'm currently writing the last chapter, so hopefully I will be done with it by January. I have a mailing list[4] for people who want to be notified when the book is ready.
[1] https://minireference.com/blog/fixing-the-statistics-curricu...
[2] https://minireference.com/blog/python-for-stats/
[3] https://docs.google.com/document/d/1fwep23-95U-w1QMPU31nOvUn...
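For a sense of what a simulation-first demonstration of a concept might look like in Python (my own minimal sketch, not taken from the book outline), here is the central limit theorem checked by brute force:

```python
import numpy as np

rng = np.random.default_rng(42)

# Draw many samples of size 30 from a clearly non-normal distribution
# (an exponential) and look at the distribution of the sample means.
sample_size = 30
n_repetitions = 10_000
means = rng.exponential(scale=1.0, size=(n_repetitions, sample_size)).mean(axis=1)

# The CLT predicts the means cluster around 1 with spread ~ 1/sqrt(30) ~ 0.18,
# and a histogram of `means` looks close to normal despite the skewed source.
print("mean of sample means:", means.mean())
print("sd of sample means:  ", means.std())
```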
This paper probably seems obvious to a lot of people, but I found, when giving talks and reading and reviewing papers, that people typically didn't know basic things like why you might leave some data out as a test set, why some models work better than others, or when to use logistic regression versus linear regression.
The general advice about measuring/comparing models also seems useful.
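To make a couple of those basics concrete, here is a minimal sketch (my own toy example on synthetic data, not from the paper) of holding out a test set and reaching for logistic regression because the outcome is a class label:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic data, purely for illustration: one feature, binary outcome.
X = rng.normal(size=(500, 1))
y = (X[:, 0] + rng.normal(scale=0.8, size=500) > 0).astype(int)

# Hold out a test set: judging a model on the data it was fit to
# overstates how well it will generalize to new data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Logistic regression because the outcome is a class label; linear
# regression would be the tool for a continuous outcome.
model = LogisticRegression().fit(X_train, y_train)
print("held-out accuracy:", accuracy_score(y_test, model.predict(X_test)))
```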
Table of contents and section 1:
https://homepages.dcc.ufmg.br/~assuncao/EstatCC/Slides/Extra...
"Why does the book include so many exercises that cannot be solved by plugging into a formula? The reason is that few real-life statistical problems can be solved that way. Blindly plugging into statistical formulas has caused a lot of confusion. So this book takes a different approach: thinking."
This applies to both math and stats. I appreciate the value of grinding pure, fundamental technique, but as I'm reviewing I find myself missing more real-life applications. Theory feels like a plan until real life throws you the first punch.
I'll be buying this book, thanks for the recommendation!
I hate to be contrarian, but even though I have a degree in statistics, I feel like much of statistics/probability actually violates common sense. In fact, it's probably the most unintuitive field that I'm familiar with.
Many of the readers will probably be familiar with the Monty Hall problem or the Birthday problem, but imo, the entire field of statistics/probability is about equally unintuitive/violating of common sense.
Let’s say that you get a positive diagnosis for the disease, and you ask someone the question:
What is the probability you actually have the disease?
Most people will say 95% or 99%, but your actual probability of having the disease in this example is less than 2%.
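The numbers behind "this example" aren't spelled out above, but the classic version runs roughly like this (my assumed figures: a prevalence of 1 in 1,000 and a test that is 95% accurate). Bayes' rule then gives about 1.9%:

```python
# Assumed figures, since the comment doesn't spell them out: the disease
# affects 1 in 1,000 people; the test flags 95% of the sick and wrongly
# flags 5% of the healthy.
prevalence = 0.001
sensitivity = 0.95          # P(positive | sick)
false_positive_rate = 0.05  # P(positive | healthy)

p_positive = sensitivity * prevalence + false_positive_rate * (1 - prevalence)
p_sick_given_positive = sensitivity * prevalence / p_positive

print(f"P(sick | positive test) = {p_sick_given_positive:.3f}")  # about 0.019
```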
I like this style of introducing a technical topic to a broad audience. It builds incrementally and practically. The prose is clear enough for a layman to gain a conceptual appreciation of the methods even if they skip the exercises. And while the exercises weren’t too demanding, there were many of them, always framed in real world context. For the portion of the audience who will study further, I like to think that the book’s approach towards problem solving and challenging the intuition could be helpful throughout an entire career of statistical thinking.
I have a mental image of a Tufte-like book that aims to profoundly sharpen the students' BS-detector. That is, teach the student by deliberately showing broken things, and then guide the reader: can they spot how things are broken? What might they try to fix first? How might these fixes themselves have flaws? How might people try to hide issues? And so on.
It's my assertion that non-technical people have, or can be trained to have, excellent BS-detection skills even if they don't speak the mathematical language.
The worst outcome, the one we have today, is that those students are dazzled and confused by the mathematical discourse, but believe they have to obey, so they end up believing in a formulaic Statistics God that is fed p-values and other detritus and spits out Insight in return, when in fact it does nothing of the sort.