Posts tagged "statistical significance"
Academic equivalent of playing Street Fighter II in two-player mode. The Bayesians keep winning because the frequentist player base is larger and includes people who never press any button other than “Start”.

(The data scientists play Minecraft instead.)

This entire episode highlights the dangers of focusing on the extreme values in any analysis. Generate 50 random numbers from any distribution, run a statistical test at a 90% confidence level, and you will likely be able to conclude that about five of those numbers are “statistically different” from the mean of the underlying distribution. The ones that are “statistically different” are the ones that will be highlighted in the media as doing far better or worse than most of the other states.
Understanding reported changes in median income » Good Stats Bad Stats
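
A quick simulation makes the point concrete. This is a minimal sketch of the generic phenomenon, not code from the linked post; the distribution, sample sizes and seed are all made up:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

n_units = 50        # e.g. one value per state
alpha = 0.10        # 90% confidence level
sample_size = 30    # hypothetical number of observations behind each unit

# Every "state" is drawn from the SAME distribution, so any unit flagged
# as different from the true mean is a false positive.
false_positives = 0
for _ in range(n_units):
    sample = rng.normal(loc=0.0, scale=1.0, size=sample_size)
    _, p = stats.ttest_1samp(sample, popmean=0.0)  # two-sided one-sample t-test
    if p < alpha:
        false_positives += 1

print(f"{false_positives} of {n_units} identical 'states' look statistically different")
# The expected count is alpha * n_units = 5, even though nothing differs.
```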

In one of his stories, science fiction writer and curmudgeon Thomas Disch wrote, “Creativeness is the ability to see relationships where none exist.” We want our scientists to be creative, but we have to watch out for a system that allows any hunch to be ratcheted up to a level of statistical significance that is then taken as scientific proof.

Even if something is published in the flagship journal of the leading association of research psychologists, there’s no reason to believe it. The system of scientific publication is set up to encourage publication of spurious findings.

Statistics and psychology: Multiple comparisons give spurious results. - Slate Magazine (Andrew Gelman, df = 1)
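
The arithmetic behind that headline is easy to reproduce. A minimal sketch of the generic multiple-comparisons calculation, not anything from the Slate piece itself:

```python
# Chance of at least one "significant" result when testing k true null
# hypotheses at level alpha: 1 - (1 - alpha)**k.
alpha = 0.05
for k in (1, 5, 20, 100):
    p_any_spurious = 1 - (1 - alpha) ** k
    print(f"{k:>3} comparisons: {p_any_spurious:.0%} chance of a spurious 'finding'")
# Roughly 5%, 23%, 64% and 99% respectively.
```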
A report to an inquiry in Victoria has estimated that at least one in every 20 Catholic priests in the state is a child sex abuser, with the real figure likely to be closer to one in 15.
The Null Device
Today’s announcement at CERN of the latest research on the Higgs boson was truly extraordinary. Not only was the scientific achievement remarkable, but the media’s reporting of 5-sigma as a measure of “certainty” was also truly remarkable. For instance, the science editor at the Swedish newspaper Dagens Nyheter reported that a sigma of 4.9 equals a certainty of 99.99994 %, which obviously isn’t true, simply because p( D | H0 ) is not the same as p( H0 | D ). In plain English, this means that a p-value represents the conditional probability of getting data at least as extreme as those observed, given that the null hypothesis is true. Nothing more, and it certainly doesn’t give the probability that the alternative hypothesis is true, i.e. the “certainty” that something has been found that is not a random fluctuation.
The Higgs boson: 5-sigma and the concept of p-values | R Psychologist
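
As a rough illustration of the distinction (my own sketch, not code from the linked post; the prior and the likelihood under the alternative are invented numbers), you can convert a sigma level into a one-sided tail probability and then see that p( H0 | D ) is a different quantity, one that depends on assumptions the p-value never touches:

```python
from scipy import stats

# One-sided tail probability under H0 for a 5-sigma excess.
sigma = 5.0
p_value = stats.norm.sf(sigma)      # p(data at least this extreme | H0)
print(f"p(D | H0) at {sigma} sigma: {p_value:.2e}")   # roughly 2.9e-07

# This is NOT p(H0 | D). A toy Bayes calculation with invented inputs:
prior_h0 = 0.5            # hypothetical prior probability of "no new particle"
p_data_given_h1 = 0.5     # hypothetical chance of such data if H1 were true
posterior_h0 = (p_value * prior_h0) / (
    p_value * prior_h0 + p_data_given_h1 * (1 - prior_h0)
)
print(f"p(H0 | D) under these assumptions: {posterior_h0:.2e}")
# Change the prior or the likelihood and p(H0 | D) changes; p(D | H0) does not.
```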

The most common complaint is that physicists and journalists explain the meaning of a p-value incorrectly. For example, if the p-value is 0.000001 then we will see statements like “there is a 99.9999% confidence that the signal is real.” We then feel compelled to correct the statement: if there is no effect, then the chance of something as or more extreme is 0.000001.

Fair enough. But does it really matter? The big picture is: the evidence for the effect is overwhelming. Does it really matter if the wording is a bit misleading? I think we reinforce our image as pedants if we complain about this.

The Higgs Boson and the p-value Police « Normal Deviate
“This is the basic logic of hypothesis testing—conclude that your claim is correct if the chance of alternative claims being correct is small” (via Why You Shouldn’t Conclude “No Effect” from Statistically Insignificant Slopes « Elections « Carlisle Rainey).

To me, this is more an issue of data scarcity, and one that would have been attenuated, for example, by focusing on confidence intervals.

It’s also an issue of low-resolution data. The author’s example is one where the model is trying to be of rank one on the Hibbs Efficacy Scale, where you beat the crap out of the covariance matrix by predicting a presidential election victory out of *one* independent variable. Try it again with state-level data to get more granularity, and the red-blue mikado plots will become much less ambiguous after a few residual-versus-predictor plots.

I do like the mikado plots, though. The eye seems to produce an intuitive credible interval out of them.
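
As a back-of-the-envelope illustration of why confidence intervals help here (simulated data of my own, not Rainey’s example or his model), a slope estimated from scarce data can easily be “statistically insignificant” while its interval is far too wide to justify a claim of “no effect”:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)

# A small election-style dataset: one predictor, a real but modest effect,
# and only a handful of observations (data scarcity).
n = 15
true_slope = 2.0
x = rng.normal(size=n)
y = 1.0 + true_slope * x + rng.normal(scale=5.0, size=n)

fit = sm.OLS(y, sm.add_constant(x)).fit()
low, high = fit.conf_int(alpha=0.05)[1]      # 95% CI for the slope

print(f"slope: {fit.params[1]:.2f}, p-value: {fit.pvalues[1]:.2f}")
print(f"95% CI: [{low:.2f}, {high:.2f}]")
# With n this small the p-value is often above 0.05, yet the interval runs
# from "no effect at all" to effects larger than the truth; a flat slope
# was never demonstrated, only imprecisely estimated.
```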

A blog companion to a bunch of courses on quantitative methods.

twitter.com/politbistro
