Statistical Analysis Finds Errors in Journal Papers

A recent paper in BMC Medical Research Methodology (2004, 4:13) describes a statistical analysis of a year's worth of Nature and BMJ. The analysis focused in particular on the congruence of the last digits of statistical results. The authors' results led them to believe that there were errors in approximately 11% of the articles analyzed. Their conclusion is that the errors (emphasis added) creep in during rounding. They suggest making raw data freely available.
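
To give a feel for what a last-digit check can look like, here is a minimal Python sketch. It is not the paper's code: the function names are made up for illustration, the inputs are assumed to be reported values kept as strings (so trailing zeros survive), and it uses a plain chi-square goodness-of-fit test against a uniform distribution, compared with the standard 5% critical value for 9 degrees of freedom (16.919).

```python
from collections import Counter

# Critical value of the chi-square distribution with 9 degrees of freedom
# at the 5% significance level (ten digit categories minus one).
CHI2_CRIT_DF9_05 = 16.919


def last_digits(values):
    """Return the final digit of each reported value (values are strings)."""
    return [int(v.strip()[-1]) for v in values]


def chi_square_uniform(digits):
    """Chi-square statistic for digits 0-9 against a uniform expectation."""
    counts = Counter(digits)
    expected = len(digits) / 10
    return sum((counts.get(d, 0) - expected) ** 2 / expected for d in range(10))


def looks_non_uniform(values):
    """True if the last digits deviate from uniform at the 5% level."""
    return chi_square_uniform(last_digits(values)) > CHI2_CRIT_DF9_05
```

As a usage example with invented data: 80 values whose last digits cover 0 to 8 except 4, ten times each, with no 4s or 9s at all, give a statistic of 20.0, which is above the threshold, so `looks_non_uniform` returns True. A set where every digit appears equally often gives 0.0 and returns False.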

3 thoughts on “Statistical Analysis Finds Errors in Journal Papers”

  1. The whole reason I posted this was to find out what others think. I don’t know what to think about this paper. My statistics education is rudimentary, so I can’t really comment. But my gut feeling is that it’s a load of bunk. Much of it seems to be based on the fact that they found fewer 4’s and 9’s than expected in the least significant digits. Seems specious to me. What do the experts have to say?

  2. Hey, I was thinking of posting this but didn’t get to it – thanks mtigges! There’s a New Scientist article on this too.

    Note that the errors claimed are worse than you describe – they claim to have found at least one error in 38% of the Nature papers – the 11% was the number they found “incongruent” (not sure what that means though).

    I’m no statistical expert either, but it looks like they have a point: there are routine procedures for determining the number of digits of precision in a result, and then rounding to that precision. The least significant digit, taken across a large set of real-valued results rounded to a given precision, really should be evenly distributed among all the digits, 0 to 9.

  3. There are a few mathematical definitions of congruence, but I took incongruence to mean two things not being equal. In this case it seems to be an incongruence in the distribution of the least significant digits. They analyzed it using two different tests, chi-square and Kolmogorov-Smirnov, if I remember correctly. The incongruence was that the distributions differed from the expected uniform distribution. That’s why I chose the 11% number … it seemed that’s what they were looking for, and I didn’t understand the discrepancy. And lastly, it’s why I wrote approximately 11%.

    Thanks for pointing out the NewScientist article. I hadn’t seen that.
