This is the fourth in a series of 10 articles introducing non-experts to finding medical articles and assessing their value
In assessing the choice of statistical tests in a paper, first consider whether groups were analysed for their comparability at baseline
Does the test chosen reflect the type of data analysed (parametric or non-parametric, paired or unpaired)?
Has a two tailed test been performed whenever the effect of an intervention could conceivably be a negative one?
Have the data been analysed according to the original study protocol?
If obscure tests have been used, do the authors justify their choice and provide a reference?
As medicine leans increasingly on mathematics no clinician can afford to leave the statistical aspects of a paper to the "experts." If you are numerate, try the "Basic Statistics for Clinicians" series in the Canadian Medical Association Journal (1-4) or a more mainstream statistical textbook. (5) If, on the other hand, you find statistics impossibly difficult, this article and the next in this series give a checklist of preliminary questions to help you appraise the statistical validity of a paper.
All statistical tests are either parametric (that is, they assume that the data were sampled from a particular form of distribution, such as a normal distribution) or non-parametric (they make no such assumption). In general, parametric tests are more powerful than non-parametric ones and so should be used if possible.
Non-parametric tests look at the rank order of the values (which one is the smallest, which one comes next, and so on) and ignore the absolute differences between them. As you might imagine, statistical significance is more difficult to show with non-parametric tests, and this tempts researchers to use statistics such as the r value inappropriately. Not only is the r value (parametric) easier to calculate than its non-parametric equivalent but it is also much more likely to give (apparently) significant results. Unfortunately, it will give a spurious estimate of the significance of the result, unless the data are appropriate to the test being used. More examples of parametric tests and their non-parametric equivalents are given in table 1.
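To make the difference between absolute values and rank order concrete, here is a minimal sketch in Python. It assumes that the "r value" above is Pearson's product moment correlation coefficient and that its rank-based counterpart is Spearman's coefficient; the data are invented for illustration and do not come from any study.

```python
from statistics import mean

def pearson_r(xs, ys):
    """Pearson's r (parametric): uses the actual values."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def ranks(values):
    """Rank from 1 (smallest); tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over any run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman_rho(xs, ys):
    """Spearman's rho (non-parametric): Pearson's r applied to the ranks."""
    return pearson_r(ranks(xs), ranks(ys))

# Hypothetical data: seven loosely related pairs plus one extreme pair
x = [1, 2, 3, 4, 5, 6, 7, 50]
y = [4, 1, 6, 2, 7, 3, 5, 60]

print("Pearson r:   ", round(pearson_r(x, y), 3))
print("Spearman rho:", round(spearman_rho(x, y), 3))
```

With these made-up numbers the single extreme pair drives Pearson's r close to 1, whereas the rank-based coefficient, which sees only that the pair is the largest in both lists, stays far more modest.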

Table 1
Another consideration is the shape of the distribution from which the data were sampled. When I was at school, my class plotted the amount of pocket money received against the number of children receiving that amount. The results formed a histogram the same shape as figure 1, a "normal" distribution. (The term "normal" refers to the shape of the graph and is used because many biological phenomena show this pattern of distribution.) Some biological variables such as body weight show a "skew normal" distribution, as shown in figure 2. (Figure 2 shows a negative skew, whereas body weight would be positively skewed. The average adult male body weight is 70 kg, and people exist who weigh 140 kg, but nobody weighs less than nothing, so the graph cannot possibly be symmetrical.)

Figures 1 and 2
Non-normal (skewed) data can sometimes be given a normal shape by a mathematical transformation (such as taking the variable's logarithm, square root, or reciprocal). Some data, however, cannot be transformed into a smooth pattern. For a very readable discussion of the normal distribution see chapter 7 of Martin Bland's Introduction to Medical Statistics. (5)
Deciding whether data are normally distributed is not an academic exercise, since it will determine what type of statistical tests to use. For example, linear regression will give misleading results unless the points on the scatter graph form a particular distribution about the regression line; that is, the residuals (the vertical distance from each point to the line) should themselves be normally distributed. Transforming data to achieve a normal distribution (if this is indeed achievable) is not cheating: it simply ensures that data values are given appropriate emphasis in assessing the overall effect. Using tests based on the normal distribution to analyse non-normally distributed data, however, is definitely cheating.
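As a rough illustration of what a transformation does, the sketch below simulates a positively skewed variable and compares its skewness before and after taking logarithms. The simulated values, the seed, and the use of a simple moment-based skewness coefficient are all assumptions made for the example; a real analysis would also inspect a histogram or a normal plot.

```python
import math
import random
from statistics import mean, pstdev

def skewness(values):
    """Moment-based skewness: about 0 if symmetrical, positive for a long right tail."""
    m = mean(values)
    s = pstdev(values)
    return mean([((v - m) / s) ** 3 for v in values])

# Hypothetical positively skewed measurements (simulated, not real data)
random.seed(1)
raw = [random.lognormvariate(0.0, 0.8) for _ in range(2000)]

# The same measurements after a logarithmic transformation
logged = [math.log(v) for v in raw]

print("skewness of raw values:", round(skewness(raw), 2))
print("skewness of log values:", round(skewness(logged), 2))
```

The raw values have a long right tail and clearly positive skewness; after the log transformation the coefficient should fall close to zero, which is the sense in which the transformed data are of "normal shape."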
Raking over your data for "interesting results" (retrospective subgroup analysis) can lead to false conclusions. (8) In an early study on the use of aspirin in preventing stroke, the results showed a significant effect in both sexes combined, and a retrospective subgroup analysis seemed to show that the effect was confined to men. (9) This conclusion led to aspirin being withheld from women for many years, until the results of other studies (10) showed that this subgroup effect was spurious.
This and other examples are included in Oxman and Guyatt's "A consumer's guide to subgroup analysis," which reproduces a useful checklist for deciding whether apparent subgroup differences are real. (11)
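One reason why raking over the data misleads is simple arithmetic: the more comparisons you make, the more likely it is that one of them will look "significant" by chance. The sketch below is an illustration only; it assumes that each subgroup comparison is independent and uses the conventional 5% threshold, neither of which is stated in the studies cited above.

```python
def chance_of_false_positive(n_tests, alpha=0.05):
    """Probability that at least one of n independent tests is 'significant'
    at level alpha when there is no true effect anywhere."""
    return 1 - (1 - alpha) ** n_tests

# Hypothetical numbers of subgroup comparisons
for n_tests in (1, 5, 10, 20):
    print(n_tests, "comparisons:", round(chance_of_false_positive(n_tests), 2))
```

Under these assumptions the chance of at least one spurious finding rises from 5% for a single comparison to about 40% for ten and about 64% for twenty.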
In this example, it is using the same person on both occasions which makes the pairings, but there are other possibilities (for example, any two measurements of bed occupancy made of the same hospital ward). In these situations, it is likely that the two sets of values will be significantly correlated (for example, my blood pressure next week is likely to be closer to my own blood pressure last week than to the blood pressure of a randomly selected adult last week). In other words, we would expect two randomly selected paired values to be closer to each other than two randomly selected unpaired values. Unless we allow for this, by carrying out the appropriate paired sample tests, we can end up with a biased estimate of the significance of our results.
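The effect of ignoring the pairing can be seen in a small sketch. The blood pressure readings are invented, and the two functions compute only the t statistics (Welch's form for the unpaired case is my choice, not something specified above); turning them into P values would need the t distribution, which is left out here.

```python
from math import sqrt
from statistics import mean, stdev

def unpaired_t(a, b):
    """Welch's t statistic: treats the two sets as independent samples."""
    standard_error = sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    return (mean(a) - mean(b)) / standard_error

def paired_t(a, b):
    """Paired t statistic: works on the within-person differences."""
    diffs = [x - y for x, y in zip(a, b)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))

# Hypothetical systolic blood pressure (mm Hg) in the same six people
before = [120, 135, 150, 142, 128, 160]
after = [116, 132, 145, 139, 123, 156]

print("unpaired t:", round(unpaired_t(before, after), 2))
print("paired t:  ", round(paired_t(before, after), 2))
```

Because each person's two readings are close to each other while people differ widely from one another, the same 4 mm Hg average fall gives a small unpaired statistic and a large paired one. Choosing the wrong test would therefore give a very different impression of the significance of the result.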
But on what grounds may we assume that a low sodium diet could only conceivably lower blood pressure and could never raise it? Even if there are valid physiological reasons in this particular example, it is certainly not good science always to assume that you know the direction of the effect which your intervention will have. A new drug intended to relieve nausea might actually exacerbate it, or an educational leaflet intended to reduce anxiety might increase it. Hence, your statistical analysis should, in general, test the hypothesis that either high or low values in your dataset have arisen by chance. In the language of the statisticians, this means you need a two tailed test, unless you have very convincing evidence that the difference can only be in one direction.
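The practical consequence of the choice can be sketched with a test statistic that follows the standard normal distribution. The value z = 1.8 is a hypothetical result chosen for illustration, and the normal approximation is an assumption; the same logic applies to other test statistics.

```python
from statistics import NormalDist

def p_values(z):
    """Return (one tailed, two tailed) P values for a standard normal z."""
    std_normal = NormalDist()
    # One tailed: chance of a value this far above zero, in that direction only
    one_tailed = 1 - std_normal.cdf(z)
    # Two tailed: chance of a value this far from zero in either direction
    two_tailed = 2 * (1 - std_normal.cdf(abs(z)))
    return one_tailed, two_tailed

# Hypothetical test statistic
one, two = p_values(1.8)
print("one tailed P:", round(one, 3))
print("two tailed P:", round(two, 3))
```

For a positive z the two tailed P value is exactly double the one tailed value, so a result that slips under the 5% threshold with a one tailed test (as this one does) may fail to do so once both directions are allowed for.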
Statistically correcting for outliers (for example, to modify their effect on the overall result) requires sophisticated analysis and is covered elsewhere. (6)
The articles in this series are excerpts from How to read a paper: the basics of evidence based medicine. The book includes chapters on searching the literature and implementing evidence based findings. It can be ordered from the BMJ Bookshop: tel 0171 383 6185/6245; fax 0171 383 6662. Price £13.95 UK members, £14.95 non-members.
I am grateful to Mr John Dobby for educating me on statistics and for repeatedly checking and amending this article. Responsibility for any errors is mine alone.
2. Guyatt G, Jaeschke R, Heddle N, Cook D, Shannon H, Walter S. Basic statistics for clinicians. 2. Interpreting study results: confidence intervals. Can Med Assoc J 1995;152:169-73.
3. Jaeschke R, Guyatt G, Shannon H, Walter S, Cook D, Heddle N. Basic statistics for clinicians. 3. Assessing the effects of treatment: measures of association. Can Med Assoc J 1995;152:351-7.
4. Guyatt G, Walter S, Shannon H, Cook D, Jaeschke R, Heddle N. Basic statistics for clinicians. 4. Correlation and regression. Can Med Assoc J 1995;152:497-504.
5. Bland M. An introduction to medical statistics. Oxford: Oxford University Press, 1987.
6. Altman D. Practical statistics for medical research. London: Chapman and Hall, 1995.
7. Hughes MD, Pocock SJ. Stopping rules and estimation problems in clinical trials. Statistics in Medicine 1987;7:1231-42.
8. Stewart LA, Parmar MKB. Bias in the analysis and reporting of randomized controlled trials. Int J Health Technology Assessment 1996;12:264-75.
9. Canadian Cooperative Stroke Group. A randomised trial of aspirin and sulfinpyrazone in threatened stroke. N Engl J Med 1978;299:53-9.
10. Antiplatelet Trialists Collaboration. Secondary prevention of vascular disease by prolonged antiplatelet treatment. BMJ 1988;296:320-1.
11. Oxman AD, Guyatt GH. A consumer's guide to subgroup analysis. Ann Intern Med 1992;116:79-84.
