
Statistics Primer


Compiled by Dale Godfrey and Michael Bailey

Some basic statistics guidelines

Introduction

Surprisingly, although most medical research scientists use statistics to support their investigations, many do not know which tests are appropriate under which conditions. This is potentially a big problem: use of the wrong test can lead to the wrong conclusion (i.e. two groups are judged to be statistically different (p < 0.05) when the correct test would have said they are not). Although there is no need for us all to drop everything and do a statistics course, there are some basic guidelines that we should all follow if we are to use stats to support our data. A sure-fire way of getting tripped up in a research presentation is to present data with stats when you have used an inappropriate test.

Different tests commonly used in medical research

There are 6 different tests that are commonly used in our type of research, depending on the experimental situation.

  1. Student's t test for comparison of 2 groups of data that are normally distributed.
  2. Mann-Whitney U test for comparison of 2 groups of data for which the distribution is not known.
  3. ANOVA (analysis of variance) for comparison of 3 or more groups of data that are normally distributed.
  4. Kruskal-Wallis for comparison of 3 or more groups of data for which the distribution is not known.
  5. Chi-square test for categorical data.
  6. Fisher's exact test for categorical data with a lower number of samples.
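
The list above can be read as a small decision table. The following is only a rough sketch of that table in Python; the function name `choose_test` and its parameters are made up for illustration and are not part of any statistics package.

```python
def choose_test(data_type, n_groups=2, normal=False, small_counts=False):
    """Return the name of the test suggested by the list above.

    All argument names are hypothetical:
      data_type    -- "measurement" (e.g. blood sugar) or "categorical" (yes/no)
      n_groups     -- number of groups being compared
      normal       -- True only if the data are known to be normally distributed
      small_counts -- True if the categorical counts are low
    """
    if data_type == "categorical":
        return "Fisher's exact test" if small_counts else "Chi-square test"
    if data_type != "measurement":
        raise ValueError("data_type must be 'measurement' or 'categorical'")
    if n_groups < 2:
        raise ValueError("need at least 2 groups to compare")
    if n_groups == 2:
        return "Student's t test" if normal else "Mann-Whitney U test"
    return "ANOVA" if normal else "Kruskal-Wallis test"


# Two small groups of mice, distribution not known
print(choose_test("measurement", n_groups=2, normal=False))
# Three strains, data known to be normally distributed
print(choose_test("measurement", n_groups=3, normal=True))
```

A lookup like this is no substitute for understanding why each test applies, which is what the sections below try to explain.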

Comparing two groups

Note that the way the data are distributed (i.e. normal (bell-shaped curve), or not known) is the most important factor in determining the appropriate test. This is because some tests make assumptions about the population distribution based on the sample taken (the population is the whole body of data from which you take a sample). If the population from which the sample was taken is not normally distributed, the assumptions made by that test regarding the population are inaccurate, and may even be way off. Many populations are likely to be normally distributed (e.g. pulse rates of Honours students), but some are not (e.g. number of days after deadline for Honours thesis submission).

Most importantly, if you have a low sample number that does not allow you to determine the distribution, and you do not know the distribution of the population, it is not valid to use a test that assumes normality, such as the Student's t test. For example, consider blood sugar levels in 6 diabetes-susceptible mice treated with steroids in PBS, versus 6 diabetes-susceptible mice treated with PBS alone. Because it is not possible to determine the sample distribution with as few as 6 samples, a Mann-Whitney U test would be more appropriate, as this test makes no assumptions about the population distribution.
Q. Why not always use a Mann-Whitney test?
A. The t test is more powerful because it takes into consideration the population distribution from which you have taken a sample, and it also allows us to measure the magnitude of the difference between groups. Therefore, if you know your data are normally distributed (or close to it), the t test is the better test to use.
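
To make the 6 versus 6 mouse example concrete, here is a minimal sketch of a Mann-Whitney U test written with the Python standard library only. It ranks the pooled values and gets an exact two-sided p value by enumerating every way the ranks could have been split between the two groups. The blood sugar values are invented purely for illustration, and the function names are our own; in practice you would use a statistics package rather than this sketch.

```python
from itertools import combinations


def rank_values(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def mann_whitney_exact(group_a, group_b):
    """Return (U for group_a, exact two-sided p value)."""
    n_a, n_b = len(group_a), len(group_b)
    ranks = rank_values(list(group_a) + list(group_b))

    def u_from_rank_sum(rank_sum):
        return rank_sum - n_a * (n_a + 1) / 2

    u_observed = u_from_rank_sum(sum(ranks[:n_a]))
    u_mean = n_a * n_b / 2
    observed_deviation = abs(u_observed - u_mean)

    # Under the null hypothesis every split of the ranks is equally likely
    extreme = total = 0
    for indices in combinations(range(n_a + n_b), n_a):
        total += 1
        u = u_from_rank_sum(sum(ranks[i] for i in indices))
        if abs(u - u_mean) >= observed_deviation - 1e-9:
            extreme += 1
    return u_observed, extreme / total


# Hypothetical blood sugar readings, invented for this example
steroid_in_pbs = [5.1, 5.5, 5.8, 6.0, 6.3, 7.9]
pbs_alone = [7.0, 7.5, 8.2, 8.8, 9.1, 9.6]

u_stat, p_value = mann_whitney_exact(steroid_in_pbs, pbs_alone)
print("U =", u_stat, " p =", round(p_value, 4))
```

With 6 mice per group there are only 924 possible splits, so exact enumeration is cheap. For larger groups a package would normally switch to an approximation.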

Comparing three or more groups

We often find ourselves investigating more than two groups at a time, e.g. comparing T cell numbers in Strain C, B, and N mice. A very common mistake is to use a two-group test on each combination within these three groups (e.g. C vs B, C vs N, B vs N). The problem is that if we accept a p value of 0.05 as the cut-off for significance in a two-group test, then 5 times out of one hundred (or 1 in 20) we will wrongly conclude that two groups differ when they do not. If we were to use a two-group test twice on one set of data, we roughly double our chance of making the wrong conclusion for that data. If we were to use the test 20 times on one set of data, we are highly likely to make a wrong conclusion. ANOVA is a test that allows us to determine whether 3 or more groups are from the same or different populations. Importantly, like the t test, ANOVA assumes that the populations are normally distributed. If the distribution is not known, the Kruskal-Wallis test allows a comparison of 3 or more groups without making any assumptions about the distribution.
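
The arithmetic behind the repeated-testing problem can be sketched in a few lines of Python. This assumes, for simplicity, that the repeated tests are independent of each other, which is not strictly true for pairwise comparisons on the same data, so treat the numbers as a guide only. The function name is our own.

```python
from math import comb


def familywise_error_rate(n_tests, alpha=0.05):
    """Chance of at least one false 'significant' result in n_tests
    independent tests, when every null hypothesis is actually true."""
    return 1 - (1 - alpha) ** n_tests


# Three groups (C, B, N) give comb(3, 2) = 3 pairwise comparisons
n_pairs = comb(3, 2)
print("pairwise comparisons for 3 groups:", n_pairs)

for n_tests in (1, 2, n_pairs, 20):
    rate = familywise_error_rate(n_tests)
    print(n_tests, "tests ->", round(rate, 3))
```

Under this assumption the chance of at least one wrong conclusion is just under 10% for two tests, about 14% for the three pairwise comparisons, and about 64% for twenty tests.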

Yes or No (categorical) data

An example is the proportion of mice that develop diabetes following steroid in PBS treatment versus the proportion of mice that develop diabetes following PBS treatment alone. The results might be 10 diabetic, 90 non-diabetic; and 60 diabetic, 40 non-diabetic, respectively. This type of problem is examined using a Chi-square test. Note that if the number of mice tested was lower and the results were 1 diabetic, 9 non-diabetic; and 6 diabetic, 4 non-diabetic mice, respectively, the Chi-square test is not designed to handle numbers this low. The test that should be used instead is Fisher's exact test. As a general rule, if the number in two or more categories is below 5 (e.g. 1 diabetic mouse in the test group, 4 non-diabetic mice in the control group), or the number in any category is equal to zero, then Fisher's exact test is more appropriate.
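
Both tests can be sketched for a 2 x 2 table using only the Python standard library. The counts below are the ones from the example above; the function names are our own, the Chi-square version applies no continuity correction, and the Fisher version uses one common two-sided definition (summing every table that is no more probable than the observed one). A statistics package may make different choices, so this is an illustration rather than a reference implementation.

```python
from math import comb, erfc, sqrt


def chi_square_2x2(table):
    """Pearson Chi-square statistic and p value (1 degree of freedom,
    no continuity correction) for a 2 x 2 table of counts."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    # With 1 degree of freedom the upper tail equals erfc(sqrt(x / 2))
    return statistic, erfc(sqrt(statistic / 2))


def fisher_exact_2x2(table):
    """Two-sided Fisher's exact p value for a 2 x 2 table of counts."""
    (a, b), (c, d) = table
    row1, col1, col2 = a + b, a + c, b + d

    def weight(x):
        # Number of ways to get x first-column items in the first row
        return comb(col1, x) * comb(col2, row1 - x)

    lowest = max(0, row1 - col2)
    highest = min(row1, col1)
    observed = weight(a)
    extreme = sum(weight(x) for x in range(lowest, highest + 1)
                  if weight(x) <= observed)
    return extreme / comb(col1 + col2, row1)


# Rows: steroid in PBS, PBS alone. Columns: diabetic, non-diabetic.
large_table = [[10, 90], [60, 40]]
small_table = [[1, 9], [6, 4]]

chi2_stat, chi2_p = chi_square_2x2(large_table)
fisher_p = fisher_exact_2x2(small_table)
print("Chi-square:", round(chi2_stat, 2), "p =", chi2_p)
print("Fisher's exact p =", round(fisher_p, 4))
```

Comparing the integer weights rather than floating-point probabilities avoids rounding problems when two tables are exactly as likely as each other.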

The Null hypothesis

When we carry out a statistics test, we are testing the hypothesis that the groups are the same (the null hypothesis). Typically, a p value of 0.05 or lower is accepted as the cut-off for rejecting the null hypothesis and accepting the alternate hypothesis (that the groups are different). It is not uncommon to hear or read "the groups were different (higher or lower), but not statistically different". If the p value is higher than 0.05, the null hypothesis cannot be rejected and the two groups should be treated as the same. For example, it would not be valid to say that steroids cause a reduction in blood sugar levels in diabetes-susceptible mice (e.g. 11 ± 3 versus 9 ± 2) but that the results were not statistically significant. The only conclusion that could be made with this data is that steroids have not been shown to cause a reduction in blood sugar. Of course, further testing with larger numbers of samples might change this conclusion, but in the absence of that extra data, the conclusion that the groups are different cannot be made.

Further reading

This page is simply intended to be a primer, and is not in itself sufficient to give you a thorough understanding of the different tests and how they work. If you use a stats test in your results, or are trying to interpret another researcher's statistical results, I would recommend further reading. Some stats books are quite well written for non-mathematicians (e.g. "Statistics without tears" by Derek Rowntree), and we are also fortunate enough to have Michael Bailey as a statistics consultant. Michael is happy to help with any questions or problems related to the use of stats.

E-mail Michael Bailey
E-mail Dale Godfrey

 
