Statistics Primer
Some basic statistics guidelines
Introduction
Surprisingly, although most medical research scientists use statistics
to support their investigations, many do not know which tests are appropriate
under which conditions. This is potentially a big problem: use of the
wrong test can lead to the wrong conclusion (e.g. two groups are judged
to be statistically different (p < 0.05) when the correct test would
have said they are not). Although there is no need for us all to drop
everything and do a statistics course, there are some basic guidelines
that we should all follow if we are to use stats to support our data.
A sure-fire way of getting tripped up in a research presentation is to
present data with stats when you have used an inappropriate test.
Different tests commonly used in medical research
There are 6 different tests that are commonly used in
our type of research, depending on the experimental situation.
- Student's t test: for comparison of 2 groups of data that are normally
distributed.
- Mann-Whitney U test: for comparison of 2 groups of data for which
the distribution is not known.
- ANOVA (analysis of variance): for comparison of 3
or more groups of data that are normally distributed.
- Kruskal-Wallis test: for comparison of 3 or more groups of data for which
the distribution is not known.
- Chi-square test: for categorical data.
- Fisher's exact test: for categorical data with a lower number of
samples.
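The list above can be read as a small decision rule. The Python sketch below is only an illustration of that rule: the function name choose_test and its parameters are made up for this example, and it simply returns the name of the test the list suggests.

```python
def choose_test(categorical: bool, n_groups: int = 2,
                normal: bool = False, small_counts: bool = False) -> str:
    """Return the name of the test suggested by the list above.

    categorical  -- True for yes/no (count) data
    n_groups     -- number of groups being compared (2 or more)
    normal       -- True only if the data are known to be normally distributed
    small_counts -- True if the categorical counts are low
    """
    if categorical:
        return "Fisher's exact test" if small_counts else "Chi-square test"
    if n_groups < 2:
        raise ValueError("need at least 2 groups to compare")
    if n_groups == 2:
        return "Student's t test" if normal else "Mann-Whitney U test"
    return "ANOVA" if normal else "Kruskal-Wallis test"


# Two groups, distribution not known
print(choose_test(categorical=False, n_groups=2, normal=False))
# Three groups, normally distributed
print(choose_test(categorical=False, n_groups=3, normal=True))
# Yes/no data with low numbers
print(choose_test(categorical=True, small_counts=True))
```

Note that normal defaults to False, so an unknown distribution falls through to the test that does not assume normality.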
Comparing two groups
Note that the way the data are distributed (i.e. normal, with a bell-shaped
curve, or not known) is the most important factor in determining the
appropriate test. This is because some tests make assumptions about the
population distribution based on the sample taken (the population is the
body of data from which you take a sample). If the population from which
the sample was taken is not normally distributed, the assumptions made
by that test about the population are inaccurate, and may even be
way off. Many populations are likely to be normally distributed (e.g.
pulse rates of Honours students), but some are not (e.g. number of days
after the deadline that Honours theses are submitted). Most importantly,
if you have a low sample number that does not allow you to determine the
distribution, and you do not know the distribution of the population,
it is not valid to use a test that assumes normality, such as the Student's
t test. For example, consider blood sugar levels in 6 diabetes susceptible
mice treated with steroids in PBS, versus 6 diabetes susceptible mice
treated with PBS alone. Because it is not possible to determine the
distribution from as few as 6 samples, a Mann-Whitney U test would be
more appropriate, as this test does not assume that the population is
normally distributed.
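To make the 6 versus 6 example concrete, here is a minimal Python sketch of an exact Mann-Whitney U test using only the standard library. The blood sugar values are invented purely for illustration, and the function name mann_whitney_exact is an assumption of this example, not a library call. With groups this small the p value can be found by listing every way the pooled ranks could have been split between the two groups.

```python
from itertools import combinations


def mann_whitney_exact(group_a, group_b):
    """Exact two-sided Mann-Whitney U test for two small groups.

    Returns (U for group_a, two-sided p value). Tied values share
    the average of the ranks they occupy.
    """
    pooled = sorted(list(group_a) + list(group_b))
    positions = {}
    for index, value in enumerate(pooled, start=1):
        positions.setdefault(value, []).append(index)
    rank = {value: sum(idx) / len(idx) for value, idx in positions.items()}

    n_a, n_b = len(group_a), len(group_b)
    all_ranks = [rank[v] for v in group_a] + [rank[v] for v in group_b]

    def u_stat(rank_sum):
        return rank_sum - n_a * (n_a + 1) / 2

    mean_u = n_a * n_b / 2            # value of U expected under the null
    observed = u_stat(sum(all_ranks[:n_a]))

    # Count the relabellings that are at least as far from mean_u
    # as the observed split.
    extreme = total = 0
    for chosen in combinations(range(n_a + n_b), n_a):
        u = u_stat(sum(all_ranks[i] for i in chosen))
        total += 1
        if abs(u - mean_u) >= abs(observed - mean_u) - 1e-9:
            extreme += 1
    return observed, extreme / total


# Hypothetical blood sugar readings, made up for this example
steroid = [5.1, 5.6, 6.0, 6.3, 6.8, 7.2]
pbs_only = [7.9, 8.4, 9.1, 9.6, 10.2, 11.0]

u, p = mann_whitney_exact(steroid, pbs_only)
# The groups do not overlap at all, so U = 0 and p = 2/924 (about 0.002)
print(f"U = {u}, p = {p:.4f}")
```

Listing all 924 splits is only practical for small groups; for larger samples a statistics package would normally be used instead.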
Q. Why not always use a Mann-Whitney test?
A. When the data really are normally distributed, the t test is the more
powerful test, as it takes into consideration the population distribution
from which you have taken a sample, and it also allows us to measure the
magnitude of the difference between groups. Therefore, if you know your
data are normally distributed (or close to it), then the t test is the
better test to use.
Comparing three or more groups
We often find ourselves investigating more than two groups at a time.
E.g. comparing T cell numbers in Strain C, B and N mice. A very common
mistake is to use a two-group test on each combination within these three
groups (e.g. C vs B, C vs N, B vs N). The problem is this: if we accept a p
value of 0.05 as the cut-off for significance in a two-group test, then
5 times out of one hundred (or 1 in 20) we will conclude that two groups
differ when they do not. If we use a two-group test twice on 1 set of
data, we roughly double our chance of making a wrong conclusion for that
data (to about 10%). If we use the test 20 times on 1 set of data, we are
more likely than not to make at least one wrong conclusion (about 64%, if
the tests are independent). ANOVA is a test that allows us to determine
whether 3 or more groups are from the same or different populations.
Importantly, similar to the t test, ANOVA assumes that the populations
are normally distributed. If the distribution is not known, the Kruskal-Wallis
test allows a comparison of 3 or more groups without assuming that the
populations are normally distributed.
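The arithmetic behind the repeated-testing problem can be checked in a few lines of Python. This is a sketch that assumes the tests are independent, which pairwise comparisons on the same groups are not exactly, so the figures are a guide rather than exact values. The function names are made up for this example.

```python
def familywise_error(n_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive in n_tests independent tests,
    each run with cut-off alpha, when no real differences exist."""
    return 1 - (1 - alpha) ** n_tests


def n_pairwise(n_groups: int) -> int:
    """Number of two-group comparisons needed to cover every pair."""
    return n_groups * (n_groups - 1) // 2


# Strain C, B and N mice: 3 groups give 3 pairwise comparisons
print("pairwise comparisons for 3 groups:", n_pairwise(3))

for n in (1, 2, 3, 20):
    print(f"{n:>2} tests: {familywise_error(n) * 100:.2f}% chance of a false positive")
```

For the three-strain example this works out to roughly a 14% chance of at least one false positive, rather than the 5% we thought we were accepting.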
Yes or No (categorical) data
An example is the proportion of mice that develop diabetes following
steroid in PBS treatment versus the proportion that develop diabetes
following PBS treatment alone. The results might be 10 diabetic, 90
non-diabetic; and 60 diabetic, 40 non-diabetic, respectively. This type
of problem is examined using a Chi-square test. Note that if the number
of mice tested was lower and the results were 1 diabetic, 9 non-diabetic;
and 6 diabetic, 4 non-diabetic mice, respectively, the Chi-square test
is not designed to handle numbers this low. The test that should be used
instead is Fisher's exact test. As a general rule, if the number in two
or more categories is below 5 (e.g. 1 diabetic mouse in the test group,
4 non-diabetic mice in the control group), or the number in any category
is equal to zero, then Fisher's exact test is more appropriate.
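Both examples above can be worked through with the Python standard library. The sketch below is hand-rolled for illustration (the function names are assumptions of this example, and it handles 2 x 2 tables only). It uses the plain Pearson chi-square without a continuity correction, so a statistics package that applies one by default may report a slightly different value. It also prints the expected counts, since the "below 5" rule of thumb is often stated in terms of expected rather than observed numbers.

```python
from math import comb, erfc, sqrt


def expected_counts(table):
    """Expected counts for a 2x2 table if the two groups did not differ."""
    (a, b), (c, d) = table
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    return [[r * k / n for k in cols] for r in rows]


def chi_square_2x2(table):
    """Pearson chi-square (no continuity correction), 1 degree of freedom.

    Returns (statistic, p value).
    """
    expected = expected_counts(table)
    stat = sum((obs - exp) ** 2 / exp
               for obs_row, exp_row in zip(table, expected)
               for obs, exp in zip(obs_row, exp_row))
    # With 1 degree of freedom the upper tail is erfc(sqrt(stat / 2)).
    return stat, erfc(sqrt(stat / 2))


def fisher_exact_2x2(table):
    """Two-sided Fisher's exact test: add up the probability of every table
    with the same row and column totals that is no more likely than the
    observed one."""
    (a, b), (c, d) = table
    row1, row2, col1 = a + b, c + d, a + c

    def weight(k):
        # Unnormalised probability that the top-left cell equals k
        return comb(row1, k) * comb(row2, col1 - k)

    lowest, highest = max(0, col1 - row2), min(row1, col1)
    weights = [weight(k) for k in range(lowest, highest + 1)]
    observed = weight(a)
    return sum(w for w in weights if w <= observed) / sum(weights)


# Rows: steroid in PBS, PBS alone. Columns: diabetic, non-diabetic.
large = [[10, 90], [60, 40]]
small = [[1, 9], [6, 4]]

stat, p = chi_square_2x2(large)
print(f"chi-square = {stat:.2f}, p = {p:.2g}")
print("expected counts, small table:", expected_counts(small))
print(f"Fisher's exact p = {fisher_exact_2x2(small):.4f}")
```

For the small table, two of the expected counts are 3.5, which is why the Chi-square test is not trusted there. Fisher's exact test gives p of about 0.057 for those numbers, just above the usual 0.05 cut-off.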
The Null hypothesis
When we carry out a statistical test, we are testing the
hypothesis that the groups are the same (the null hypothesis).
Typically, a p value of 0.05 or lower is accepted as the
cut-off for rejecting the null hypothesis and accepting
the alternative hypothesis (that the groups are different).
It is not uncommon to hear or read "the groups were different
(higher or lower), but not statistically different". This
statement is not valid. If the p value is higher than 0.05,
the null hypothesis cannot be rejected, and the two groups
should not be described as different. For example, it would
not be valid to say that steroids cause a reduction in blood
sugar levels in diabetes susceptible mice (e.g. 11 ± 3 versus
9 ± 2) but that the results were not statistically significant.
The only conclusion that can be made from these data is that
there is no evidence that steroids cause a reduction in blood
sugar. Of course, further testing with larger numbers of
samples might change this conclusion, but in the absence of
that extra data, the conclusion that the groups are different
cannot be made.
Further reading
This page is intended only as a primer, and is not in itself sufficient
to give you a thorough understanding of the different tests and how they
work. If you use a stats test in your results, or are trying to interpret
another researcher's statistical results, I would recommend further reading.
Some stats books are quite well written for non-mathematicians (e.g. "Statistics
Without Tears" by Derek Rowntree), and we are also fortunate enough to
have Michael Bailey as a statistics consultant. Michael is happy to help
with any questions or problems related to the use of stats.