I am going to start by explaining statistical significance. In the first year of our Psychology degree, we were all taught how to carry out significance tests, and that they are a necessity when testing a research hypothesis. The goal of a hypothesis test is to establish whether the results you have obtained from your study are likely to have occurred by chance or not (Gravetter & Forzano, 2009). The smaller the p value, the less likely it is that results like yours would occur by chance alone. You can conclude that your results are significant, and unlikely to have been obtained by chance, when the p value falls below the criterion set by the alpha level. The alpha level is a threshold you choose in advance: the largest probability you are willing to accept of seeing results at least as extreme as yours when there is really no effect. If you set your alpha level at .01 and your results satisfy this criterion, there is less than a one per cent probability that results this extreme would have occurred by chance alone if there were no real effect. As I am sure you will have seen, these probabilities are reported as p values in the literature.
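To make the p value and alpha level a bit more concrete, here is a minimal sketch of a permutation test in Python. The scores are made up by me purely for illustration, and `permutation_p_value` is my own hypothetical helper, not something from Gravetter and Forzano. It simply asks how often shuffling the group labels at random gives a difference in means at least as large as the one observed.

```python
import random
from statistics import mean

def permutation_p_value(group_a, group_b, n_shuffles=10_000, seed=0):
    """Two-sided permutation test for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_shuffles):
        # Shuffling the pooled scores mimics "no real effect":
        # group membership becomes arbitrary.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    return extreme / n_shuffles

# Hypothetical scores, invented for illustration only.
control = [12, 15, 11, 14, 13, 12, 16, 13]
treatment = [17, 19, 16, 18, 20, 17, 21, 18]

ALPHA = 0.01  # criterion chosen before looking at the data
p = permutation_p_value(control, treatment)
significant = p < ALPHA
print(f"p = {p:.4f}; significant at alpha = {ALPHA}: {significant}")
```

The result is called significant only because p falls below the alpha level chosen in advance. The p value itself says nothing about how large or useful the difference is.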
Despite the emphasis on statistical significance testing in psychology, its importance is in dispute. F. L. Schmidt (1996) argued that the reliance on significance testing is hindering the development of cumulative knowledge. Schmidt (1996) went on to suggest that a reform of teaching and practice was necessary to ensure that researchers learn that “the benefits they believe flow from the use of significance testing are illusory”. He proposed that teachers ‘revamp’ their courses so that students understand three things: that reliance on statistical significance hinders the growth of cumulative research knowledge; that the benefits believed to flow from significance testing do not in fact exist; and that significance methods must be replaced with point estimates and confidence intervals. He concluded that such a reform is essential for the future progress of cumulative knowledge in psychological research.
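For anyone unsure what reporting a point estimate and confidence interval looks like in practice, here is a rough sketch. The data are simulated (hypothetical), `mean_confidence_interval` is a name I made up, and the interval uses a large-sample normal approximation, so treat it as an illustration rather than the method Schmidt prescribes.

```python
import random
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(values, confidence=0.95):
    """Point estimate and normal-approximation confidence interval for a mean."""
    n = len(values)
    point_estimate = mean(values)
    standard_error = stdev(values) / n ** 0.5
    # Critical z value, e.g. about 1.96 for a 95% interval.
    z_crit = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z_crit * standard_error
    return point_estimate, point_estimate - margin, point_estimate + margin

# Hypothetical improvement scores, simulated for illustration only.
rng = random.Random(7)
scores = [rng.gauss(3.0, 10.0) for _ in range(200)]

estimate, lower, upper = mean_confidence_interval(scores)
print(f"mean improvement = {estimate:.2f}, 95% CI [{lower:.2f}, {upper:.2f}]")
```

Instead of a yes or no verdict, the reader gets the estimated size of the effect and a range of plausible values for it.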
D. T. Lykken’s work (1968) shows that the importance of statistical significance has long been in dispute. His report argued that statistical significance is possibly the least important aspect of a good experiment. Lykken (1968) also proposed that a “significance test is never a sufficient condition for claiming that (1) a theory has been successfully corroborated, (2) a meaningful empirical fact has been established, or (3) an experimental report ought to be published.”
F. L. Schmidt (1996) strongly opposed the use of significance tests, claiming that there were “severe deficiencies” associated with their use. A significance test can always lead to an error because the process is inferential. The result may be distorted by sampling error or simply by chance; a sample will always provide you with incomplete information about a population.
Some samples are not good representatives of the population and can provide misleading information.
Two types of error can be made in hypothesis testing: type one and type two errors. A type one error occurs when a researcher finds a significant result despite there being no effect in the population. The error occurs because the researcher has, by chance, selected an extreme sample that appears to show an effect when there is none (Gravetter & Forzano, 2009). Conversely, a type two error occurs when the sample data do not show a significant result even though a real effect does exist in the population (Gravetter & Forzano, 2009). This often occurs when the effect is so small that it does not show up in the sample.
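Both kinds of error are easier to see in a simulation. The sketch below is my own illustration under simplifying assumptions: scores are normally distributed with a known standard deviation of 1, each simulated study has 20 participants, and a two-sided z-test is used. The function names are hypothetical.

```python
import random
from statistics import NormalDist, mean

def z_test_p_value(sample, mu0=0.0, sigma=1.0):
    """Two-sided one-sample z-test p value (population sigma assumed known)."""
    z = (mean(sample) - mu0) / (sigma / len(sample) ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))

def rejection_rate(true_mean, n=20, alpha=0.05, n_studies=5_000, seed=1):
    """Fraction of simulated studies that reject the null hypothesis (mean = 0)."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_studies):
        sample = [rng.gauss(true_mean, 1.0) for _ in range(n)]
        if z_test_p_value(sample) < alpha:
            rejections += 1
    return rejections / n_studies

# No effect in the population: every rejection is a type one error.
type_one_rate = rejection_rate(true_mean=0.0)

# Small real effect: every failure to reject is a type two error.
type_two_rate = 1 - rejection_rate(true_mean=0.2)

print(f"type one error rate: {type_one_rate:.3f}")
print(f"type two error rate: {type_two_rate:.3f}")
```

With no real effect, the type one error rate should sit close to the alpha level. With a small real effect and only 20 participants per study, most simulated studies miss it, which is the situation described above.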
G. Krueger et al. (2000) encountered a problem related to type two errors when studying psoriasis treatments: they had difficulty defining a ‘clinically significant improvement’ for patients with psoriasis. They found that setting an ‘arbitrarily high criterion of clinical efficacy’ for new psoriasis treatments could limit the development and approval of useful treatments. They believed that, in order to maximise the availability of useful treatments, psoriasis treatments should be approved when they have been shown to produce a statistically significant level of improvement in well-designed clinical trials.
In their study, K. H. Todd et al. (1996) attempted to determine the amount of change in pain severity, as measured by a visual analogue scale, that constitutes a minimum clinically significant difference. They found that the minimum clinically significant change in patient pain severity measured with a 100-mm visual analogue scale was 13 mm. Studies of pain experience that report less than a 13-mm change in pain severity, although statistically significant, may have no clinical importance.
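Here is a small sketch of how a result can be statistically significant and still fall short of the 13-mm threshold. Only the 13 mm figure comes from Todd et al. (1996); the pain-change scores are simulated (hypothetical), and the test is a large-sample normal approximation that I chose for simplicity.

```python
import random
from statistics import NormalDist, mean, stdev

MCID_MM = 13.0  # minimum clinically significant change (Todd et al., 1996)
ALPHA = 0.05    # conventional alpha level, my assumption

# Hypothetical reductions in pain (mm on a 100-mm scale) for 1000 patients,
# simulated with a true average change of 6 mm.
rng = random.Random(42)
changes_mm = [rng.gauss(6.0, 20.0) for _ in range(1000)]

mean_change = mean(changes_mm)
standard_error = stdev(changes_mm) / len(changes_mm) ** 0.5

# Large-sample test of "average change = 0".
z = mean_change / standard_error
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

statistically_significant = p_value < ALPHA
clinically_important = abs(mean_change) >= MCID_MM

print(f"mean change = {mean_change:.1f} mm, p = {p_value:.3g}")
print(f"statistically significant: {statistically_significant}")
print(f"clinically important (>= {MCID_MM} mm): {clinically_important}")
```

With a large enough sample, even a change well under 13 mm produces a very small p value, so the two kinds of significance have to be checked separately.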
K. H. Todd et al. (1996) proposed that a statistically significant result does not necessarily mean that the effect is large enough to be of practical use. The study by G. Krueger et al. (2000) mentioned above, however, provides a counterargument. One of the main problems they faced was that setting an arbitrarily high criterion of clinical efficacy for new psoriasis treatments would likely limit the development and approval of useful treatments. For G. Krueger et al. (2000), satisfying a significance test alone was enough for the approval of useful psoriasis treatments.
To conclude, although we have been taught that including a significance test when reporting research is fundamental, there is a body of research suggesting that perhaps it is not. F. L. Schmidt (1996) argued that a teaching reform is necessary so that researchers and students are taught about the severe deficiencies associated with significance testing.
Even when you have obtained a significant result by satisfying the criterion set by the alpha level, your results may have no practical use or clinical significance (Todd et al., 1996). So, why is there so much emphasis placed upon significance testing in psychology? Some of the research I have looked at (e.g., Lykken, 1968) argues that statistical significance is possibly the least important aspect of a good experiment.
Closer inspection has uncovered many truths about significance testing, causing me to doubt its importance. I now feel sick about how much time I spent in year one on significance testing.
Cheers for reading.
Schmidt, F. L. (1996). Statistical significance testing and cumulative knowledge in psychology: Implications for training of researchers.
Todd, K. H., et al. (1996). Clinical significance of reported changes in pain severity.
Krueger, G. G., et al. (2000). Two considerations for patients with psoriasis and their clinicians: What defines mild, moderate, and severe psoriasis? What constitutes a clinically significant improvement when treating psoriasis?
Lykken, D. T. (1968). Statistical significance in psychological research.