Type I and Type II Errors

After much deliberation about which statistics topic deserved to be my first blog of 2012, I decided to opt for Type I and Type II errors.  The topic has lingered in my mind since Monday’s research skills lecture, so it struck me as the most appropriate choice.

I was happy to be re-introduced to this topic, as the abstract understanding of the two terms I gained in year one seems to have somewhat diminished.

I believe that the structure of this blog should reflect the voyage of discovery I embarked upon while writing it; therefore the first part consists of the basic definitions.

Field (2009), a trustworthy reference, explains that a ‘Type I error’ occurs when we believe that there is a genuine effect in our population when in fact there is not.

Conversely, a ‘Type II error’ occurs when we believe that there is no effect in the population when in fact there is (Field, 2009). It is important to note that believing there is a ‘genuine effect’ corresponds to obtaining a significant result in our test.

It is quite clear from these lucid definitions that the two forms of error encountered in hypothesis testing are mirror images of one another: a Type I error is a false positive, while a Type II error is a false negative.
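To make the distinction concrete, here is a minimal Python sketch (entirely my own illustration, not taken from any of the references) that simply labels the four possible outcomes of a hypothesis test:

```python
# The four possible outcomes of a hypothesis test, keyed by
# (is there really an effect in the population?, did the test come out significant?)
outcomes = {
    (False, True):  "Type I error  - we claim an effect that is not really there",
    (True,  False): "Type II error - we miss an effect that really is there",
    (False, False): "correct decision - no effect exists, and none was claimed",
    (True,  True):  "correct decision - a real effect exists, and we detected it",
}

for (real_effect, significant), label in outcomes.items():
    print(f"real effect: {real_effect!s:5} | significant result: {significant!s:5} -> {label}")
```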

Having had a brief re-cap of the two terms and their definitions, it is now time to delve a little bit deeper…

It is important to note that there is always a chance that hypothesis testing can result in some error, because a hypothesis test is an inferential process “using limited information to reach a general conclusion” (Gravetter and Forzano, 2009). 

Committing Type I and Type II errors is not necessarily a reflection on the ability of the researcher; although Type I and Type II errors are mistakes, they are not foolish or careless mistakes in the sense that the researcher is overlooking something that should be perfectly obvious.  In fact, these errors are the direct result of a careful evaluation of the research results.  The problem is that the results obtained from a sample can be misleading, giving the researcher no option but to reach an incorrect conclusion (Gravetter & Forzano, 2009).

I believe it is necessary to reiterate a point just made above: the results obtained from a sample can in some instances be misleading.  Samples can vary from being near identical to the populations they represent to being extremely different; this is just life.  If a researcher is by chance unfortunate enough to select a sample that is extremely different from its respective population, the data it produces may falsely appear to show an effect when there is none, or vice versa.

For this reason Rothman (2010) argues against the use of ‘P-values’, indicating that the P-value has produced countless misinterpretations of data that are often amusing for their folly, but also hair-raising in view of the serious consequences.


It is important to be aware of Type I and Type II errors because, firstly, they are easy to make: when your data appear to show an effect you are more than likely going to assume that an effect does exist; and secondly, as Rothman (2010) indicated, there are serious consequences to acting on false results.  Rothman (2010) adds that scientists who wish to avoid Type I or Type II errors at all costs may have chosen the wrong profession, because making and correcting mistakes are inherent to science.

Type I errors are possibly the more dangerous of the two, with seriously damaging consequences.

There are serious consequences to making a Type I error.  For example, imagine you are jubilant after finding a crucially important, ground-breaking effect, but unbeknown to you the effect is false and you have committed a Type I error.  Naturally, you go prancing off to your superiors and inform them of just how amazing and important your finding is.  Sometime later, you manage to publish your report.  But you did not actually find an effect; in reality the sample you had chosen produced results that appeared to show an effect when there was none.  You had, by pure chance, selected an extreme sample that did not remotely represent the population it came from.  What are the implications of this?  Firstly, you have published a false report, adding false information to the scientific literature.  Secondly, and possibly most importantly, other researchers may attempt to build theories or develop other experiments based on your false results, wasting a great deal of time and money.  Eventually the credibility of your paper will disappear as replications uncover that the results reported were in fact false.

When talking about ‘an extreme sample’ we are referring to a ‘low-probability sample’, a self-explanatory term for a sample that is very rare to obtain and that would produce scores in the thin tail ends of the distribution of sample means, in contrast to a ‘high-probability sample’, which would come from the large middle section of the distribution of sample means.  The boundary between high- and low-probability samples is determined by selecting a specific probability value, the alpha level (usually set at .05): a small probability used to identify the low-probability samples.  Alpha levels are linked very closely to Type I errors, since the alpha level is the probability that the test will lead to a Type I error.
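To see how the alpha level and Type I errors are tied together, here is a rough simulation (my own sketch using numpy and scipy, assuming a normally distributed population with no real effect, samples of 25 scores, and the conventional .05 alpha): roughly 5% of the samples should, purely by chance, look ‘extreme’ enough to produce a significant result.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
alpha = 0.05          # the conventional alpha level
n, n_studies = 25, 10_000

false_positives = 0
for _ in range(n_studies):
    # Draw a sample from a population with NO real effect (mean 0, SD 1)
    sample = rng.normal(loc=0.0, scale=1.0, size=n)
    p = stats.ttest_1samp(sample, popmean=0).pvalue
    if p < alpha:     # a 'low-probability' (extreme) sample -> a Type I error
        false_positives += 1

print(f"Proportion of Type I errors: {false_positives / n_studies:.3f}")  # hovers around 0.05
```

In other words, with alpha set at .05, about one study in twenty of a non-existent effect will still come out ‘significant’.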


To re-cap, each time you do a hypothesis test you select an alpha level that determines the risk of a Type I error.  For every hypothesis test there is a 5% risk (because the alpha level is usually set at .05), and the more tests you do the greater the risk of a Type I error becomes; this is why experimenters distinguish between testwise alpha and experimentwise alpha.  Experimentwise alpha is the risk of a Type I error accumulated across all of the separate tests in the experiment; as the number of separate tests increases, so does the experimentwise alpha level (Gravetter & Wallnau, 2009).  For an experiment that compares more than two or three treatments it is important to avoid an inflated experimentwise alpha level, as there is a much larger risk of a Type I error.  This is why ANOVAs are preferred in the statistical analysis of experiments involving more than two treatments: an ANOVA compares all of the treatments simultaneously in a single hypothesis test, avoiding the problem of an inflated experimentwise alpha level.
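To illustrate how quickly the experimentwise alpha grows, here is a small sketch (assuming, for simplicity, that the separate tests are independent, an assumption the textbook treatment above does not spell out): the chance of at least one Type I error across k tests is 1 - (1 - .05)^k.

```python
alpha = 0.05  # testwise alpha for each individual test

# Assuming independent tests, the experimentwise alpha (the chance of at
# least one Type I error somewhere in the experiment) is 1 - (1 - alpha)^k.
for k in (1, 3, 5, 10, 20):
    experimentwise_alpha = 1 - (1 - alpha) ** k
    print(f"{k:2d} tests -> experimentwise alpha = {experimentwise_alpha:.3f}")
```

With twenty separate tests the risk of at least one Type I error is already around 64%, which is exactly the inflation an ANOVA is designed to avoid.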


Having explored the concept of Type I errors, it is necessary to consider Type II errors. When a Type II error occurs, a researcher may choose to abandon their research project under the assumption that there is no real effect in the data, or that the effect is too small to be of any significance, wasting the time spent conducting the study or even wasting extra time on a replication of the study in an attempt to obtain a larger, more significant effect.
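To give a feel for how easily this happens, here is a companion sketch to the one above (again my own illustration, with an arbitrarily chosen modest effect of 0.3 standard deviations and a small sample of 20): a real effect is present in every simulated study, yet most of the studies fail to detect it.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
alpha, true_effect, n, n_studies = 0.05, 0.3, 20, 10_000

misses = 0
for _ in range(n_studies):
    # A real (but modest) effect exists in the population...
    sample = rng.normal(loc=true_effect, scale=1.0, size=n)
    p = stats.ttest_1samp(sample, popmean=0).pvalue
    if p >= alpha:    # ...but this under-powered study fails to detect it: a Type II error
        misses += 1

print(f"Proportion of Type II errors with n={n}: {misses / n_studies:.2f}")
```

With these made-up numbers the effect is missed roughly three times out of four, which is precisely the situation in which a researcher might wrongly shelve a promising line of work.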

Krueger et al. (2000) encountered problems associated with Type II errors when studying psoriasis treatments: they found it difficult to define ‘clinically significant improvement’ for patients with psoriasis, and argued that setting an ‘arbitrarily high criterion of clinical efficacy’ for new psoriasis treatments could limit the development and approval of useful treatments.  They believed that, in order to maximise the availability of useful psoriasis treatments, treatments should be approved once they have been shown to produce a statistically significant level of improvement in well-designed clinical trials.

As mentioned above, Type I errors are possibly the more damaging of the two, mainly because falsely reporting an effect is more serious than failing to report a true one.

Rothman (2010) proposed a controversial method of minimising both Type I and Type II errors, suggesting that “all that is needed is simply to abandon significance testing”.  Indeed, there is a lot of support in favour of ‘abandoning’ significance testing: Schmidt (1996) strongly opposed the use of significance tests, claiming that there were “severe deficiencies” associated with their use, since significance tests can always lead to an error given that the process is inferential.  Lykken’s (1968) research illustrates that the importance of statistical significance has long been in dispute; the report he published propounded that statistical significance is possibly the least important aspect of a good experiment.

To conclude, Type I errors are easily mistaken for a real effect and can lead to serious consequences, such as the reporting of false results, whereas Type II errors can waste a great deal of time and somewhat hamper research development.  Neither type of error is a reflection on the quality of the researcher; rather, the results obtained from a sample can simply be misleading.  This gives rise to contention over the importance of significance testing in science.  Lykken (1968), Schmidt (1996) and Rothman (2010) all provide convincing arguments as to why significance testing should be ‘abandoned’, with the main arguments pivoting around its ‘severe deficiencies’, it being the ‘least important aspect of a good experiment’, and its production of ‘countless misinterpretations of data’ that are sometimes amusing but are followed by the severe consequences described above.
