ERROR IN TESTING (AND SPORTS)

With the spring weather upon us (well, not really, but it's supposed to be), the start of baseball season, hockey gearing up for the playoffs and Champions League soccer (yes, I tend to watch a lot of sports), I have seen human error by the officials (whom we in the testing business would call judges) come into play, sometimes in outcome-deciding moments. In the testing industry, or educational and psychological measurement, we understand that no test is perfect and that measurement error ALWAYS comes into play! However, we think proactively and endeavour to minimize the impact of measurement error through the various methodologies available to us in the measurement field. So why, then, does the world of professional sports, with millions of dollars involved, fail to use the methodology at its disposal to minimize human error in officiating?

We in the measurement field deal with very high-stakes situations, like testing for professional licensing and certification, and these are life-impacting decisions. We have to get the calls right, at least within an acceptable margin of error, say with 95% certainty on pass/fail decisions. We do this by building tests through specific procedures that ultimately enhance the validity of the score-based interpretations of test results. We use objective-format items where there is no judgement involved; there is only one right answer and all others are incorrect. When judgement is necessary, say on performance assessments, we develop scoring rubrics and keys, train the judges or raters to use them and then calibrate the judges' scores to increase inter-rater reliability. We also help panels of judges set performance standards through sound and defensible methods. We use the technology available to us to minimize the error, sometimes using computer-delivered and computer-scored tests, for example.

But in professional sports, the officials, who one would think must also make the right calls, seem to get impactful calls wrong. Surely officials have been trained and calibrated to make the same calls on the same plays, enhancing the inter-rater reliability of their judgements. But if you know anything about baseball, you know that one umpire’s strike zone may be completely different from another’s; that is low inter-rater reliability. If a given umpire has a consistent strike zone throughout a given game (high intra-rater reliability), is that OK? Well, many times even this is not the case: we see erratic strike zones, different strike zones for different players, and different strike zones for the same player in the same at-bat. Recently, four soccer officials all failed to see a clear offside that led to a goal with a huge impact on a Champions League game with millions of dollars at stake. Their call, or lack of one, had great inter-rater reliability but was not the correct call; their decision succumbed to huge validity threats. Surely electronics and computers can make these calls with sufficient reliability and validity to appease fans. So why don’t we rely on the available technology to get these impactful decisions right?
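
For readers who like to see the numbers, here is a minimal sketch of the distinction between inter-rater reliability and validity. The pitch calls are made up for illustration (they are not real game data), the function names are my own, and Cohen's kappa is just one common agreement index, not the only option. Two hypothetical umpires agree with each other on every pitch, yet both miss the true call 30% of the time.

```python
from collections import Counter

def percent_agreement(a, b):
    """Share of items on which two lists of calls match."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a, b):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    n = len(a)
    p_observed = percent_agreement(a, b)
    counts_a, counts_b = Counter(a), Counter(b)
    # Chance agreement from each rater's marginal call rates.
    p_chance = sum(counts_a[k] * counts_b[k] for k in set(a) | set(b)) / (n * n)
    if p_chance == 1:
        return 1.0  # both raters used a single, identical category
    return (p_observed - p_chance) / (1 - p_chance)

# Hypothetical data: "S" = strike, "B" = ball, ten pitches.
true_calls = ["S", "B", "S", "B", "S", "B", "S", "B", "S", "B"]
umpire_a   = ["S", "S", "S", "B", "S", "S", "S", "B", "B", "B"]
umpire_b   = ["S", "S", "S", "B", "S", "S", "S", "B", "B", "B"]

agreement = percent_agreement(umpire_a, umpire_b)  # reliability: A vs. B
kappa = cohens_kappa(umpire_a, umpire_b)           # chance-corrected reliability
accuracy = percent_agreement(umpire_a, true_calls) # validity: A vs. the truth

print(f"agreement={agreement:.2f} kappa={kappa:.2f} accuracy={accuracy:.2f}")
```

Perfect agreement between the two umpires says nothing about whether either one got the call right, which is the situation the four soccer officials found themselves in.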

The answer, I believe, is very simple. We watch and enjoy sports for the entertainment value, and the human element, though a huge contributor to measurement error, is part of the lure of the spectacle of sports. We may not like the decisions sometimes, and the decisions are sometimes erroneous, but that is palatable because, even with millions of dollars at stake, sport is just sport. Let’s face it, though: the officials do get the calls correct the vast majority of the time, perhaps not 95% of the time, but that level of error is acceptable in that domain.

Our domain as measurement professionals is quite different; we do our jobs methodically, systematically and dispassionately, seeking to minimize the error that could impact someone’s life in a negative way. Perhaps that’s why I enjoy watching sports; error is part of the game, and we accept that officials are going to make errors, even though it may cost our team. Sports and the human element of error in officiating fuels the passion I must most-often