by Sandra Loosemore, November 2003
The ISU's propaganda for the Code of Points claims that a major advantage of this system is that it will lead to "better consistency among judges" because the program components are based on "precise criteria". In other words, if the system is working the way the ISU says it should, we should expect to see the judges' marks for each component for a particular performance in close agreement, because their marks are now supposed to be based on "precise criteria" and have meaning in and of themselves instead of being placeholders for an ordinal, as in the 6.0 system. On the other hand, because the different program components each have their own "precise criteria", we would also expect to see significant variation in the marks for different components in the same performance, since skaters are unlikely to be equally good (or bad) at each component. For example, we might see skaters who have superior Skating Skills but a program with poor Choreography and simplistic Transitions.
Alas, even cursory examination of the results from the Grand Prix events where the Code of Points has been used shows that the expected judging pattern for the program components is not appearing. Instead, we are seeing comparatively little variation in the marks for the different program components given to each performance.
This article reports on a statistical analysis of the judging data from these competitions. The purpose of this analysis is to quantify and compare, for each performance, the variation among the different judges' marks for the same program component and the variation among the marks a single judge gives to the different program components.
If the values are tightly clustered around the same point in the range, that corresponds to a steep, narrow bell curve. If, on the other hand, the data points are spread out over a larger range, that corresponds to a shallower, wider bell curve.
Mathematically, the amount of variation in a set of data is measured by the standard deviation. You don't really need to understand how the standard deviation is computed; the key point is that a smaller standard deviation means the data is more tightly clustered around the mean. More precisely, for normally distributed data with a standard deviation of 1, about 68% of the samples lie within plus or minus 1 of the mean, and about 95% lie within plus or minus 2 of the mean.
In terms of how this applies to the judging of the program components: if the new judging system is working the way the ISU claims it should, the standard deviation of the judges' marks for the same component should be small, corresponding to a steep, narrow bell curve. On the other hand, since the different program components are not supposed to be highly correlated with one another, the standard deviation of the marks an individual judge gives to the different program components should be larger, corresponding to the spread-out bell curve.
The purpose of the statistical analysis being reported here is to test the ISU's hypothesis, by computing standard deviations in the program components marks for each performance at the Grand Prix events. If you are not interested in the technical details of exactly what was computed and how, you can skip straight to the Commentary section at the end for the discussion of the results.
The protocol documents for each competition were converted from PDF to plain text with pdftotext, a standard Linux utility. These files were then parsed and analyzed using a small C program written for this purpose. All of the input files and the source code for the program are available for download here.
The program, munge.c, reads the program components marks for each competitor from the protocol files. It computes the standard deviation for each program component as marked by the different judges on the panel, and also the standard deviation of each judge's marks across all of the program components. These numbers are averaged over each program segment (e.g., the men's short program) and over the competition as a whole. In addition, if the judges are marking the program components the way the ISU asserts they are, then for many competitors there should be less variance in the different judges' marks for each program component than in the components marks from each individual judge; the program therefore also counts the number of performances for which this holds. The program does not examine the correlation between the program components scores and the technical elements scores, nor does it try to account for the random selection of judges and the dropping of high and low marks which are also part of the Code of Points system.
**Skate America**

| Competition segment | Deviation by component | Deviation by judge | Number of competitors | Number matching pattern |
|---|---|---|---|---|
| Dance CD | 0.654382 | 0.274488 | 10 | 0 |
| Dance OD | 0.622866 | 0.298865 | 10 | 0 |
| Dance FD | 0.612326 | 0.373781 | 10 | 0 |
| Ladies SP | 0.671344 | 0.337089 | 12 | 0 |
| Ladies FP | 0.595063 | 0.313165 | 12 | 0 |
| Men SP | 0.681366 | 0.336754 | 12 | 0 |
| Men FP | 0.619987 | 0.338100 | 12 | 0 |
| Pairs SP | 0.645048 | 0.311868 | 10 | 0 |
| Pairs FP | 0.470086 | 0.264175 | 10 | 0 |
| Overall | 0.61960 | 0.317684 | 98 | 0 |
**Skate Canada**

| Competition segment | Deviation by component | Deviation by judge | Number of competitors | Number matching pattern |
|---|---|---|---|---|
| Dance CD | 0.540805 | 0.170442 | 11 | 0 |
| Dance OD | 0.437432 | 0.185610 | 11 | 0 |
| Dance FD | 0.487344 | 0.205204 | 11 | 0 |
| Ladies SP | 0.628916 | 0.255838 | 11 | 0 |
| Ladies FP | 0.478685 | 0.240390 | 11 | 0 |
| Men SP | 0.578165 | 0.277124 | 11 | 0 |
| Men FP | 0.509652 | 0.239924 | 11 | 0 |
| Pairs SP | 0.632615 | 0.263447 | 10 | 0 |
| Pairs FP | 0.461133 | 0.212065 | 10 | 0 |
| Overall | 0.527310 | 0.227577 | 97 | 0 |
**Cup of China**

| Competition segment | Deviation by component | Deviation by judge | Number of competitors | Number matching pattern |
|---|---|---|---|---|
| Dance CD | 0.578955 | 0.222216 | 12 | 0 |
| Dance OD | 0.494624 | 0.253337 | 12 | 0 |
| Dance FD | 0.460597 | 0.259188 | 12 | 0 |
| Ladies SP | 0.692015 | 0.325162 | 11 | 0 |
| Ladies FP | 0.812866 | 0.370822 | 11 | 0 |
| Men SP | 0.582009 | 0.291693 | 11 | 0 |
| Men FP | 0.582778 | 0.265336 | 11 | 0 |
| Pairs SP | 0.562980 | 0.266084 | 10 | 0 |
| Pairs FP | 0.440882 | 0.230788 | 10 | 0 |
| Overall | 0.578110 | 0.275687 | 100 | 0 |
**Trophee Lalique**

| Competition segment | Deviation by component | Deviation by judge | Number of competitors | Number matching pattern |
|---|---|---|---|---|
| Dance CD | 0.541085 | 0.198480 | 11 | 0 |
| Dance OD | 0.563852 | 0.251584 | 11 | 0 |
| Dance FD | 0.443173 | 0.197124 | 11 | 0 |
| Ladies SP | 0.570148 | 0.301421 | 10 | 0 |
| Ladies FP | 0.461824 | 0.291909 | 10 | 0 |
| Men SP | 0.619719 | 0.336198 | 12 | 0 |
| Men FP | 0.494392 | 0.287391 | 11 | 0 |
| Pairs SP | 0.599663 | 0.246036 | 8 | 0 |
| Pairs FP | 0.475387 | 0.227142 | 8 | 0 |
| Overall | 0.530177 | 0.261233 | 92 | 0 |
**Cup of Russia**

| Competition segment | Deviation by component | Deviation by judge | Number of competitors | Number matching pattern |
|---|---|---|---|---|
| Dance CD | 0.602516 | 0.198892 | 12 | 0 |
| Dance OD | 0.458745 | 0.224778 | 12 | 0 |
| Dance FD | 0.438992 | 0.303411 | 12 | 1 |
| Ladies SP | 0.628918 | 0.265978 | 12 | 0 |
| Ladies FP | 0.475465 | 0.254350 | 12 | 0 |
| Men SP | 0.607233 | 0.261237 | 11 | 0 |
| Men FP | 0.577429 | 0.306245 | 11 | 0 |
| Pairs SP | 0.580943 | 0.265847 | 10 | 0 |
| Pairs FP | 0.504465 | 0.267054 | 9 | 0 |
| Overall | 0.537884 | 0.260130 | 101 | 1 |
**NHK Trophy**

| Competition segment | Deviation by component | Deviation by judge | Number of competitors | Number matching pattern |
|---|---|---|---|---|
| Dance CD | 0.620961 | 0.210673 | 12 | 0 |
| Dance OD | 0.540816 | 0.263135 | 12 | 0 |
| Dance FD | 0.430050 | 0.250409 | 12 | 0 |
| Ladies SP | 0.751085 | 0.262021 | 10 | 0 |
| Ladies FP | 0.541171 | 0.278064 | 10 | 0 |
| Men SP | 0.682124 | 0.295708 | 11 | 0 |
| Men FP | 0.607767 | 0.324541 | 11 | 0 |
| Pairs SP | 0.566031 | 0.225213 | 10 | 0 |
| Pairs FP | 0.408313 | 0.199189 | 10 | 0 |
| Overall | 0.568413 | 0.256716 | 98 | 0 |
**Grand Prix Final**

| Competition segment | Deviation by component | Deviation by judge | Number of competitors | Number matching pattern |
|---|---|---|---|---|
| Dance OD | 0.461889 | 0.266204 | 6 | 0 |
| Dance FD | 0.437408 | 0.245006 | 6 | 0 |
| Ladies SP | 0.498503 | 0.255780 | 6 | 0 |
| Ladies FP | 0.553278 | 0.294925 | 6 | 0 |
| Men SP | 0.462366 | 0.283801 | 5 | 0 |
| Men FP | 0.471982 | 0.291557 | 5 | 0 |
| Pairs SP | 0.542444 | 0.217516 | 6 | 0 |
| Pairs FP | 0.489598 | 0.248767 | 6 | 0 |
| Overall | 0.490662 | 0.261869 | 46 | 0 |
**All Competitions Combined**

| Competition segment | Deviation by component | Deviation by judge | Number of competitors | Number matching pattern |
|---|---|---|---|---|
| Overall | 0.555295 | 0.266278 | 632 | 1 |
In other words, far from representing performance aspects that the judges can accurately and consistently differentiate and mark according to precise criteria, the marks from different judges for the same component vary more than an individual judge's marks for components that are supposed to reflect completely different aspects of the performance. Moreover, this counter-intuitive pattern holds for all but one of the hundreds of performances evaluated at these competitions.
What does this say about the Code of Points judging system, and the way the judges are applying it?
A statistical analysis cannot pinpoint why the judges are failing to distinguish between the criteria for the different program components. Some possibilities are:
© 1994-2010 SkateWeb |