Statistical Analysis of Program Components Marks

by Sandra Loosemore
November 2003

Introduction

The ISU's new Code of Points judging system includes, in addition to marks for specific technical elements, five "program components" marks, which are intended to replace the presentation mark from the current system: Skating Skills, Transitions, Performance/Execution, Choreography and Interpretation.

The ISU's propaganda for the Code of Points claims that a major advantage of this system is that it will lead to "better consistency among judges" because the program components are based on "precise criteria". In other words, if the system is working the way the ISU says it should, we should expect to see the judges' marks for each component for a particular performance in close agreement, because their marks are now supposed to be based on "precise criteria" and have meaning in and of themselves instead of being placeholders for an ordinal, as in the 6.0 system. On the other hand, because the different program components each have their own "precise criteria", we would also expect to see significant variation in the marks for different components in the same performance, since skaters are unlikely to be equally good (or bad) at each component. For example, we might see skaters who have superior Skating Skills but a program with poor Choreography and simplistic Transitions.

Alas, even a cursory examination of the results from the Grand Prix events where the Code of Points has been used shows that the expected judging pattern for the program components is not appearing. Instead, each judge is giving nearly identical marks to all of the different program components for a given performance.

This article reports on a statistical analysis of the judging data from these competitions. The purpose of this analysis is to quantify and compare, for each performance, the variation among the marks that different judges give to the same program component and the variation among the marks that each individual judge gives to the different components.

If you are not mathematically inclined, you can skip straight to the Commentary section at the end for the discussion of the results and what they mean.

Statistical background

Most people are familiar with the concept of the "bell curve", which represents a normal statistical distribution. The idea is that most of the values being analyzed lie close to the mean, or center point, of the range, and fewer lie at the extremes.

If the values are closely clustered around the same point in the range, that corresponds to a steep, narrow bell curve. If, on the other hand, the data points are more "spread out" over a larger range, that corresponds to a shallower, wider bell curve.

Mathematically, the amount of variation in a set of data is measured by its standard deviation. You don't really need to understand how the standard deviation is computed, but a smaller standard deviation means that the data is more tightly clustered around the mean than a larger one does. More precisely, about 68% of the samples lie within one standard deviation of the mean, and about 95% lie within two standard deviations of the mean.
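For reference, the formula for the standard deviation of N marks x_1, ..., x_N with mean \bar{x} is shown below. This is the population form; whether munge.c uses the population or the sample form is an assumption here, since the article does not say:

    \sigma = \sqrt{ \frac{1}{N} \sum_{i=1}^{N} ( x_i - \bar{x} )^2 }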

In terms of how this applies to the judging of the program components: if the new judging system is working the way the ISU claims it should, the standard deviation of the different judges' marks for the same component should be small, corresponding to a steep, narrow bell curve. On the other hand, since the different program components are not supposed to be highly correlated with one another, the standard deviation among the marks an individual judge gives to the different program components should be larger, corresponding to a shallower, more spread-out bell curve.

The purpose of the statistical analysis being reported here is to test the ISU's hypothesis, by computing standard deviations in the program components marks for each performance at the Grand Prix events. If you are not interested in the technical details of exactly what was computed and how, you can skip straight to the Commentary section at the end for the discussion of the results.

Methodology

The PDF protocols for the six regular Grand Prix events and the Grand Prix Final were downloaded from the ISU web site and converted to plain text using pdftotext, a standard Linux utility. These files were then parsed and analyzed using a small C program written for this purpose. All of the input files and the source code for the program are available for download here.

The program, munge.c, reads the program components marks for each competitor from the protocol files. It computes the standard deviation for each program component as marked by the different judges on the panel, and also the standard deviation of each judge's marks across all of the program components. The averages of these numbers are computed for each program segment (e.g., the men's short program) and for the competition as a whole.

In addition, if the judges are marking the program components the way the ISU asserts they are, then for many competitors there will be less variance among the judges' marks for each program component than among the component marks from each individual judge; the program therefore also counts the number of performances for which this assertion is true.
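To make these computations concrete, here is a minimal sketch in C of the two per-performance quantities described above. The panel size, the sample marks, the data layout, and the use of the population form of the standard deviation are all illustrative assumptions; this is a sketch of the idea, not code taken from munge.c.

    #include <math.h>
    #include <stdio.h>

    #define NJUDGES     5   /* size of a hypothetical judging panel */
    #define NCOMPONENTS 5   /* Skating Skills ... Interpretation    */

    /* Population standard deviation of n values. */
    static double std_dev(const double *x, int n)
    {
        double mean = 0.0, ss = 0.0;
        int i;
        for (i = 0; i < n; i++)
            mean += x[i];
        mean /= n;
        for (i = 0; i < n; i++)
            ss += (x[i] - mean) * (x[i] - mean);
        return sqrt(ss / n);
    }

    int main(void)
    {
        /* marks[j][c] = judge j's mark for component c on one
           performance (made-up example numbers).               */
        double marks[NJUDGES][NCOMPONENTS] = {
            { 6.25, 6.00, 6.25, 6.25, 6.00 },
            { 5.50, 5.75, 5.50, 5.75, 5.75 },
            { 6.75, 6.50, 6.75, 6.75, 6.50 },
            { 5.25, 5.25, 5.00, 5.25, 5.50 },
            { 6.00, 6.25, 6.00, 6.00, 6.25 },
        };
        double column[NJUDGES];
        double by_component = 0.0, by_judge = 0.0;
        int c, j;

        /* Average, over the five components, of the deviation
           among the judges' marks for the same component.     */
        for (c = 0; c < NCOMPONENTS; c++) {
            for (j = 0; j < NJUDGES; j++)
                column[j] = marks[j][c];
            by_component += std_dev(column, NJUDGES);
        }
        by_component /= NCOMPONENTS;

        /* Average, over the judges, of the deviation within
           each judge's marks across the five components.    */
        for (j = 0; j < NJUDGES; j++)
            by_judge += std_dev(marks[j], NCOMPONENTS);
        by_judge /= NJUDGES;

        printf("deviation by component: %f\n", by_component);
        printf("deviation by judge:     %f\n", by_judge);
        printf("matches the expected pattern: %s\n",
               by_component < by_judge ? "yes" : "no");
        return 0;
    }

Compiled with something like "cc sketch.c -lm", this prints the two averaged deviations for the made-up panel and reports whether the performance matches the pattern the ISU's claims predict, namely that the deviation by component should be smaller than the deviation by judge.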

The program does not examine the correlation between the program components scores and the technical elements scores, or try to account for the random selection of judges and dropping of high and low marks which are also part of the Code of Points system.

Results

The following tables summarize the results of the analysis on the data from the Grand Prix competitions:

Skate America
Competition segment   Deviation by component   Deviation by judge   Number of competitors   Number matching pattern
Dance CD              0.654382                 0.274488             10                      0
Dance OD              0.622866                 0.298865             10                      0
Dance FD              0.612326                 0.373781             10                      0
Ladies SP             0.671344                 0.337089             12                      0
Ladies FP             0.595063                 0.313165             12                      0
Men SP                0.681366                 0.336754             12                      0
Men FP                0.619987                 0.338100             12                      0
Pairs SP              0.645048                 0.311868             10                      0
Pairs FP              0.470086                 0.264175             10                      0
Overall               0.61960                  0.317684             98                      0

Skate Canada
Competition segment   Deviation by component   Deviation by judge   Number of competitors   Number matching pattern
Dance CD              0.540805                 0.170442             11                      0
Dance OD              0.437432                 0.185610             11                      0
Dance FD              0.487344                 0.205204             11                      0
Ladies SP             0.628916                 0.255838             11                      0
Ladies FP             0.478685                 0.240390             11                      0
Men SP                0.578165                 0.277124             11                      0
Men FP                0.509652                 0.239924             11                      0
Pairs SP              0.632615                 0.263447             10                      0
Pairs FP              0.461133                 0.212065             10                      0
Overall               0.527310                 0.227577             97                      0

Cup of China
Competition segment   Deviation by component   Deviation by judge   Number of competitors   Number matching pattern
Dance CD              0.578955                 0.222216             12                      0
Dance OD              0.494624                 0.253337             12                      0
Dance FD              0.460597                 0.259188             12                      0
Ladies SP             0.692015                 0.325162             11                      0
Ladies FP             0.812866                 0.370822             11                      0
Men SP                0.582009                 0.291693             11                      0
Men FP                0.582778                 0.265336             11                      0
Pairs SP              0.562980                 0.266084             10                      0
Pairs FP              0.440882                 0.230788             10                      0
Overall               0.578110                 0.275687             100                     0

Trophee Lalique
Competition segment   Deviation by component   Deviation by judge   Number of competitors   Number matching pattern
Dance CD              0.541085                 0.198480             11                      0
Dance OD              0.563852                 0.251584             11                      0
Dance FD              0.443173                 0.197124             11                      0
Ladies SP             0.570148                 0.301421             10                      0
Ladies FP             0.461824                 0.291909             10                      0
Men SP                0.619719                 0.336198             12                      0
Men FP                0.494392                 0.287391             11                      0
Pairs SP              0.599663                 0.246036             8                       0
Pairs FP              0.475387                 0.227142             8                       0
Overall               0.530177                 0.261233             92                      0

Cup of Russia
Competition segment   Deviation by component   Deviation by judge   Number of competitors   Number matching pattern
Dance CD              0.602516                 0.198892             12                      0
Dance OD              0.458745                 0.224778             12                      0
Dance FD              0.438992                 0.303411             12                      1
Ladies SP             0.628918                 0.265978             12                      0
Ladies FP             0.475465                 0.254350             12                      0
Men SP                0.607233                 0.261237             11                      0
Men FP                0.577429                 0.306245             11                      0
Pairs SP              0.580943                 0.265847             10                      0
Pairs FP              0.504465                 0.267054             9                       0
Overall               0.537884                 0.260130             101                     1

NHK Trophy
Competition segment   Deviation by component   Deviation by judge   Number of competitors   Number matching pattern
Dance CD              0.620961                 0.210673             12                      0
Dance OD              0.540816                 0.263135             12                      0
Dance FD              0.430050                 0.250409             12                      0
Ladies SP             0.751085                 0.262021             10                      0
Ladies FP             0.541171                 0.278064             10                      0
Men SP                0.682124                 0.295708             11                      0
Men FP                0.607767                 0.324541             11                      0
Pairs SP              0.566031                 0.225213             10                      0
Pairs FP              0.408313                 0.199189             10                      0
Overall               0.568413                 0.256716             98                      0

Grand Prix Final
Competition segment   Deviation by component   Deviation by judge   Number of competitors   Number matching pattern
Dance OD              0.461889                 0.266204             6                       0
Dance FD              0.437408                 0.245006             6                       0
Ladies SP             0.498503                 0.255780             6                       0
Ladies FP             0.553278                 0.294925             6                       0
Men SP                0.462366                 0.283801             5                       0
Men FP                0.471982                 0.291557             5                       0
Pairs SP              0.542444                 0.217516             6                       0
Pairs FP              0.489598                 0.248767             6                       0
Overall               0.490662                 0.261869             46                      0

All Competitions Combined
Competition segment   Deviation by component   Deviation by judge   Number of competitors   Number matching pattern
Overall               0.555295                 0.266278             632                     1

Commentary

The results of the analysis show that the judges are not marking the program components in the way the ISU has asserted they would be judged. If the ISU's claim were correct, the standard deviation among different judges' marks for the same program component would generally be less than the standard deviation among the same judge's marks for the different program components, but the exact opposite is true. In fact, averaged over all of the competitions, there is roughly twice as much variation among different judges' marks for the same component as among each individual judge's marks for the different components (0.555295 versus 0.266278).

In other words, instead of representing performance aspects that the judges can accurately and consistently differentiate and mark according to precise criteria, the marks behave in exactly the opposite way: different judges' marks for the same component vary more than an individual judge's marks for components that are supposed to reflect completely different performance aspects. Not only that, but this counter-intuitive pattern holds for all but one of the hundreds of performances evaluated at these competitions.

What does this say about the Code of Points judging system, and the way the judges are applying it?
