Since 2012, English schools have operated a version of Performance Related Pay.
Since the system began operating there have been statistical objections and practical criticisms of it. Teacher performance measurement tools such as surveys and lesson observations have been criticised as unreliable, and it is pupil data, particularly progress measures, which is now treated as a direct and fair way of comparing teachers and making judgements.
However, there is a fundamental difference between the way that pupils are measured and the way that a profession is supposed to be measured, so even the use of pupil data to measure teacher performance is potentially problematic.
Until the mid-1980s the English exam system was ‘norm referenced,’ with fixed proportions of each grade awarded each year. Although norm referencing is a very simple approach to awarding grades, it can lead to a range of problems. For example, it limits the availability of the highest grades and so it can entrench social privilege.
To avoid this kind of difficulty the English exam system moved to ‘criterion referencing’ in the mid-1980s. In a criterion referenced exam, all the pupils who meet the standard (or ‘criterion’) receive the grade. The most familiar example of a criterion referenced test is the driving test. There are no quotas for how many people must pass or fail. Everyone who meets the standard passes.
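The contrast between the two approaches can be made concrete with a minimal sketch; the marks, grade quotas and pass mark below are invented for illustration, not taken from any real exam:

```python
# Illustrative only: invented marks, an assumed grade quota and an assumed pass mark.
import numpy as np

rng = np.random.default_rng(0)
marks = rng.normal(55, 15, size=1000).clip(0, 100)       # a hypothetical cohort's exam marks

# Norm referencing: fixed proportions of each grade, whatever marks the cohort achieves.
quotas = {"A": 0.10, "B": 0.30, "C": 0.40, "D": 0.20}    # assumed shares, top grade first
cum_top_share = np.cumsum(list(quotas.values()))[:-1]    # 0.10, 0.40, 0.80
boundaries = np.quantile(marks, 1 - cum_top_share)       # marks needed for A, B, C

# Criterion referencing: a fixed standard, like the driving test; no quota at all.
PASS_MARK = 60
pass_rate = (marks >= PASS_MARK).mean()

print("Norm-referenced boundaries (A/B/C):", np.round(boundaries, 1))
print(f"Criterion-referenced pass rate at {PASS_MARK} marks: {pass_rate:.1%}")
```

Under the quota approach the boundaries follow the cohort; under the fixed standard it is the pass rate that follows the cohort instead.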
In theory criterion referencing is obviously a fairer system, but it has proved very difficult to put fully into practice. Exam papers vary in relative ‘easiness’ or ‘hardness’ from year to year, so pupils cannot be graded just on exam marks. Some other ‘criterion’ is needed, in order to ensure that the same standard is applied to pupils over time.
Unable to find any ideal criterion, the English exam system has gone back to pupil norm referencing, albeit with a ‘twist.’ It is assumed that ‘broadly’ pupil ability and school standards do not change significantly from year to year, and so the DfE and Ofqual insist that ‘broadly’ the same proportion of each grade should be awarded each year. The fact that it is ‘broadly’ the same, and not ‘exactly’ the same, means that the current system is a form of ‘soft’ norm referencing (rather than the ‘pure’ norm referencing which existed before the mid-1980s).
To ensure that exam results are ‘broadly’ similar to previous years, grade boundaries are moved each year by varying the number of marks needed to achieve a specific grade. When a lot of pupils get high marks, grade boundaries go up. In 2017 the introduction of harder GCSE exams led to pupil marks going down, so grade boundaries for a pass were reduced to as low as 15% of the available marks, in order to ensure that the grades awarded were ‘broadly’ in line with previous years.
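A short sketch shows the mechanism at work; the two mark distributions and the target pass rate below are assumptions chosen to mimic an ‘easier’ and a ‘harder’ paper, not actual GCSE figures:

```python
# Illustrative only: assumed mark distributions and an assumed target pass rate.
import numpy as np

rng = np.random.default_rng(1)
marks_by_year = {
    "easier paper": rng.normal(58, 14, size=5000).clip(0, 100),
    "harder paper": rng.normal(35, 14, size=5000).clip(0, 100),
}
TARGET_PASS_RATE = 0.66   # assumed proportion expected to achieve the 'pass' grade

def pass_boundary(marks: np.ndarray, target_pass_rate: float) -> float:
    """Mark needed to pass if the pass rate is held 'broadly' at the target."""
    return float(np.quantile(marks, 1 - target_pass_rate))

for label, marks in marks_by_year.items():
    boundary = pass_boundary(marks, TARGET_PASS_RATE)
    print(f"{label}: boundary = {boundary:.0f} marks, "
          f"pass rate = {(marks >= boundary).mean():.0%}")
```

The harder paper yields a much lower boundary but ‘broadly’ the same pass rate, which is the pattern described above for 2017.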
The fact that (soft) norm referenced examination grades are being used to measure teacher performance gives rise to a potential problem. There are supposed to be no quotas determining that only a proportion of teacher trainees can qualify as teachers each year. There are supposed to be no quotas determining that only ‘broadly’ the same proportion of teachers can be good enough each year to achieve a pay rise. Yet, if exam grades are adjusted so that only ‘broadly’ the same proportion of pupils can achieve ‘good’ grades each year, this introduces a potentially hidden cap on the proportion of teachers who will be able to show that they are good enough for a pay rise.
Even the fact that schools use ‘progress measures’ rather than ‘raw exam data’ does not resolve the problem. Progress data is generated by testing pupils early in their school careers, testing them again later and measuring the difference between the two grades. As we have already seen, ‘grading’ depends upon (soft) norm referencing, which means that (soft) norm referencing is a core part of progress measures too.
Individual pupil progress measures involve a further set of difficulties. They are calculated by ‘averaging’ the total set of pupil results, and each pupil is then said to have made ‘positive’ or ‘negative’ progress, depending on whether their score is above or below the average of all pupils. Because ‘positive progress’ is defined relative to an average, positive progress for some pupils has to be balanced by negative progress for others: broadly half of pupils (and so broadly half of teachers) will always end up on the ‘negative’ side, however well the cohort as a whole is taught.
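A minimal sketch makes the arithmetic visible; the prior-attainment bands and scores below are invented, and the calculation is a deliberate simplification of real measures such as Progress 8:

```python
# A heavily simplified, average-based progress measure; all numbers are invented.
import numpy as np

rng = np.random.default_rng(2)
n = 10_000
prior_band = rng.integers(1, 6, size=n)                  # hypothetical prior-attainment band 1-5
outcome = prior_band * 10 + rng.normal(0, 8, size=n)     # hypothetical later exam score

progress = np.empty(n)
for band in np.unique(prior_band):
    in_band = prior_band == band
    # Progress = how far each pupil's outcome sits above or below their band's average.
    progress[in_band] = outcome[in_band] - outcome[in_band].mean()

print(f"Mean progress score: {progress.mean():+.3f}")                             # zero by construction
print(f"Share of pupils with 'negative' progress: {(progress < 0).mean():.1%}")   # roughly half
```

However well every band performs, the progress scores within each band still average to zero, so roughly half of pupils necessarily land on the ‘negative’ side.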
We know that 92% of teachers have been judged by school leaders as demonstrating performance ‘good’ enough for a pay award. Ofsted have also independently judged that around 89% of schools are ‘good.’ These kinds of figures suggest that at least around 90% of pupils must be receiving a good education. But it cannot be the case that around 90% of pupils are ‘above average,’ with positive progress measures. It suggests (counter-intuitively) that it must be possible to be a good teacher even though pupil progress is below average and ‘negative.’
At first glance it seems an obvious, common-sense principle that a teacher’s performance can be judged by pupil results. However, in England, it is proving to be anything but ‘obvious.’ Worse still, as it becomes clear that Minority Ethnic teachers are far more likely than white teachers to be refused a pay award, there is now an emerging possibility that Performance Related Pay may actually be discriminatory.
Discrimination (if it is occurring) is a totally unacceptable outcome of any pay system. But sadly, perhaps it would not be an altogether surprising outcome for a system which is based upon a fundamentally illogical attempt to judge a criterion referenced profession with (soft) norm referenced data. When logic and reason are put to flight, isn’t it all too often prejudice and bias which step into the gap?