Revolutionising Test Grading with Comparative Judgement

December, 2015

A new model of exam grading, developed by a group of scientists, could bring greater reliability and accuracy to grading.

GCSE and A-level results appeals cause stress and financial losses for candidates, parents, schools and awarding bodies every year, and the numbers are still increasing. As it stands, an essay worth a C grade at GCSE could easily find itself in the B or D bracket, with reported grading accuracy currently standing at an average of 60% for essay or portfolio-type assessments.

A new model of exam grading has been developed by a group of scientists from technology company Digital Assess, with input from senior academics at the University of Cambridge and Goldsmiths, University of London. It has demonstrated grading accuracy improvements from the reported current average of 60% to over 98%.

Adaptive Comparative Judgement (ACJ) is based on the Law of Comparative Judgement, which holds that people are better at making comparative, paired judgements than absolute ones.

How it works

In the current system, expert examiners mark every paper according to a mark scheme (an absolute judgement). Research has shown that asking the same markers instead to compare one paper to another side by side, and declare which is better, produces far more consistent results. Repeated over many rounds, this approach can rank every paper in the country fairly and accurately. 98% accuracy is possible with as few as eight rounds of comparisons, and the use of machine learning is being explored to reduce the number of rounds even further.
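The article does not publish Digital Assess's algorithm, so the sketch below is a hypothetical illustration of the general idea: each round pairs scripts with similar running win counts (a Swiss-tournament-style stand-in for the adaptive pairing step), a judge makes only comparative calls, and the final ranking orders scripts by wins. The function name `acj_rank` and the pairing rule are assumptions for illustration.

```python
def acj_rank(scripts, judge, rounds=8):
    """Return `scripts` ranked best-first after `rounds` of paired judgements.

    `judge(a, b)` returns True when the judge prefers a over b -- a
    comparative call, never an absolute mark against a mark scheme.
    The pairing rule here is an illustrative assumption, not Digital
    Assess's published method.
    """
    wins = {s: 0 for s in scripts}
    standings = list(scripts)
    for _ in range(rounds):
        # Adaptive step (assumed): pair scripts that are currently close
        # in rank, so each comparison is maximally informative.
        standings.sort(key=lambda s: wins[s], reverse=True)
        for a, b in zip(standings[::2], standings[1::2]):
            wins[a if judge(a, b) else b] += 1
    return sorted(scripts, key=lambda s: wins[s], reverse=True)

# Toy demo: each script carries a hidden quality score and the judge
# simply compares qualities. Real judges compare actual work, of course.
quality = {"essay_A": 62, "essay_B": 85, "essay_C": 41, "essay_D": 73,
           "essay_E": 55, "essay_F": 90, "essay_G": 67, "essay_H": 48}
ranking = acj_rank(list(quality), judge=lambda a, b: quality[a] > quality[b])
```

With eight rounds the strongest script ("essay_F") settles at the top and the weakest ("essay_C") at the bottom, while middle placements remain approximate; more rounds, or more comparisons per round, sharpen the middle of the ranking.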

Where it can be used

ACJ has demonstrated huge potential for use by awarding bodies and institutions worldwide. So far, pilots delivering significantly improved reliability have taken place in Australia, Sweden, Singapore and the US. The process means GCSE and A-level appeal figures could be dramatically reduced: after all, if your grade is generated via a collective consensus of expert assessors through ACJ, what is there to appeal against?

There are also many more possibilities for using ACJ across education. One recent project created with ACJ is Classical 100, a free online teaching resource provided by ABRSM containing the top 100 recordings of classical music, plus information on the composers and the stories behind the music. A panel of music experts used ACJ to judge the recordings and order them into a definitive ranking according to their suitability for different classroom scenarios.

Another recent case study saw the University of Edinburgh using ACJ to develop a crowdsourcing-style system of peer review, changing the way formative assessment is delivered. The project used ACJ to empower learners to critique their peers' work against assessment criteria, giving them deeper insight into what makes a good piece of work and capturing feedback across the cohort. This enhanced their learning experience and gave them more opportunities to improve and develop.

Used correctly, the ACJ method could allow every exam paper in the country to be compared against every other for a definitive ranking. This would provide a direct way of comparing performance across different regions, types of school, or any other variable, and of identifying the influence of those factors on exam grades.

In the USA, a similar system exists in which each school is ranked and that ranking is factored into the grading of its pupils: if a school is low-performing and a student still achieves a high grade, that grade counts for more than it would at a high-ranking school. Measures of this kind could be implemented to combat educational inequality.

Ultimately, greater reliability and accuracy in grading could be the answer to many of the education system’s current challenges.

Dan Sandhu, CEO
