The Joint Information Systems Committee (JISC) recently published a study into the role of technology in developing student employability, entitled Technology for Employability – HE Case Studies.
Among the case studies published was University of Edinburgh – Developing 21st Century Career-Ready Graduates. This case study follows the development of various initiatives at Edinburgh College of Art (ECA) before its merger with the University, and the subsequent cross-University developments after the merger.
In 2011 the University introduced the Edinburgh Award to support students in their wider learning while at the University. The approach enables students to control and manage their own development, and to articulate confidently the learning they have acquired and the progress they have made by drawing on their curricular, co-curricular and extra-curricular activities. A further dimension has since been added to the award: students now assess each other and provide peer feedback anonymously online, using an Adaptive Comparative Judgement (ACJ) approach.
A new initiative was introduced in 2015 to further develop career readiness and to introduce self-reflection and assessment-for-learning approaches. It is a credit-bearing online undergraduate “self-defined learning experience” module called SLICCs (Student Led Individually Created Courses), underpinned by the use of e-portfolios to evidence the learning. For the summative assessment submission, students critically select parts of their formative reflections, documentation and digital artefacts, and bring these together as a formal submission in their webfolio, along with a critically self-reflective and evaluative report. Students are also required to formatively self-grade the final submission. The summative assessment is then conducted by their tutors.
ECA designed and developed a bespoke Learning Management System to support three things. These are the design of projects; the management of the feedback, responses and actions generated by students and staff; and the formative and summative assessment process, including the allocation of grades. At the end of each project, students can compare ‘side-by-side’ their graded self-evaluation and the staff assessment, along with the staff feedback, their own reflections on that feedback and the actions they intend to take as a consequence. The Edinburgh Award uses Adaptive Comparative Judgement software by Digital Assess so that students can provide anonymous peer feedback at the first-draft stage and, finally, summatively assess each other for the award itself.
Briana Pegado, final-year student and 2015 Students’ Association President, University of Edinburgh.
Edinburgh College of Art took part in the National Student Survey (NSS) for the first time after the merger. It achieved some of the highest ratings for the assessment and feedback questions, both within the University and across the sector for art and design. Requiring students to formatively self-reflect and self-grade throughout the programme has significantly improved how they articulate what they have achieved, including their acquisition of employability skills. This matters particularly because some of the learning outcomes can only be achieved if students are able to demonstrate and articulate how they have achieved them. The portfolio is fundamental to evidencing the quality and standards achieved in art and design disciplines. Increasingly it is being supplemented, and in some cases replaced, by online versions or e-portfolios.
The full case study can be found here.
Free online resource aims to ignite pupils’ enthusiasm for classical music, explains Dan Sandhu, Chief Executive of Digital Assess.
This month teachers up and down the country were given access to a new free online resource, Classical 100, which aims to break down barriers and ignite pupils’ enthusiasm for classical music.
Complementing existing teaching resources, Classical 100 has been developed by the Associated Board of the Royal Schools of Music (ABRSM) in partnership with Classic FM and Decca Classics. It has the support of the Department for Education, and Digital Assess provides the technology behind the project.
What is it?
Classical 100 is built around 100 recordings of classical music pieces that teachers can draw upon in lessons, school assemblies and other activities. Alongside a recording of each work, taken from Decca’s world-renowned catalogue, there is information about the composer and the story behind the music. The online resource will be continually enhanced throughout the academic year with a range of downloadable materials created and published by ABRSM’s network of primary school experts.
The listening resource is available free of charge via a website for teachers. The pieces have been arranged so that teachers can tailor lessons around the music, for example by using the mood-changing scale in a storytelling lesson.
Classical 100 is not intended to be exhaustive or prescriptive. The pieces are meant to serve as an introduction that will inspire children to explore classical music further.
Why the need?
The project aims to open up classical music to children from an early age. Allowing children to hear and explore the music in an imaginative way will hopefully inspire a new generation to a lifetime of listening, performing, making and exploring the world of music.
The Schools Minister Nick Gibb MP has supported the project as part of the Department for Education’s wider aims to improve music education in schools.
Music shouldn’t only be accessible to those who can afford it or to those whose parents play instruments or listen to it at home. ABRSM highlighted in its 2014 survey Making Music that children from less well-off backgrounds are less likely to play a musical instrument and less likely to have had music lessons.
How can teachers use it?
Classical 100 can also be used to meet the National Curriculum’s Key Stage 1 criteria of ‘listening to, reviewing and evaluating music across a range of historical periods, genres, styles and traditions, including the works of the great composers and musicians’.
Alison Walker, one of the music teachers invited as a judge to select the pieces of music, explained how it can be used in the classroom: “The classification of pieces by levels of energy (represented by a mood changing scale) provides teachers with the tools to select a varied and interesting playlist to suit their particular requirements at any one time.”
Alison was one of the teachers who attended the launch at St Charles RC Primary School, Ladbroke Grove. She went on to say: “Similarly, they can delve deeper into the 100 pieces and choose a piece of programmatic music with storytelling links to a particular topic, select a piece of music which is contemporary to a particular event in history or discover a composer who comes from a country whose traditions and cultures they are studying. The opportunities for further exploration from this starting point of 100 pieces are thrillingly endless.”
Teachers can use the flexible resource to raise the energy levels by selecting Bernstein’s ‘Mambo’ from West Side Story, or encourage a moment of quiet reflection with Beethoven’s ‘Moonlight’ Sonata. If a class were, for example, exploring ‘story-telling’, the teacher could draw together multiple resources around Prokofiev’s Peter and the Wolf.
As another example, if a teacher wanted to exemplify the Romantic period, it would lead them to a list including Tchaikovsky’s 1812 Overture. If they were exploring choral music, they could discover Handel’s ‘Hallelujah’ Chorus from Messiah.
The technology behind the project
Classical 100 was developed by industry experts with a wealth of primary teaching knowledge and professional experience in compiling syllabuses and other education materials. It has been rigorously tested by a broad community of teachers, music services and music education experts. The pieces of music were selected using Adaptive Comparative Judgement (ACJ) technology, in partnership with Digital Assess. ACJ allowed the music to be judged and ranked by its suitability for classroom scenarios, using an iterative and adaptive algorithm.
Alison Walker explained the process, “The filtering process made use of the ACJ method, an approach that we can see huge potential for wider use in schools and universities for grading exam papers with greater accuracy.”
Using Digital Assess’ bespoke software, judges were asked to make decisions about the pieces of music, which allowed the pieces to be ranked. Alison went on to say, “Over the course of three evenings I made a total of 177 judgements in three separate categories, and within each category selecting one piece from a choice of two which I felt most agreed with one of the corresponding statements: ‘which piece has the strongest sense of story?’, ‘which piece is the most energetic?’ or ‘which piece most suggests dance and movement?’. The ‘winning’ piece, together with those chosen by the other judges, formed a hierarchy of pieces in each category, providing the Classical 100 product design team with a range of answers to map the results and form a framework of classification.”
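The sketch below is a deliberately simplified illustration of how many paired choices can add up to a hierarchy. It orders pieces by the share of their comparisons they won. This is an assumption for illustration, not Digital Assess’ method. ACJ fits a statistical model to the judgements and picks pairs adaptively, and this tally does neither. The three decisions in the example are invented.

```python
from collections import Counter


def rank_by_win_rate(judgements):
    """Order items by the share of their paired comparisons they won.

    judgements: iterable of (winner, loser) tuples, one per decision.
    This is a simple tally, not the model-based estimation used in ACJ.
    """
    wins, appearances = Counter(), Counter()
    for winner, loser in judgements:
        wins[winner] += 1
        appearances[winner] += 1
        appearances[loser] += 1
    rate = {item: wins[item] / appearances[item] for item in appearances}
    # Highest win rate first; ties are broken alphabetically for a stable order.
    return sorted(rate, key=lambda item: (-rate[item], item))


# Hypothetical decisions for the prompt "which piece is the most energetic?"
energetic = [
    ("Mambo", "Moonlight Sonata"),
    ("Mambo", "Hallelujah Chorus"),
    ("Hallelujah Chorus", "Moonlight Sonata"),
]
print(rank_by_win_rate(energetic))  # ['Mambo', 'Hallelujah Chorus', 'Moonlight Sonata']
```

A tally like this works when every pair is compared equally often. When pairs are chosen adaptively, a model-based estimate is needed to make scores from different opponents comparable.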
The use of ACJ involved the whole team of judges sharing their judgements on the pieces of music against set criteria. This collaborative process improves the consistency between the judges (inter-rater consistency) and significantly improves the reliability of the assessment process. It is based on the Law of Comparative Judgement, which states that people are better at making comparative judgements than absolute ones. This means that teachers can rely on the final 100 as definitive, because it is the most accurate possible outcome of the combined expertise of the judges.
Schools can gain full, unlimited access to Classical 100 by registering at www.abrsm.org/classical100
Click here for a link to Education Technology’s original article.
Photo credits – Tom Weller
“ABRSM launches ‘Classical 100’, a free online resource bringing classical music to primary schools.”
Click here for a link to Albion Media’s original article.
This summer Digital Assess undertook a study of Comparative Judgement to investigate a number of niggly questions. This was partly in response to several informal studies that have challenged whether Adaptive Comparative Judgement (ACJ) can produce the level of consistency that Digital Assess and its customers have repeatedly achieved over recent years. To this end, we wanted to compare the ranking produced by ACJ with that produced by non-adaptive Comparative Judgement (CJ), and we wanted to do it using real judges, not abstract computer simulations. We also wanted to test the plausibility and accuracy of another suggestion: that random judgements can distort the results of ACJ and, because the ACJ system does not detect them, could undermine its effectiveness. The third and final question was what we really mean when we talk about the internal consistency measure, and how the estimated ranking correlates with the true one in simulated ACJ sessions.
So, with our objectives clear, the Digital Assess team used the summer break to run a trial of non-adaptive CJ, with teachers judging work from secondary Design and Technology students. There were 20 scripts in total, all taken from a previous ACJ session on which we have carried out a number of studies. That session is the subject of the well-known paper Kimbell, R., Wheeler, T., Stables, K., Shepard, T., Martin, F., Davies, D., Pollitt, A., Whitehouse, G. (2009), E-scape portfolio assessment phase 3 report, London: Goldsmiths, University of London. For the study we used 6 judges. Each judge was asked to make every possible pairwise comparison between the scripts, and in doing so completed a full CJ of that data. The ranking produced by combining all the judgements of all the teachers was then used as the overall result.
Turning to the reliability of the ACJ ranking, our overall result was produced from every possible pairwise comparison made by each of our judges. This is why we limited ourselves to a small number of scripts: with 190 possible pairs per judge, these 20 scripts required a total of 1140 judgements to generate the overall ranking.
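The figure of 1140 follows from simple counting. A full CJ asks each judge to compare every unordered pair of scripts once, which gives n(n − 1)/2 pairs. Here is a minimal sketch, using hypothetical script IDs:

```python
from itertools import combinations
from math import comb


def full_cj_pairs(scripts):
    """Every unordered pair of scripts: the comparisons one judge makes in a full CJ."""
    return list(combinations(scripts, 2))


scripts = [f"script_{i:02d}" for i in range(1, 21)]  # 20 hypothetical script IDs
judges = 6

pairs_per_judge = len(full_cj_pairs(scripts))  # 20 * 19 / 2
assert pairs_per_judge == comb(len(scripts), 2)
total_judgements = pairs_per_judge * judges
print(pairs_per_judge, total_judgements)  # 190 1140
```

The count grows with the square of the number of scripts. With 40 scripts, each judge would already need 780 comparisons, or 4680 across six judges.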
Once we had generated the overall ranking from the 6 judges’ exhaustive comparisons, we compared it with the ACJ ranking generated in the original study. We wanted to show that the substantially more efficient ACJ approach produces comparable results with considerably less effort. The Spearman correlation coefficient between the two rankings was 0.91, a more than satisfactory result. We also looked at the correlation between each individual judge’s ranking and the new overall result. These are listed in the table below.
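Spearman’s coefficient is the correlation between two rankings rather than between raw values. The sketch below uses the textbook no-ties formula. The two score lists are invented and are not the study’s data.

```python
def ranks(values):
    """Rank positions, 1 = highest value. Assumes there are no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0] * len(values)
    for position, i in enumerate(order, start=1):
        result[i] = position
    return result


def spearman(x, y):
    """Spearman's rho via 1 - 6 * sum(d^2) / (n * (n^2 - 1)); exact only without ties."""
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Hypothetical parameter estimates for five scripts from two judging methods.
acj_scores = [3.1, 2.4, 1.9, 0.5, -0.7]
full_cj_scores = [2.8, 2.9, 1.0, 0.6, -1.2]
print(spearman(acj_scores, full_cj_scores))
```

With five scripts, swapping only the top two positions gives 1 − 6 × 2 / (5 × 24) = 0.9.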
We see that each judge’s ranking is well correlated with the overall ranking produced by the full CJ session, with the overall result almost completely in line.
Next, we turned to random judgements, cited elsewhere as an example of where ACJ may fall down; we wanted to show that this concern does not hold up in actual CJ sessions. ACJ relies on judges being consistent with themselves, and some of the studies that have questioned ACJ start from the assumption that this may not be the case. We felt it was important to demonstrate that this is a reasonable assumption even for judges who are not experienced markers, so in our study we investigated the self-consistency of each judge. A judge is consistent with himself if, having decided that script A beats script B and that script B beats script C, he tends to decide that script A beats script C. We measured that consistency using the consistency ratio, a well-established metric devised by US Professor Thomas Saaty (http://people.revoledu.com/kardi/tutorial/AHP/Consistency.htm).
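The study measured self-consistency with Saaty’s ratio. A simpler check of the same transitivity property is the circular-triad count from the paired-comparison literature (Kendall and Babington Smith). It is shown here only as an illustrative alternative that was not used in the study. The sketch assumes one judge’s full set of decisions is available as (winner, loser) pairs. The scripts A, B and C are hypothetical.

```python
from itertools import combinations


def circular_triads(decisions):
    """Count intransitive triples (A beats B, B beats C, yet C beats A).

    decisions: set of (winner, loser) tuples from one judge, covering every
    pair of scripts exactly once, as in a full CJ session.
    """
    scripts = sorted({s for pair in decisions for s in pair})
    count = 0
    for a, b, c in combinations(scripts, 3):
        # In a complete triple, the only way to form a cycle is for each
        # script to win exactly one of its two comparisons.
        wins = [
            ((a, b) in decisions) + ((a, c) in decisions),
            ((b, a) in decisions) + ((b, c) in decisions),
            ((c, a) in decisions) + ((c, b) in decisions),
        ]
        if wins == [1, 1, 1]:
            count += 1
    return count


consistent = {("A", "B"), ("B", "C"), ("A", "C")}
inconsistent = {("A", "B"), ("B", "C"), ("C", "A")}
print(circular_triads(consistent), circular_triads(inconsistent))  # 0 1
```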
To compute the consistency ratio, one first builds a comparison matrix, which records the outcomes of all possible pairs of comparisons for some selected scripts. One then computes the largest eigenvalue (a characteristic number) of that matrix and subtracts the number of scripts from it to get a number N. One then repeats this process for comparison matrices built from random judgements, averaging the results, to get a number R. The consistency ratio is the ratio of N to R. More details of the calculation are included in the appendix.
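Write λ_max for the largest eigenvalue and n for the number of scripts (to avoid a clash with the N above), and average λ_max over the random matrices. The ratio described above is then the same as the CI/RI form given in the appendix, because the common factor n − 1 cancels:

```latex
\mathrm{CR}
  = \frac{N}{R}
  = \frac{\lambda_{\max} - n}{\bar{\lambda}_{\max}^{\mathrm{rand}} - n}
  = \frac{(\lambda_{\max} - n)/(n-1)}{(\bar{\lambda}_{\max}^{\mathrm{rand}} - n)/(n-1)}
  = \frac{\mathrm{CI}}{\mathrm{RI}}
```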
The table below lists the values of the consistency ratios for each judge in our trial and for different samples of the scripts.
According to Saaty’s theory, a consistency ratio less than 10% means that the judge is consistent. In our study at least one of the judges had very little experience as a marker.
Furthermore, a judge making random judgements in ACJ can also be detected through misfit statistics. Misfit analysis is a statistical tool for detecting judges or scripts that perform differently from their peers in a single ACJ or CJ session. The misfit metric is an average of information-weighted squared residuals, the residual being the difference between the observed result of a judgement and its predicted probability. A judge is defined as misfitting if their metric lies more than 2 standard deviations above the mean for all judges. Lying below the mean is not misfitting; it is actually very good, as it means a smaller difference between the actual results and the predictions. A loosely misfitting judge is one whose metric lies between 1 and 2 standard deviations above that mean. For instance, in our trial with 6 judges, the less experienced judge did show up in the misfit statistics, as shown below. She was the only one loosely misfitting. Her judgements were not bad; they were just not as good as her peers’.
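Here is a minimal sketch of the misfit idea, under stated assumptions. It takes the script parameters as already estimated. It also assumes a Rasch/Bradley–Terry logistic model for the probability that one script beats another. Each judge’s infit is computed as the sum of squared residuals divided by the sum of the information p(1 − p), which is the standard information-weighted mean square. The labels use the 1 and 2 standard deviation thresholds described above. Digital Assess’ own implementation may differ in detail. The judges, parameters and decisions are hypothetical. The fictitious judge always picks the weaker script, an extreme stand-in for random responding.

```python
import math
from statistics import mean, pstdev


def infit_by_judge(decisions, theta):
    """Information-weighted mean-square residual ("infit") for each judge.

    decisions: list of (judge, winner, loser) tuples.
    theta: dict of script parameters in logits, assumed already estimated.
    Assumes a Rasch / Bradley-Terry model in which
    P(winner beats loser) = 1 / (1 + exp(-(theta[winner] - theta[loser]))).
    """
    squared_residuals, information = {}, {}
    for judge, winner, loser in decisions:
        p = 1 / (1 + math.exp(-(theta[winner] - theta[loser])))
        # The observed outcome is 1 (the winner won), so the residual is 1 - p.
        squared_residuals[judge] = squared_residuals.get(judge, 0.0) + (1 - p) ** 2
        information[judge] = information.get(judge, 0.0) + p * (1 - p)
    return {j: squared_residuals[j] / information[j] for j in squared_residuals}


def classify(infit):
    """Label judges by how many standard deviations they sit above the mean."""
    values = list(infit.values())
    m, sd = mean(values), pstdev(values)
    labels = {}
    for judge, value in infit.items():
        z = (value - m) / sd if sd else 0.0
        # Only values above the mean count against a judge.
        labels[judge] = "misfit" if z > 2 else "loose misfit" if z > 1 else "ok"
    return labels


theta = {"A": 2.0, "B": 0.0, "C": -2.0}  # hypothetical script parameters
decisions = [(f"judge{k}", w, l) for k in range(1, 6)
             for w, l in [("A", "B"), ("B", "C"), ("A", "C")]]
# A fictitious judge whose choices all contradict the model.
decisions += [("fictitious", w, l) for w, l in [("B", "A"), ("C", "B"), ("C", "A")]]
print(classify(infit_by_judge(decisions, theta)))
```

Here there are five identical, well-behaved judges and one outlier. The outlier therefore sits √5 ≈ 2.24 standard deviations above the mean and is flagged as misfitting.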
We also ran a simulation in which we combined the real judgements of the trial with completely random judgements made by an additional fictitious judge. From the misfit plot below, it is very clear that the presence of such a judge can be easily detected.
(We note that the misfit metric is a relative scale, not an absolute one. This is why the values for the real judges decrease when we introduce the additional fictitious judge.)
Thus, even in the unlikely event of a judge making random responses to comparisons, whether on purpose or otherwise, the system would pick this behaviour up very clearly through analysis of the misfit statistics.
Where this is of particular concern, the issue could also be addressed by running a consistency test for each judge before the ACJ session and a misfit analysis during it.
Finally, we turned to the issue of the internal consistency measure (i.e. reliability) not corresponding to the square of the correlation between the obtained parameters and the true generating ones. We performed various simulations on this subject. In these simulations, the true parameter values of the generating scripts were known beforehand and could then be compared with the parameters estimated by ACJ.
First of all, the alpha coefficient should not be described as the reliability of the parameters obtained. It is better described as a measure of the internal consistency of the complete set of judgements: the Judgement Consistency Coefficient. The measure applies to the scaled rank produced by this particular community of judges with this particular set of scripts. It indicates our ability to repeat the exercise and achieve the same scaled rank. By contrast, if we were marking and mixed up the judges and scripts, we would be more likely to get a different set of marks.
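The section does not give the formula behind the Judgement Consistency Coefficient. One common way of computing such a coefficient in Rasch-based CJ is scale separation reliability; this is assumed here rather than confirmed as Digital Assess’ formula. It measures the share of the observed variance in the script parameters that is not explained by their measurement error. Here is a minimal sketch with invented numbers:

```python
from statistics import mean, pvariance


def separation_reliability(estimates, standard_errors):
    """Share of the observed spread in script parameters not attributable to error.

    estimates: estimated script parameters (e.g. in logits).
    standard_errors: the standard error of each estimate.
    Returns (observed variance - mean error variance) / observed variance.
    """
    observed = pvariance(estimates)
    error = mean(se ** 2 for se in standard_errors)
    return (observed - error) / observed


# Hypothetical estimates for five scripts, each with a standard error of 0.5.
print(separation_reliability([-2.0, -1.0, 0.0, 1.0, 2.0], [0.5] * 5))  # 0.875
```

A coefficient of this kind compares the spread of the scripts with the uncertainty of each estimate. Its value therefore depends on the particular set of scripts and judges involved.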
In theory, according to Classical Test Theory, the judgement consistency measure should correspond to the squared Pearson correlation between the true parameters and the estimated ones. We did not observe this relationship, so it could be a subject of further investigation. However, our simulations show high values of Pearson correlation between the true parameters and the estimated ones. They also show high values of Spearman correlation between the true ranking of the scripts and the estimated ranking, as shown in the figure below. There, we plotted the Spearman correlation as a function of the true standard deviation of the generating scripts. This figure was obtained after 13 rounds of judgement (an average of 13 judgements per script).
The Spearman correlation (a correlation between rankings rather than values) is what matters most, because we want to know how similar the estimated ranking of the scripts is to the true one. It is less important that the obtained parameter values closely match the true ones, since these values are rescaled onto a new scale from 0 to 100; the scaled ranking of the students is what matters most. A correlation coefficient is squared to obtain a coefficient of determination, which shows how well the data points fit a linear model, but that is not the objective here. We want to establish a ranking as similar as possible to the true one, without caring too much about the actual parameter values.
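The sketch below illustrates why the ranking, rather than the raw values, matters. It distorts a set of hypothetical true parameters with a monotonic (order-preserving) transformation. The Pearson correlation between the values drops below 1, while the Spearman correlation between the rankings stays at 1. A linear rescaling onto 0 to 100 also leaves the ranking untouched. The numbers and the exponential distortion are assumptions chosen for illustration.

```python
import math
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length lists of values."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """Rank positions, 1 = highest value (no ties assumed)."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    return [order.index(i) + 1 for i in range(len(values))]


def spearman(x, y):
    """Spearman correlation computed as the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def rescale_0_100(values):
    """Linear rescaling onto 0-100, as when parameters are converted to a mark scale."""
    lo, hi = min(values), max(values)
    return [100 * (v - lo) / (hi - lo) for v in values]


true_params = [-1.5, -0.4, 0.3, 1.1, 2.0]         # hypothetical true parameters
estimated = [math.exp(v) for v in true_params]    # distorted values, same order
print(round(pearson(true_params, estimated), 3))  # below 1: the values disagree
print(spearman(true_params, estimated))           # 1.0: the rankings agree
print(ranks(rescale_0_100(estimated)) == ranks(estimated))  # True
```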
Furthermore, we observe a nonlinear stochastic relationship between the validity (the Spearman correlation between the estimated and true rankings) and a number of indicators measured in an ACJ run. In the figures below we have plotted the Spearman correlation coefficient as a function of the Judgement Consistency Coefficient (left figure) and of the observed standard deviation of the parameter values (right figure).
Because both the Judgement Consistency Coefficient and the standard deviation are related non-linearly to the Spearman correlation coefficient, they can be used as stochastic predictors in a predictive analytics exercise. We used these two predictors, along with many others, in a machine learning algorithm and obtained the predicted validity shown in the following figure. Its accuracy is fairly good when the predicted validity is above 0.94, with a mean error of 0.01 and a standard deviation of 0.01. However, when the predicted validity is below 0.94, it is more error-prone, with a mean error of 0.02 and a standard deviation of 0.02.
Summary of findings:
In summary, our study used full CJ with 20 Design and Technology scripts that had previously been assessed in an ACJ session. We asked each of our 6 judges to perform a full CJ session. We then used the decisions from all the judges to produce an overall ranking, which served as our overall result.
The results show that:
The ranking produced by ACJ is strongly correlated with the “Gold standard”.
Any inconsistency in the judging is picked up by real-time analysis of the misfit statistics of the judgements. Where this is of particular concern in a given assessment context, a pre-ACJ self-consistency test for each judge will identify any problematic judges ahead of the assessment; this test can also be applied in marking-based contexts. In this pilot, as expected, our judges were very consistent with themselves.
Furthermore, simulations show that the correlation between the rankings estimated through ACJ and the true rank positions was very high.
Appendix: calculating the consistency ratio
The consistency index was invented by US Professor Thomas Saaty. You calculate it in the following way.
First you build a matrix of all possible pairs of comparisons between your scripts (this matrix is obviously specific to each judge).
For example, for a set of 3 scripts (A, B, C), you could have the following matrix (rows and columns in the order A, B, C):
      A     B     C
A     1     2     1/2
B     1/2   1     2
C     2     1/2   1
where 2 means the script in the row beats the script in the column (e.g. in this example A beats B and B beats C),
0.5 = 1/2 means the script in the row loses to the script in the column (in our case, A loses to C and B loses to A),
and 1 means the script in the row draws with the script in the column (which is the case only for the diagonal elements of the matrix, where a script is paired with itself).
This particular judge is therefore inconsistent: A beats B and B beats C, yet C beats A.
So if the entry [A,B] = 2, then automatically the entry [B,A] = 0.5 = 1/2, and all the diagonal entries ([A,A], [B,B], [C,C]) have to be equal to 1.
Furthermore, your comparison matrix is square: for N scripts, it has dimensions N by N (i.e. N rows by N columns).
Once you’ve built your comparison matrix, you calculate its largest real eigenvalue.
An eigenvalue is a characteristic value of a matrix, which you can obtain easily using a built-in function in most programming languages or numerical libraries.
Let’s call this largest real eigenvalue Emax.
Then your consistency index is
CI = (Emax – N) / (N – 1)
The consistency ratio (CR) is then the ratio
CR = Consistency Index (CI) / Random Index (RI)
The Random index is an average consistency index for matrices that are built with pure random judgements. You have one specific RI value per size N of matrix.
According to Saaty, if the Consistency Ratio is less than 10% (0.1), the judge’s level of inconsistency is acceptable and he is deemed to be consistent with himself. Otherwise he is deemed to be inconsistent.
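The steps above can be sketched in Python with NumPy, assuming the 2 / 1/2 / 1 scale described in this appendix. One assumption deserves flagging. Saaty’s published Random Index values were derived for his 1 to 9 comparison scale. This sketch instead estimates RI by simulation, averaging the CI of matrices filled with coin-flip judgements on the same 2 / 1/2 scale, in line with the description of RI above. For 2 scripts every such matrix is consistent and RI is 0, so the ratio is only meaningful for 3 or more scripts. The function names are illustrative.

```python
import random

import numpy as np


def comparison_matrix(scripts, decisions):
    """Matrix with 2 if the row script beat the column script, 1/2 if it lost, 1 on the diagonal.

    decisions: set of (winner, loser) tuples covering every pair of scripts once.
    """
    index = {s: k for k, s in enumerate(scripts)}
    m = np.ones((len(scripts), len(scripts)))
    for winner, loser in decisions:
        m[index[winner], index[loser]] = 2.0
        m[index[loser], index[winner]] = 0.5
    return m


def consistency_index(m):
    """CI = (Emax - N) / (N - 1), where Emax is the largest real eigenvalue."""
    n = m.shape[0]
    # For a matrix with all-positive entries the largest eigenvalue is real
    # (Perron-Frobenius), so the maximum real part picks it out.
    e_max = np.linalg.eigvals(m).real.max()
    return (e_max - n) / (n - 1)


def random_index(n, trials=2000, seed=0):
    """Average CI of n-by-n matrices filled with coin-flip judgements (n >= 3)."""
    rng = random.Random(seed)
    scripts = list(range(n))
    total = 0.0
    for _ in range(trials):
        decisions = {(a, b) if rng.random() < 0.5 else (b, a)
                     for a in range(n) for b in range(a + 1, n)}
        total += consistency_index(comparison_matrix(scripts, decisions))
    return total / trials


def consistency_ratio(m, ri):
    """CR = CI / RI; Saaty's rule of thumb treats CR < 0.1 as consistent."""
    return consistency_index(m) / ri


scripts = ["A", "B", "C"]
circular = comparison_matrix(scripts, {("A", "B"), ("B", "C"), ("C", "A")})  # the A, B, C example above
transitive = comparison_matrix(scripts, {("A", "B"), ("B", "C"), ("A", "C")})
ri = random_index(3)
print(consistency_index(circular))  # approximately 0.25 (Emax = 3.5)
print(consistency_ratio(transitive, ri), consistency_ratio(circular, ri))
```

For the circular A, B, C example, Emax = 3.5 and CI = (3.5 − 3) / 2 = 0.25. The transitive version of the same three judgements gives a much smaller CI.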
By Dr Ardavan Alamir, Digital Assess Data Scientist
Dr Harjit Sekhon, MA, MBA, PhD is a Reader, having spent more than a decade holding various leadership positions within the University. These roles have ranged from setting up and leading a department to holding a cross-University role.
Before entering academia he held various marketing research and marketing posts in the UK financial services sector, and he continues to take a keen interest in developments in that sector. His particular interests include trust, fairness, professionalism and excellence in financial services. In a wider services context, he has a keen interest in modelling productivity.
While holding various leadership positions he has published numerous articles on services marketing and management. These papers have appeared at various conferences and in marketing journals of national and international repute, such as the Journal of Marketing Management, the Journal of Services Marketing, the Journal of Strategic Marketing and the European Journal of Marketing. Previously he was the Associate Editor of the International Journal of Bank Marketing.