The mainstay of most GCSE exams is the short-text question. For our new GCSE Practice Exams, we have developed a free-text marking engine that is robust enough to deal with real GCSE questions and, more importantly, real students. That system is now (March 2018) in beta, and performing well above expectations.
Short-text automarking has traditionally been regarded as nigh-on impossible by the assessment community. Short-text questions occupy the difficult transition zone between ‘objective’ methods (e.g. multiple choice) that are relatively straightforward to mark, and long-text essay questions that can be marked with reasonable accuracy using syntactic rules and semantic analysis. Whilst several teams have tried to build a production-capable short-text automarker, we appear to be the first to have succeeded.
We came to the project with two advantages other teams did not have. Firstly, ten years' experience of real-world automated assessment. Secondly, a huge community of willing volunteers to test our prototypes and provide feedback. Here, in simplified form, are the problems we faced and the solutions we found to them:
Student spelling ranges from dictionary-perfect to random-character-generator. Experienced teachers become expert at extracting the intended meaning from even the most tortured English.
Yacapaca does the same. We took the most brutally efficient spell-correction engine available and enhanced it with multiple specialist dictionaries covering subject vocabulary, current student-speak, stop words and more.
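To give a flavour of that layering idea, here is a minimal sketch in Python. It is not our production engine, and every word list in it is invented for the example; it just shows how a base dictionary, a subject dictionary and a student-speak dictionary can be merged before correcting each word to its nearest match.

```python
# Illustrative sketch of dictionary-layered spell correction (not Yacapaca's engine).
# Idea: merge a base word list with specialist word lists, then map each unknown
# word to the closest vocabulary word.
from difflib import get_close_matches

BASE_WORDS = {"the", "cell", "energy", "because", "plant"}          # stand-in for a full dictionary
SUBJECT_WORDS = {"photosynthesis", "chlorophyll", "mitochondria"}   # subject vocabulary
STUDENT_SPEAK = {"basically", "kinda"}                              # current student-speak

VOCAB = BASE_WORDS | SUBJECT_WORDS | STUDENT_SPEAK

def correct(word: str) -> str:
    """Return the word unchanged if known, otherwise the closest vocabulary match."""
    w = word.lower()
    if w in VOCAB:
        return w
    matches = get_close_matches(w, VOCAB, n=1, cutoff=0.6)
    return matches[0] if matches else w

if __name__ == "__main__":
    answer = "fotosynthesis happens in the cloriphyll"
    print(" ".join(correct(w) for w in answer.split()))
    # -> "photosynthesis happens in the chlorophyll"
```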
A human moderator, fundamentally, judges whether the student understands the subject of the question. Yacapaca uses artificial intelligence to emulate this process. Only emulate, mind you: artificial intelligence does not really work anything like human intelligence.
There are thousands of possible ways to express any given concept. There are thousands more ways to say something superficially similar that actually indicates a complete lack of understanding. To build a scoring mechanism that can distinguish between the two, we had to devise a unique scripting language.
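We can't reproduce the scripting language itself here, but a toy sketch in Python (with invented rule patterns) conveys the core idea: each rule awards marks when a required concept appears, and withholds them when a superficially similar misconception appears instead.

```python
# Toy illustration (not Yacapaca's actual scripting language) of scoring rules that
# separate genuine understanding from superficially similar misconceptions.
import re
from dataclasses import dataclass

@dataclass
class Rule:
    accept: list    # patterns that indicate understanding
    reject: list    # patterns that look similar but reveal a misconception
    marks: int = 1

def score(answer: str, rules: list) -> int:
    text = answer.lower()
    total = 0
    for rule in rules:
        hit = any(re.search(p, text) for p in rule.accept)
        veto = any(re.search(p, text) for p in rule.reject)
        if hit and not veto:
            total += rule.marks
    return total

# Example question: "Why do plants need light?" (rules invented for illustration)
rules = [
    Rule(accept=[r"photosynthesis", r"make (their own )?food", r"produce glucose"],
         reject=[r"light is (their|the) food", r"eat (the )?light"]),
]

print(score("They use light for photosynthesis to produce glucose", rules))  # 1
print(score("Plants eat the light because light is their food", rules))      # 0
```

The real language is considerably richer than this, but the accept/veto distinction is what lets the marker tell genuine understanding from near-miss phrasing.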
The first lesson from our beta-testers was that working across multiple schools required a much wider vocabulary than working within a single school. Even after a question is published, we follow a strict schedule of manually reviewing and updating the scoring rubrics.
Of course, students need to understand how to improve. After each quiz, we show them the scoring rubrics so they can see what was expected and where they fell short. Because of the large number of questions and the randomised approach, a student will only occasionally meet the same question twice; knowing that it might happen is one of the central motivations to pay attention to the feedback.
Setting up and managing automarked short-text questions requires significant training and resources. For those reasons, they will remain limited to our professionally-produced premium content. While that is not free, it works out a great deal cheaper than paying a suitably-qualified person to do the marking. And if that person happens to be you, it frees you up for more important tasks.