AI Benchmark Leaderboards
What are AI Benchmarks?
- Like an exam for AI systems.
- Designed to assess a specific skill in a standardised way, resulting in a score that allows for comparison between systems.
- Consist of a problem specification, a dataset, and a defined score. Correct answers are often referred to as the ground truth.
- AI benchmarks test the quality of the AI output of EdTech products, and are one part of a wider Quality Assurance Framework.
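To make the pieces above concrete, here is a minimal sketch of how a dataset, ground truth and a defined score fit together. It is an illustration only, not how any of our benchmarks are actually implemented: the names `Item`, `accuracy` and `toy_system` are hypothetical, the questions are invented, and exact-match accuracy is just one possible scoring rule.

```python
from dataclasses import dataclass


@dataclass
class Item:
    question: str      # one problem from the dataset
    ground_truth: str  # the accepted correct answer


def accuracy(items, answer_fn):
    """Return the fraction of items where the answer matches the ground truth."""
    if not items:
        raise ValueError("dataset is empty")
    correct = sum(
        answer_fn(item.question).strip().lower() == item.ground_truth.strip().lower()
        for item in items
    )
    return correct / len(items)


# Toy dataset, invented for illustration (not taken from a real benchmark).
dataset = [
    Item("What is 2 + 3?", "5"),
    Item("What is 4 x 2?", "8"),
    Item("What is 9 - 6?", "3"),
    Item("What is 10 / 2?", "5"),
]


def toy_system(question):
    # Hypothetical stand-in for an AI model: canned answers, one deliberately wrong.
    canned = {
        "What is 2 + 3?": "5",
        "What is 4 x 2?": "8",
        "What is 9 - 6?": "4",  # wrong on purpose
        "What is 10 / 2?": "5",
    }
    return canned.get(question, "")


score = accuracy(dataset, toy_system)
print(f"accuracy = {score:.2f}")
```

Because every system is scored on the same items with the same rule, the resulting numbers can be compared directly. Open-ended tasks would need a richer scoring rule than exact match.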
Why are AI benchmarks useful?
- AI benchmarks give AI model developers and EdTech product developers a target to measure against, helping them understand weaknesses and focus improvements.
- Users and policymakers can see performance scores, which helps them choose which AI systems to use and boosts confidence in the outputs they receive.
What are the main challenges in developing AI benchmarks in education?
- Sourcing resources for the dataset, such as existing human exam questions, learning resources or student working, particularly from low- and middle-income country (LMIC) contexts.
- Defining the scoring (i.e. what does 'good' look like?) when facing open-ended, subjective aspects of education.
What AI benchmarks have we developed so far?
- The Pedagogy Benchmark - AI models do well at student exams, but do they know about pedagogy and helping students learn? We made The Pedagogy Benchmark to see if models can pass teacher exams.
- The SEND Pedagogy Benchmark - An extension using a set of questions related to Special Educational Needs and Disabilities (SEND) specific pedagogy.
- The Visual Maths Benchmark - AI models can answer complex mathematics tests, but how well do they perform with visual maths, key for learning in early grades? Here we test exactly that.
We need your help!
We use these benchmarks to make the case for kids in LMICs: we want AI model developers to know where they can improve their models for LMIC contexts. The best way to do this is to use real-world examples. Do you know of any relevant information sources that can help? For example, student work, early grade maths textbooks or compilations of common misconceptions from LMICs. If so, please get in touch with alasdair.mackintosh@fabinc.co.uk