‘Humanity’s Last Exam’ benchmark is stumping top AI models – can you do any better? [Video]

Are artificial intelligence (AI) models really surpassing human ability? Or are current tests just too easy for them? 

On Thursday, Scale AI and the Center for AI Safety (CAIS) released Humanity’s Last Exam (HLE), a new academic benchmark aiming to “test the limits of AI knowledge at the frontiers of human expertise,” Scale AI said in a release. The test consists of 3,000 text and multi-modal questions across more than 100 subjects, including math, science, and the humanities, submitted by experts in a variety of fields.

Anthropic’s Michael Gerstenhaber, head of API technologies, told Bloomberg last fall that AI models frequently outpace benchmarks, which is part of why the Chatbot Arena leaderboard changes so rapidly when new models are released. For example, many LLMs now score over 90% on Massive Multitask Language Understanding (MMLU), a commonly used benchmark. This is known as
