Scientists create the hardest AI test ever — and the results are surprising.
Humanity's Last Exam, consisting of 2,500 questions, shows that AI still has a large gap to bridge compared to human intelligence.
Performance benchmarks aren't useless. The problem is that they're serving the wrong audience, acting more like marketing than clearly explaining what's new, what works, and how it will
Gemini 3 has officially launched with stronger inference capabilities, leading many AI benchmarks and surpassing GPT-5.1 in key assessments. Deep Think mode is also greatly improved