AI Index Report 2024: Is AI Getting Better Than Humans? Do We Need New Tests?
April 17, 2024
Stanford University has published its annual AI Index Report, which tracks the state of AI each year. This year's headline finding is that AI capabilities are improving faster than ever, and that the old ways of measuring AI against humans are no longer working: models are getting so much better that many of the tests used to compare them no longer make sense.
Until recently, AI was measured with tests that compare its performance to a human baseline. Now, top models are beating humans on many of those tests. Take MMLU, a benchmark of multiple-choice questions spanning subjects such as math and history: back in 2019, the best AI models scored around 30%, while human experts scored around 90%. Google's new Gemini Ultra model has now surpassed that human baseline.
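To make the comparison concrete, here is a minimal sketch of how a multiple-choice benchmark like MMLU is scored: a model picks one option per question, and the score is simply the fraction of questions answered correctly. The question IDs and answers below are made up for illustration, not real MMLU items.

```python
def score_multiple_choice(predictions, answer_key):
    """Return accuracy: the fraction of questions where the model's
    chosen option letter matches the answer key."""
    correct = sum(1 for qid, choice in predictions.items()
                  if answer_key.get(qid) == choice)
    return correct / len(answer_key)

# Hypothetical model outputs: question id -> chosen option (A-D).
predictions = {"q1": "B", "q2": "D", "q3": "A", "q4": "C"}
answer_key  = {"q1": "B", "q2": "C", "q3": "A", "q4": "C"}

print(score_multiple_choice(predictions, answer_key))  # 0.75
```

A 2019-era model answering 30% of such questions correctly would score 0.30 on this metric, while the human-expert baseline sits near 0.90; the report's point is that top models have now crossed that line.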
The report shows that leading AI models now routinely outperform humans on these benchmarks. To keep measuring AI's abilities meaningfully, researchers need new, tougher tests.
One candidate is GPQA, a new benchmark of roughly 400 graduate-level questions so difficult that even experts with PhDs struggle to answer them. That such a test is needed at all shows how much AI has improved.
The report also covers how AI safety is measured. Right now it is hard to say whether one model is safer than another because there are no good, standardized tests for safety. On top of that, AI developers often don't disclose how they trained their models, which makes their systems harder to evaluate and to trust.
Instead of relying only on fixed tests, some evaluations now ask humans directly what they think of AI. For example, people might compare two chatbot responses and vote for the one they prefer, and those votes are aggregated into a ranking of models. This could become a better way to measure AI in the future.
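One common way to turn pairwise human votes into a ranking is an Elo-style rating system, similar in spirit to public chatbot leaderboards. The sketch below assumes made-up model names and votes; it is an illustration of the rating update, not any report's official methodology.

```python
def update_elo(rating_a, rating_b, a_won, k=32):
    """Update two Elo ratings after one head-to-head comparison.
    The winner gains more points when it was expected to lose."""
    expected_a = 1 / (1 + 10 ** ((rating_b - rating_a) / 400))
    score_a = 1.0 if a_won else 0.0
    rating_a += k * (score_a - expected_a)
    rating_b += k * (expected_a - score_a)  # symmetric update
    return rating_a, rating_b

# Both hypothetical models start at the same rating.
ratings = {"model_x": 1000.0, "model_y": 1000.0}

# Each vote: (winner, loser), from a human comparing two chatbot replies.
votes = [("model_x", "model_y")] * 3 + [("model_y", "model_x")]

for winner, loser in votes:
    ratings[winner], ratings[loser] = update_elo(
        ratings[winner], ratings[loser], a_won=True)

# model_x won 3 of 4 comparisons, so it ends up ranked first.
print(sorted(ratings, key=ratings.get, reverse=True))
```

The appeal of this approach is that it measures what people actually value in a conversation, rather than performance on a fixed question set that models may eventually saturate.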
Overall, the report suggests that AI is improving so quickly that it may soon outperform us across standard tests, and that choosing which AI to use may come down to how we feel about it, rather than test scores alone.