DeepMind SAFE AI: Making ChatGPT More Reliable with Smart Fact-Checks

Can you trust every word an AI produces? Researchers from DeepMind and Stanford University say you soon might, thanks to their latest tool, SAFE. Announced on April 2, 2024, SAFE checks the facts in what large language models (LLMs) like ChatGPT say, and makes it possible to compare how factual different AI models are. It works by breaking an AI's answer into individual facts and googling each one to see whether it holds up. This could go a long way toward catching the mistakes AI makes in long answers.

Why does this matter? Even the best AI models get facts wrong, and the problem grows with length. Ask ChatGPT for a detailed explanation of a topic, and the longer it talks, the more likely it is to say something that is not true.

Until now it was hard to know which AI models were the most accurate on long answers, because there was no good way to measure it. To fix that, DeepMind used GPT-4 to build LongFact, a set of questions spanning 38 topics, designed to make a model produce long, fact-dense answers.

Next, they built a tool on top of GPT-3.5-turbo that checks whether those long answers are true by looking them up on Google. They named it SAFE, which stands for Search-Augmented Factuality Evaluator.

SAFE works by breaking a long answer into individual facts, then looking each one up on Google to see whether it is supported.
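As a rough illustration of that loop, here is a toy sketch. It is not DeepMind's implementation: the real system prompts an LLM for each step and issues actual Google search queries, while here a naive sentence split and a tiny in-memory "search index" stand in for both.

```python
# Toy sketch of the SAFE idea: split an answer into facts, check each
# one against an external source, and count supported vs. unsupported.
# KNOWN_FACTS is a hypothetical stand-in for Google search results.

KNOWN_FACTS = {
    "Paris is the capital of France.",
    "Water boils at 100 degrees Celsius at sea level.",
}

def split_into_facts(response: str) -> list[str]:
    # Real SAFE uses an LLM to split the answer into atomic claims and
    # rewrite them to be self-contained; a sentence split suffices here.
    return [s.strip() + "." for s in response.split(".") if s.strip()]

def is_supported(fact: str) -> bool:
    # Real SAFE sends search queries and has an LLM judge the evidence.
    return fact in KNOWN_FACTS

def safe_rate(response: str) -> dict:
    facts = split_into_facts(response)
    supported = sum(is_supported(f) for f in facts)
    return {"supported": supported, "not_supported": len(facts) - supported}

print(safe_rate("Paris is the capital of France. The Moon is made of cheese."))
# {'supported': 1, 'not_supported': 1}
```

The key design point is that each claim is verified independently against an external source, so one wrong fact in a long answer does not sink the rest.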

Its creators say SAFE is very good at fact-checking, arguably better than people: it agrees with human annotators 72% of the time, and in the cases where they disagree, SAFE turns out to be right 76% of the time. It is also far cheaper than having people check the facts.

To score a model's answers, they counted the true facts in each response using a metric called F1@K. Roughly, F1@K balances precision (what fraction of the facts in the answer are true) against recall (whether the answer supplies as many true facts, up to some number K, as the user wants).
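A minimal sketch of that calculation, assuming the standard F1 combination of the two quantities described above (with S supported facts, N unsupported facts, and K desired facts):

```python
def f1_at_k(supported: int, not_supported: int, k: int) -> float:
    """Combine factual precision with recall capped at K desired facts."""
    if supported == 0:
        return 0.0
    # Precision: fraction of the emitted facts that are supported.
    precision = supported / (supported + not_supported)
    # Recall: how close the answer gets to the K true facts the user wants.
    recall = min(supported / k, 1.0)
    # Harmonic mean of the two, as in ordinary F1.
    return 2 * precision * recall / (precision + recall)

# An answer with 45 true and 5 false facts, against a target of K=64:
print(f1_at_k(supported=45, not_supported=5, k=64))  # ≈ 0.789
```

So a model is rewarded both for keeping false claims out and for saying enough true things, rather than scoring well by giving a short, safe answer.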

They then tested 13 different AI models to see which gave the most factual long answers. GPT-4-Turbo came out on top, and larger models were generally more accurate.

DeepMind SAFE AI is a cheap and fast way to measure how factual an AI's long answers are. It outperforms human raters at this task, but it still depends on the accuracy of Google's search results.

DeepMind has made SAFE available for everyone to use. The team believes it can make models like ChatGPT more accurate by helping them learn from better feedback, and it might even let an AI check its own facts before giving an answer.

OpenAI will be pleased to know that DeepMind's research found GPT-4 more accurate than DeepMind's own Gemini model in this test.