AI Transcription Hallucinations
A recent study has revealed that AI transcription tools can generate harmful text when they hallucinate. OpenAI’s Whisper API was found to hallucinate in about 1.4% of transcriptions, with 38% of those hallucinations containing harmful content. These errors were even more frequent when transcribing speech from individuals with aphasia.
Although AI transcription tools have revolutionized record keeping for doctors and minute-taking for meetings, they aren’t foolproof. A new study shows that when advanced transcription tools like OpenAI’s Whisper make mistakes, they don’t just produce jumbled text—they invent entire phrases, which can be troubling.
We know that all AI models hallucinate. For example, if ChatGPT doesn’t know the answer to a question, it might fabricate a response rather than admitting uncertainty. Researchers from Cornell, the University of Washington, NYU, and the University of Virginia found that although Whisper is more accurate than other tools, it still hallucinates about 1% of the time.
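To see what those percentages mean at scale, here is a rough back-of-the-envelope estimate. The rates come from the study; the daily transcription volume is a made-up illustrative figure, not a number from the paper.

```python
# Rough estimate of harmful hallucinations at scale, using the study's
# reported rates (1.4% hallucination rate, 38% of those harmful).
# The per-day volume is a hypothetical workload for illustration only.

HALLUCINATION_RATE = 0.014   # fraction of transcriptions containing hallucinated text
HARMFUL_SHARE = 0.38         # fraction of hallucinations with explicit harms

transcriptions_per_day = 100_000  # assumed workload

hallucinations = transcriptions_per_day * HALLUCINATION_RATE
harmful = hallucinations * HARMFUL_SHARE

print(f"Expected hallucinated transcriptions per day: {hallucinations:.0f}")
print(f"Expected harmful hallucinations per day: {harmful:.0f}")
# -> roughly 1,400 hallucinated transcriptions, about 532 of them harmful
```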
A critical finding of the study is that “38% of hallucinations include explicit harms such as perpetuating violence, creating inaccurate associations, or implying false authority.”
Whisper appears uncomfortable with pauses in speech, often hallucinating to fill these gaps. This becomes a significant issue when transcribing speech from people with aphasia, a condition that makes it challenging for individuals to find the right words.
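The study does not prescribe a fix, but one pragmatic guard is to treat silence-heavy, low-confidence segments as suspect rather than trusting them blindly. The sketch below assumes the open-source whisper Python package (not the hosted API), whose per-segment no_speech_prob and avg_logprob scores can help flag likely gap-filling; the thresholds and the file name interview.wav are illustrative assumptions.

```python
import whisper

# Load a small open-source Whisper model (the model size is an arbitrary choice here).
model = whisper.load_model("base")
result = model.transcribe("interview.wav")

# Illustrative thresholds -- tune them for your own audio and risk tolerance.
NO_SPEECH_THRESHOLD = 0.6   # segment is probably silence or background noise
LOGPROB_THRESHOLD = -1.0    # segment was decoded with low confidence

for seg in result["segments"]:
    # Flag segments that look like silence yet were transcribed with low confidence:
    # these are the gaps Whisper is most tempted to "fill" with invented text.
    suspect = (
        seg["no_speech_prob"] > NO_SPEECH_THRESHOLD
        and seg["avg_logprob"] < LOGPROB_THRESHOLD
    )
    flag = "REVIEW" if suspect else "ok"
    print(f"[{flag}] {seg['start']:.1f}-{seg['end']:.1f}s: {seg['text'].strip()}")
```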
‘Careless Whisper’: The Dark Side of AI Transcription Hallucination Errors
The research team experimented with early 2023 versions of Whisper. Although OpenAI has improved the tool since then, Whisper’s tendency to invent troubling text remains noteworthy.
+--------------------------+
|         AI Brain         |
|      (Whisper API)       |
+--------------------------+
             |
             v
+--------------------------+
|      Transcription       |
|    (mostly accurate)     |
+--------------------------+
             |
             v
+--------------------------+
|      Hallucinations      |
|    (1.4% of the time)    |
|  (38% harmful content)   |
+--------------------------+
             |
             v
+--------------------------+
|      Troubling Text      |
| (perpetuating violence,  |
| inaccurate associations, |
| false authority)         |
+--------------------------+
Researchers categorized these harmful hallucinations as follows:
- Perpetuation of Violence: Imaginary scenarios involving violence, sexual innuendo, or demographic stereotyping.
- Inaccurate Associations: Fabricated information like incorrect names, false relationships, or erroneous health conditions.
- False Authority: Impersonating authoritative figures or media, often including directives that could lead to phishing attacks or other deception.
The study provided examples where Whisper added alarming terms like “blood-soaked stroller” to a description of a fireman rescuing a cat or inserted “terror knife” into a sentence about someone opening an umbrella.
OpenAI has since reduced the frequency of problematic hallucinations in newer versions of Whisper but hasn’t explained why the tool behaved this way initially.
Even a small number of hallucinations can have severe consequences. For instance, if Whisper is used to transcribe job applicants’ video interviews and invents terms like “terror knife,” “blood-soaked stroller,” or “fondled,” it could unfairly affect an applicant’s chances.
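One simple safeguard in such a pipeline is to route transcripts containing alarming terms to a human reviewer before they influence any decision. The sketch below is a minimal, hypothetical guard; the watchlist, the helper function, and the sample transcript are illustrative and not part of any Whisper tooling.

```python
# Minimal human-in-the-loop guard: never let a raw machine transcript feed an
# automated decision. The watchlist and helper below are hypothetical examples;
# the terms are drawn from the hallucinations reported in the study.

WATCHLIST = {"terror", "blood-soaked", "fondled"}

def needs_human_review(transcript: str) -> bool:
    """Return True if the transcript contains terms that warrant manual checking."""
    text = transcript.lower()
    return any(term in text for term in WATCHLIST)

transcript = "He opened the terror knife and stepped out into the rain."
if needs_human_review(transcript):
    print("Route this transcript to a human reviewer before it affects the applicant.")
else:
    print("No watchlist hits; still spot-check a sample of transcripts.")
```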
The researchers urged OpenAI to make users aware of Whisper’s hallucinations and to investigate why these errors occur. They also recommended designing future versions of Whisper to better serve underserved communities, such as people with aphasia and other speech impairments.
To stay updated on the latest developments in AI, visit aibusinessbrains.com.