An emerging artificial intelligence tool used by family doctors to take notes during an examination is at risk of providing physicians with inaccurate information and hallucinations, Ontario’s auditor general has found, raising questions about the ongoing use of the software in the health-care system.
In 2023, Ontario Health began introducing AI Scribe programs to the broader health-care sector, allowing physicians, family doctors, nurse practitioners and therapists to adopt the technology.
Once a patient has authorized its use, AI Scribe listens to the medical examination and compiles a “SOAP” note that combines subjective and objective information provided by the patient and physician. The note is organized into four sections that make up the acronym SOAP: subjective, objective, assessment and (treatment) plan.
While the tool was intended to relieve the constant note-taking pressure on health-care workers, the auditor general found the systems could be inaccurate and unreliable.
In a special report released Tuesday, Ontario’s auditor general Shelley Spence found that AI Scribe systems were “not evaluated adequately” and sometimes “fabricated information” and offered treatment plans never discussed by the doctor.
“Inaccuracies in medical notes generated by AI Scribe systems could potentially result in inadequate or harmful treatment plans that may potentially impact patient health outcomes,” the auditor’s report said.
“It is important that the AI Scribe systems are tested to provide assurances as to the quality of their generated notes and to minimize inaccuracies.”
Minister of Public and Business Service Delivery and Procurement Stephen Crawford insisted the hallucinations had taken place in testing and training — not during medical appointments.
“That’s essentially when we’re undergoing the training mode to see whether we’re going to use the scribe or not,” he told reporters. “Let’s be very clear about that, that’s not actually in operational use with doctors, that’s in the optional stage where we’re reviewing the various scribes.”
He said the issues were “in the testing mode and modifications were then done” to the artificial intelligence system.
Despite Crawford’s assurance that the system in real-world use doesn’t fabricate information, Spence said she recently felt compelled to tell her doctor to double-check the notes produced by an AI scribe.
“I actually went to my doctor the other day, because you can hear that my voice isn’t exactly what it normally is, and they were using a scribe,” she said. “I kind of mentioned, ‘Please look at the transcript when you’re done with my own visit.’ But it is in use.”
In a statement, the auditor general’s office said it had not seen any evidence the government had tested the systems after purchasing them.
“All AI scribe systems from the 20 approved vendors showed one or more inaccuracies at the procurement testing phase—hallucinations (fabrication), incorrect information, or missing or incomplete information—and there was no evidence of additional testing or evaluation to reduce the risk of fabrication at the time of the audit,” the statement read.
“This was Supply Ontario’s own testing, not our own.”
As part of its review, the auditor’s office looked at the bidding process used by Supply Ontario to procure AI Scribe systems.
Potential vendors were given two simulated doctor-patient interactions to transcribe. The results were given to Ontario Health and its digital arm, Ontario MD, to assess.
The review found:
- 45 per cent of AI Scribe systems fabricated information and added suggestions to patients’ treatment plans, such as referring the patient for therapy or ordering blood tests, even though these steps were not mentioned in the simulated recordings
- 60 per cent of the AI Scribe systems captured a different drug than the one prescribed by the doctor
- 85 per cent of AI Scribe systems missed key details about the patients’ mental health issues in at least one of the two tests, even though these details were mentioned in the simulated recordings
Spence said the tool was problematic but pointed out that the systems are continually improving.
“I believe it is problematic, but AI is a tool that will improve efficiencies and delivering services,” Spence said. “It is going to take some baby steps to get there, to get it to be perfectly great.”
She wouldn’t say whether she thought the government was moving too fast, but said she wanted to see more safeguards in place.
The auditor’s report found Supply Ontario did not require AI Scribe vendors to demonstrate their systems live or operate them in front of the evaluators and did not conduct a comprehensive evaluation of whether the systems included a risk mitigation strategy.
The auditor also revealed that at least five of the 20 vendors did not submit risk and privacy impact assessment reports, as required by the bidding process, but were approved regardless.
Supply Ontario agreed to implement the majority of the auditor’s recommendations, including requiring bias testing to be conducted and submitted before an AI contract is awarded and considering live demonstrations of AI products before adoption.
The auditor general said artificial intelligence had helped with editing and “supporting” her reports — but that they were “by no means written by AI.”