Another Nail in Speech Recognition’s Coffin

I have not been a fan of Speech Recognition (it is NOT VOICE RECOGNITION!!!) for a number of reasons. First, it takes my eyes off the image, where they are paid to be, and forces me to look at the developing report. Second, it turns me into an unpaid editor, doing the transcriptionist's job for free. And finally, it doesn't work very well at all.

That last point is often disputed, but a recent article in the American Journal of Roentgenology reiterates the fact. And it is a fact. Basma et al. took the following approach:

We scrutinized 615 reports for errors: 308 reports generated with ASR (data from the hospital at which ASR had been used for 2 years) and 307 reports generated with conventional dictation transcription (data from the hospital that continued to rely on transcriptionists for report generation). A total of 33 speakers made the 615 reports; 11 speakers used both ASR and conventional dictation transcription. . .

The voice recognition software used was Speech Magic (version 6.1, service pack 2, Nuance). ASR reports were verified and signed by the author as they were generated. If the speaker was a fellow or resident, a staff member was responsible for reviewing the case before dictation of the report. Dictation was completed with a handheld speech microphone (ProPlus LFH5276, Philips Healthcare).

Conventional dictation transcription was undertaken using the E-RIS transcription system, version 1.44 (Merge Technology). Transcription was completed by transcriptionists experienced in breast imaging reporting. Once transcribed, reports were sent to the original speaker for electronic amendment and verification.

All reports dictated by attending radiologists or trainees were reviewed on the radiology information system at an electronic PACS workstation, corrected for errors, and verified, making these reports immediately available on the hospital clinical information system. The speaker assumed complete responsibility for report production, including correcting typographic errors generated by the voice recognition software or the transcriptionist.

And the result?

Among the 308 reports generated with ASR, 159 reports (52%) contained at least one error compared with 68 of the 307 reports (22%) generated with conventional dictation transcription (p < 0.01). Reports generated with ASR were also more likely than conventional reports to contain at least one major error (23% vs 4%, p < 0.01).

A total of 230 errors were found in 159 ASR reports. The most common error types were added word (46 instances, 20% of total ASR errors), word omission (43 instances, 19%), word substitution (39 instances, 17%), and punctuation error (49 instances, 21%). A total of 77 errors were found in 68 conventional dictation transcription reports. The most common error types were word substitution (15 instances, 19% of total conventional report errors), word omission (13 instances, 17%), added word (11 instances, 14%), and punctuation error (14 instances, 18%). . .

Our data showed that breast imaging reports generated with ASR are 8 times as likely as reports generated with conventional dictation transcription to contain major errors, after adjustment for native language, academic rank of the speaker, and breast imaging modality. Twenty-three percent of the reports generated with ASR reviewed in this study contained at least one error that could have affected understanding of the report or altered patient care.

They conclude:

Complex breast imaging reports generated with ASR were associated with higher error rates than reports generated with conventional dictation transcription. The native language and the academic rank of the speaker did not have a strong influence on error rate. Conversely, the imaging modality used, such as MRI, was found to be a predictor of major errors in final reports. Careful editing of reports generated with ASR is crucial to minimizing error rates in breast imaging reports.

This comes as no big surprise. You might wonder why so many ASR mistakes get through. Jay Vance, CMT, author of the AHDI Lounge blog (Association for Healthcare Documentation Integrity, not ADHD…), comments about this on HIStalk:

“…why the radiologist didn’t catch the mistakes on the screen when using speech recognition…”

As someone intimately familiar with speech recognition editing, I can tell you the eye tends to “see” what the brain tells you SHOULD be there rather than what actually IS there. This is a well-known phenomenon among SR editors. Add to that the fact that the physicians dictating these reports using front-end SR are in a hurry to just get it over with, and it’s no surprise to see such a high error rate.

“Also keep in mind that this compared only two transcription options, with the third being back-end speech recognition…which I believe has much higher accuracy…”

You make a valid point, but the issue isn’t the comparative accuracy of front-end versus back-end SR. The comparison is between reports reviewed by a second pair of eyes versus those which are not. Whether a report is transcribed “from scratch” or edited from a SR draft, in both cases there is a skilled healthcare documentation specialist reviewing the original dictation. With front-end SR, however, it’s once-and-done, which of course is the holy grail of clinical documentation. The problem, as this study clearly shows, is that once-and-done dramatically increases the risk of medical error. Unfortunately, that risk doesn’t seem to get factored into the ROI when the front-end SR vendor is making the sales pitch.

We in the medical transcription field are doing our best to highlight the crucial risk management/clinical documentation improvement role our practitioners perform as a matter of course, a role that up to this point seems to have been taken for granted. Studies like this help prove what we’ve been saying all along: removing skilled healthcare documentation professionals from the process puts patients at risk, not to mention increasing liability for healthcare providers and jeopardizing reimbursements due to improper documentation. That’s a message we’re determined to deliver to the rest of the healthcare community as well as the public at large.

Emphasis mine. But that says it all. ASR has at least the potential to put patients at risk and increase liability. ASR does improve turn-around time, but really only at sites that lack adequate transcription personnel. And you can see the price one might pay to save the cost of a few FTEs.

It will not be darkening my door for the foreseeable future.


4 responses to “Another Nail in Speech Recognition’s Coffin”

  1. Yeah, at our hospital we're just getting around to implementing back-end SR…with the stated goal of front-end SR. As a PACS monkey, I am approaching the project with a healthy dose of what we in Maine would call "wicked skepticism".

  2. Hi, we had speech recognition fail completely at our organization. Even though we have two different speech recognition systems, one integrated and the other stand-alone, there are still not many takers. Online speech recognition, as you said, takes the radiologist's eyes off the image and onto the screen (and worse, the handheld device!). We arranged for radiologists to have their own templates in the area where they type reports, and they are pretty happy with that. The transcriptionists are making merry doing routine secretarial work!

  3. I have used speech recognition for 2 years. The reasons I switched are detailed exactly in this article – mistakes were being made. Rather, the mistakes were made by humans; in fact a letter was sent (by a human transcriptionist) reading "dilated lips of small valve". With the correct software (i.e., Nuance) and appropriate set-up (and this took some self-education that can be skipped in the future), it is fantastic. I spend more time with patients, and my staff does as well – less time pulling charts and proofing/printing transcriptions. That is wasted time, energy, and resources. Transcription is also expensive; we cannot afford the current way we do business. Change is hard and easy to criticize, and medicine needs to wake up. If we don't, telemedicine and technology such as voice recognition will show us to be too slow to react and further alienate patients and policy makers. As a surgeon who watched my profession get rocked by laparoscopy/video and by complaints and criticism, please let's learn from the mistakes of others and embrace change. Laparoscopy is the standard of care – would you want your first option to be an open cholecystectomy? It is different, but not bad. It takes time and without a doubt adds value. I personally beg others to look at it and embrace change.

  4. I think what is said mirrors my experience: self-editing simply doesn't work. Those forcing it on radiologists don't care about 1) radiologist time or 2) accuracy. They do care about their own bottom line to the exclusion of all else. We use it, but we use it as a dictation system, with transcriptionists reviewing everything. In that mode, I find it saves a lot of time, because I have several hundred predefined reports and pieces of reports that I can put in (without taking my eyes off the screen). A real positive if some of your referrings are of the school of "I want a lot of words for my money".
