Speech Recognition Sucks and It Costs More, Too!

Three articles in the December issue of the Journal of Digital Imaging discuss speech recognition, and only one is complimentary. Let’s look at that one first.

Mika P. Koivikko, Tomi Kauppinen, and Juhani Ahovuo, all from the HUS Helsinki Medical Imaging Center of the Helsinki University Central Hospital, measured the effect of SR on report turnaround time (RTT). They found an 81% reduction in RTT, and found that “SR was easily adopted and well accepted by radiologists… with excellent end-user satisfaction.” The Helsinki team compared their old cassette-based transcription to SR using Philips SpeechMagic. They had some additional hurdles:

The Finnish language is challenging for SR because its vocabulary is exceptionally wide allowing many different words to evolve from one word body. HUS Helsinki Medical Imaging Center has actively participated in the development of a Finnish SR context for radiology.

They did improve RTT, as noted, although not uniformly:

During this study, the proportion of reports available within 1 h has rapidly risen from 26% (cassette Q1/2005) to 58% (SR Q1/2006). For nonurgent studies, such as most of our MR imaging procedures, the mean RTT still remained high (Table 2). In contrast, for typical high-priority worklists requiring online reporting (i.e., ICU or orthopedics), we measured an exceptional 53% reduction in RTTs and an increase from 34% to 65% in first-hour reporting. Thus, for our hospital, the increased number of reports available within 1 h from the completion of a study has proven a great improvement.

One has to wonder how much of the improvement came from eliminating the cassettes and going to a digital tank. The authors did appreciate the online editing inherent in SR.

The next article was not quite so kind to SR. Kimberly Voll, Ph.D., Stella Atkins, Ph.D., and Bruce Forster, M.D., from British Columbia, offer more realistic observations. They see the potential, but recognize the current inadequacies of SR:

The recent improvements of speech recognition (SR) technology have motivated the introduction of automated transcription software in lieu of human transcription. Speech recognition can offer improved patient care and resource management in the form of reduced report turnaround times, reduced staffing needs, and the efficient completion and distribution of reports. As the technology comes of age, however, with vendors claiming accuracy rates as high as 99%, the potential advantages of SR over traditional dictation methods are not being realized, leaving many radiologists frustrated with the technology.

The primary reason behind this apparent failure is accuracy. A 99%-accurate speech recognizer still averages one error out of every hundred words, with no guarantees as to the seriousness of such errors. Furthermore, actual accuracy rates in the reading room often fall short of 99%. Radiologists are instead forced to maintain their transcriptionists as correctionists, or to double as copy editors, painstakingly correcting each case, often for nonsensical or inconspicuous errors. Not only is this frustrating, but it is a poor use of time and resources. To compound matters, problems integrating with the radiology suite and the introduction of delays have further soured many radiologists on the technology. Those choosing to modernize their reading rooms with SR software are often plagued with difficulties, whereas those continuing to use traditional reporting methods have mixed incentives with respect to upgrading. Nonetheless, the potential benefits to radiology reporting from a hospital administration standpoint continue to motivate the adoption of SR technology. Thus, improving SR dictation is of particular importance.

Their solution:

As a partial solution to this problem, we have proposed a post-speech-recognition, statistical error-detection system for radiology. A previously unexplored area of research, this technique shows promise as an effective means to recover from the unacceptable accuracy rates of SR. By flagging potential errors, we can enhance the proofreading process, restoring the benefits of SR in resources saved. The result is a more efficient reading room and an improved experience with SR.
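The article describes the approach only at a high level, but the general idea of statistical error detection is straightforward: learn what correct report language looks like, then flag words that are improbable in their context. Here is a toy sketch of that idea (my own illustration, not the authors’ system; the corpus and threshold are invented for the example) using a simple bigram model trained on known-good report text:

```python
from collections import Counter

# Toy corpus of "known-good" report text (stands in for a real report archive).
corpus = (
    "no acute fracture . heart size is normal . "
    "lungs are clear . no pleural effusion ."
).split()

# Bigram and unigram counts from the corpus.
bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus)

def flag_errors(words, threshold=0.01):
    """Flag words whose probability given the previous word falls below
    the threshold -- these are candidate recognition errors."""
    flagged = []
    for prev, word in zip(words, words[1:]):
        if unigrams[prev] == 0:
            continue  # previous word itself unknown; it was already flagged
        prob = bigrams[(prev, word)] / unigrams[prev]
        if prob < threshold:
            flagged.append(word)
    return flagged

# An SR transcript in which "affusion" is a misrecognition of "effusion".
transcript = "no pleural affusion .".split()
print(flag_errors(transcript))  # ['affusion']
```

A real system would use a far larger corpus and a smoothed language model, but even this sketch shows the payoff the authors are after: the radiologist proofreads a handful of flagged words instead of every word.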

This solution dumps less on the radiologist, helping us in our new-found SR editing duties.

The pièce de résistance, as far as I’m concerned, is found in the third article, by Pezzulo et al., from Brown University. The abstract says it all:

Continuous voice recognition dictation systems for radiology reporting provide a viable alternative to conventional transcription services with the promise of shorter report turnaround times and increased cost savings. While these benefits may be realized in academic institutions, it is unclear how voice recognition dictation impacts the private practice radiologist who is now faced with the additional task of transcription. In this article, we compare conventional transcription services with a commercially available voice recognition system with the following results: 1) reports dictated with voice recognition took 50% longer to dictate despite being 24% shorter than those conventionally transcribed; 2) there were 5.1 errors per case, and 90% of all voice recognition dictations contained errors prior to report signoff while 10% of transcribed reports contained errors; 3) after signoff, 35% of VR reports still had errors. Additionally, cost savings using voice recognition systems in nonacademic settings may not be realized. Based on average radiologist and transcription salaries, the additional time spent dictating with voice recognition costs an additional $6.10 per case or $76,250.00 yearly. The opportunity costs may be higher. Informally surveyed, all radiologists expressed dissatisfaction with voice recognition with feelings of frustration and increased fatigue. In summary, in non-academic settings, utilizing radiologists as transcriptionists results in more error-ridden radiology reports and increased costs compared with conventional transcription services.
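The two cost figures in that abstract are easy to sanity-check against each other. A quick back-of-the-envelope calculation (my own arithmetic; the 250-working-day figure is my assumption, not the article’s) shows the case volume they imply:

```python
# Sanity check of the quoted figures: $6.10 extra per case
# scaling to $76,250.00 per year implies this annual case volume.
cost_per_case = 6.10
yearly_cost = 76250.00

cases_per_year = round(yearly_cost / cost_per_case)
cases_per_day = cases_per_year / 250  # assuming ~250 working days/year

print(cases_per_year, cases_per_day)  # 12500 50.0
```

Whatever the unit (per radiologist or per practice, which the abstract doesn’t specify), the two numbers are internally consistent at 12,500 cases a year, or about 50 cases per working day.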

Read that summary one more time! SR reports are costlier and have more mistakes! So the only benefit might be improved RTT, but with an adequate transcription pool, even that falls by the wayside. For what it’s worth, the study used Agfa TalkTechnology version 2.1.28.

The article made a very profound discovery: “Our results suggest that radiologists are not good transcriptionists.” Duh.

Once again, when you look below the surface, SR just isn’t a good idea. It may improve turnaround time, but generally only when RTT was dismal to begin with. It just doesn’t work well, and in the private-practice setting at least, it is costly and error-prone. Of course, the hospital bean-counters don’t care about the added cost of a radiologist-transcribed report because… they don’t pay the radiologist! But outside of that, SR just doesn’t make sense. Not yet, anyway.


2 responses to “Speech Recognition Sucks and It Costs More, Too!”

  1. “Additionally, cost savings using voice recognition systems in NON-ACADEMIC settings may not be realized.” In other words, in academic settings where radiology residents exist, there are cost savings because the radiology residents perform the task and are not compensated for the additional burden as transcriptionists. Since no one usually defends residents, I will chime in and say that VR has been a detriment to resident education. But residents are essentially free labor in the U.S., so who cares.
