NLP to Improve Accuracy and Quality of Dictated Medical Documents (Massachusetts)

NLP to Improve Accuracy and Quality of Dictated Medical Documents - Final Report

Zhou L. NLP to Improve Accuracy and Quality of Dictated Medical Documents - Final Report. (Prepared by Brigham and Women's Hospital under Grant No. R01 HS024264). Rockville, MD: Agency for Healthcare Research and Quality, 2019. (PDF, 389.03 KB)

The findings and conclusions in this document are those of the author(s), who are responsible for its content, and do not necessarily represent the views of AHRQ. No statement in this report should be construed as an official position of AHRQ or of the U.S. Department of Health and Human Services.

A National Web Conference on Improving Health IT Safety through the Use of Natural Language Processing to Improve Accuracy of EHR Documentation

Event Details

  • Date: February 7, 2017
  • Time: 2:00pm to 3:30pm
This Web Conference discussed the development of innovative tools and methods designed to advance health IT safety through improved EHR documentation.
The use of natural language processing shows promise for automatically detecting errors in electronic patient notes created with speech recognition, with the potential of improving the accuracy, completeness, legibility, and accessibility of medical documents to enhance patient safety and health care delivery.

Project Details - Ended


In addition to typing, dictating, and use of template-based documentation, speech recognition (SR) software integrated into electronic health records allows users to create patient notes to document patient care. While easy to use and efficient, SR is prone to errors, including spelling errors and “real-word” errors, where a correctly spelled word is incorrect in the context of the note. Spell check functionality catches spelling errors, but real-word errors are more difficult to detect and correct automatically. As such, clinicians must proofread and edit SR-generated notes, a step that may be skipped due to time constraints. Errors that are missed become part of the permanent medical record, making those documents inaccurate and potentially affecting future patient care and safety. This research used natural language processing (NLP) to improve the accuracy of SR notes by automatically detecting and identifying potential errors. Conducted at two large integrated healthcare systems, Partners HealthCare in Boston, Massachusetts, and the University of Colorado Health in Aurora, Colorado, the research also surveyed physicians about their perceptions of SR errors and the value of SR in creating patient notes. SR errors in documents were analyzed, allowing the researchers to develop guidelines for identifying and classifying the errors.
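
To illustrate why real-word errors slip past a spell checker but can be caught from context, the following is a minimal sketch of a toy bigram language model. It is not the project's model: the three-sentence corpus, the function names `train_bigrams` and `flag_suspect_words`, and the 0.1 threshold are all assumptions made for illustration. The idea is simply that a correctly spelled word such as "pane" becomes suspicious when it rarely or never follows the preceding word in previously seen notes.

```python
from collections import Counter

# Hypothetical toy corpus standing in for a large collection of clinical notes.
CORPUS = [
    "patient denies chest pain",
    "patient denies shortness of breath",
    "patient reports chest pain",
]

def train_bigrams(sentences):
    """Count unigrams and bigrams over lowercased, whitespace-split sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split()  # <s> marks sentence start
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def flag_suspect_words(sentence, unigrams, bigrams, threshold=0.1):
    """Return words whose add-one-smoothed bigram probability is below threshold.

    The threshold is an illustrative assumption, not a tuned value.
    """
    vocab_size = len(unigrams)
    tokens = ["<s>"] + sentence.lower().split()
    suspects = []
    for prev, word in zip(tokens, tokens[1:]):
        # P(word | prev) with add-one (Laplace) smoothing
        prob = (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)
        if prob < threshold:
            suspects.append(word)
    return suspects

unigrams, bigrams = train_bigrams(CORPUS)
print(flag_suspect_words("patient denies chest pane", unigrams, bigrams))
print(flag_suspect_words("patient denies chest pain", unigrams, bigrams))
```

A production system would need a far larger corpus, proper tokenization, and a stronger model; the sketch only shows how context, rather than spelling, signals the error.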

The specific aims were as follows:

  • Build a large corpus of clinical documents dictated via SR across different healthcare institutions and clinical settings. 
  • Conduct error analysis to estimate the prevalence and severity of SR errors. 
  • Develop automated, robust methods to detect SR errors in medical documents. 
  • Evaluate the performance of the proposed methods and tool. 
  • Distribute our methods and tools. 

An annotation schema was developed that included 12 general error types, such as insertion or deletion; 14 semantic types, such as medication and general English; and a clinical significance rating of either direct (errors that could influence clinical decision making) or indirect (such as errors that could result in billing errors). In evaluating SR notes, an error rate of 7.4 percent was observed in pre-edited notes; this dropped to 0.4 percent after editing by a professional transcriptionist, and dropped further to 0.3 percent after the dictating physician’s review. Under the schema, deletions were the most prevalent general error type and English was the most frequent semantic type. Medication was the most common clinical semantic type in original SR transcriptions, and diagnosis was the most common in the transcriptionist-edited, clinician-reviewed versions.

Several error-detection models were developed using NLP and evaluated using F1 scores. The F1 score, the harmonic mean of precision and recall, summarizes a model's accuracy, with 100 percent being a perfect score. A model based on a statistical language model achieved an F1 score of 81 percent, a recurrent neural network-based model had an F1 score of 77 percent, and a topic model-based classifier had an F1 score of 24 percent.
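
For readers unfamiliar with the metric, the sketch below shows how an F1 score is computed from a detector's true positives, false positives, and false negatives. The function name `f1_score` and the counts in the usage lines are hypothetical and are not taken from the project's evaluation data.

```python
def f1_score(true_positives, false_positives, false_negatives):
    """Compute F1 as the harmonic mean of precision and recall.

    Returns 0.0 when precision and recall are both zero or undefined.
    """
    predicted = true_positives + false_positives   # errors the model flagged
    actual = true_positives + false_negatives      # errors really present
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts: 80 real errors flagged, 20 false alarms, 20 errors missed.
print(f"{f1_score(80, 20, 20):.0%}")
# Hypothetical counts: no false alarms, but half of the real errors missed.
print(f"{f1_score(50, 0, 50):.0%}")
```

Because F1 balances precision against recall, a detector that flags very few words can have no false alarms and still score poorly if it misses many real errors.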

All participants interviewed agreed that SR increases the efficiency and accuracy of documentation. User estimates of SR error rates ranged widely, from a low of 1 percent to a high of more than 50 percent. Estimated time spent editing and correcting errors was between 1 and 3 minutes per patient. The researchers concluded that using NLP for error detection in SR-generated patient notes is promising, but that further research is needed to refine these models.