Enabling Large-Scale Research on Autism Spectrum Disorders Through Automated Processing of EHR Using Natural Language Understanding (Arizona)

Enabling Large-Scale Research on Autism Spectrum Disorders Through Automated Processing of EHR Using Natural Language Understanding - Final Report

Citation:
Leroy G. Enabling Large-Scale Research on Autism Spectrum Disorders Through Automated Processing of EHR Using Natural Language Understanding - Final Report. (Prepared by the University of Arizona under Grant No. R21 HS024988). Rockville, MD: Agency for Healthcare Research and Quality, 2020. (PDF, 2.02 MB)

The findings and conclusions in this document are those of the author(s), who are responsible for its content, and do not necessarily represent the views of AHRQ. No statement in this report should be construed as an official position of AHRQ or of the U.S. Department of Health and Human Services.
Applying algorithms to free text in electronic health records can identify diagnostic criteria for autism spectrum disorder (ASD), which can support earlier detection and treatment as well as research with large-scale data.

Project Details - Ended

Summary:

The prevalence of autism spectrum disorders (ASD) has increased dramatically in the last two decades. The causes of this increase are not well understood; hypotheses range from changing diagnostic criteria to environmental factors. With new research focusing on neural, genetic, and environmental causes, there is a need to extract new types of data from patient records. Much of this data, when it exists, is contained in free-text notes and is not readily available for research unless manually extracted. Natural language processing (NLP) can transform such unstructured information into discrete, computable data elements. NLP algorithms designed specifically for the ASD population can make data analysis and integration with other sources possible.

This research study developed and evaluated NLP algorithms that identify ASD behaviors within free text in an electronic health record (EHR) and label them with the matching ASD diagnostic criteria from the Diagnostic and Statistical Manual of Mental Disorders (DSM). In addition, machine learning (ML) algorithms were used to label a child’s clinical record as either ASD or not ASD. Finally, the researchers developed a prototype user interface that highlights, for clinicians, the free-text sentences containing ASD DSM criteria.
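The summary does not describe the NLP rules themselves. As a minimal sketch only, assuming a hand-written lexicon of trigger phrases, sentence-level annotation could look like the code below. The lexicon entries, the function names `split_sentences` and `annotate`, and the use of DSM-5 criterion codes (A1 to A3 for social communication deficits, B1 to B4 for restricted, repetitive behaviors) are all illustrative assumptions; the summary does not say which DSM edition or matching technique the project used.

```python
import re

# Hypothetical trigger-phrase lexicon, for illustration only. Keys are DSM-5
# ASD criterion codes; the phrases are NOT the project's actual rules.
CRITERIA_LEXICON = {
    "A1": ["does not respond to name", "lack of shared enjoyment", "no back-and-forth"],
    "A2": ["poor eye contact", "limited eye contact", "does not point"],
    "A3": ["no interest in peers", "prefers to play alone"],
    "B1": ["hand flapping", "lines up toys", "echolalia"],
    "B2": ["upset by changes in routine", "insists on sameness"],
    "B3": ["preoccupied with", "intense interest in"],
    "B4": ["covers ears", "sensitive to noise", "high pain tolerance"],
}


def split_sentences(note: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", note) if s.strip()]


def annotate(note: str) -> list[tuple[str, set[str]]]:
    """Return (sentence, criterion codes) for every sentence matching a phrase."""
    results = []
    for sentence in split_sentences(note):
        lowered = sentence.lower()
        # Case-insensitive substring match against each criterion's phrases.
        matched = {code for code, phrases in CRITERIA_LEXICON.items()
                   if any(phrase in lowered for phrase in phrases)}
        if matched:
            results.append((sentence, matched))
    return results


note = ("Mother reports hand flapping when excited. "
        "He has poor eye contact and prefers to play alone. "
        "Vision screening was normal.")
for sentence, codes in annotate(note):
    print(sorted(codes), sentence)
```

A lexicon like this is easy for a clinician to inspect, which fits the goal of human-interpretable models, but plain substring matching ignores paraphrases and negation ("no hand flapping" would still match B1).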

The specific aims of the research were as follows:

  • Design NLP algorithms that create human-interpretable models to automatically annotate free text in electronic health records and match it to the DSM criteria for ASD. 
  • Demonstrate the feasibility and usefulness of the models for new research projects. 
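The summary reports that ML algorithms labeled each child’s record as ASD or not ASD, but it does not name them. The sketch below is not that classifier. Assuming the same illustrative DSM-5 codes, it shows how sentence-level labels could be rolled up into a per-record feature vector, the kind of discrete data element a research classifier could be trained on, together with a transparent rule-based baseline modeled on the DSM-5 A/B threshold (all three A criteria and at least two of the four B criteria). `record_features` and `rule_based_label` are hypothetical names.

```python
from collections import Counter

A_CRITERIA = ("A1", "A2", "A3")
B_CRITERIA = ("B1", "B2", "B3", "B4")


def record_features(sentence_labels: list[set[str]]) -> dict[str, int]:
    """Count how many annotated sentences in one record mention each criterion."""
    counts = Counter(code for labels in sentence_labels for code in labels)
    # Emit every criterion, including those with zero mentions, in fixed order.
    return {code: counts[code] for code in A_CRITERIA + B_CRITERIA}


def rule_based_label(features: dict[str, int]) -> bool:
    """Baseline modeled on the DSM-5 A/B threshold (an assumption, not the
    project's ML model): all three A criteria and at least two B criteria."""
    has_all_a = all(features[code] > 0 for code in A_CRITERIA)
    b_present = sum(1 for code in B_CRITERIA if features[code] > 0)
    return has_all_a and b_present >= 2


# One record: the criterion codes found in each annotated sentence.
record = [{"A1"}, {"A2", "A3"}, {"B1"}, {"B4"}, {"B1"}]
features = record_features(record)
print(features, rule_based_label(features))
```

The feature dictionary could be passed to any supervised classifier trained on expert-labeled records; the rule serves only as an interpretable point of comparison.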

Data from the Centers for Disease Control and Prevention’s Autism and Developmental Disabilities Monitoring Network (ADDM) were used and matched against data from any of the four existing clinical sources. The ADDM monitors ASD in 4- to 8-year-olds. Experts manually annotated the records, marking sentences that contain DSM criteria. The NLP and ML algorithms were then applied to the same records, and both precision and recall were measured. In this context, precision was the proportion of the system’s labels that matched a phenotypical expression of ASD behavior to the correct DSM diagnostic criterion, and recall was the proportion of expert-annotated sentences that the system identified. At the annotation level, precision was 74 percent and recall was 42 percent. At the sentence level, average precision was 76 percent and average recall was 43 percent.
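To make the two evaluation levels concrete, the sketch below computes set-based precision and recall. It assumes that annotation-level matching compares (sentence, criterion) pairs while sentence-level matching compares sentence IDs only; the summary does not say exactly how the sentence-level averages were computed, and the gold and predicted sets are invented toy data, not project results.

```python
def precision_recall(predicted: set, gold: set) -> tuple[float, float]:
    """Set-based precision and recall; returns 0.0 when a denominator is empty."""
    true_pos = len(predicted & gold)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    return precision, recall


# Toy data (invented): (sentence_id, criterion) pairs from experts and system.
gold = {(1, "A2"), (1, "A3"), (2, "B1"), (4, "B4")}
predicted = {(1, "A2"), (2, "B1"), (3, "B3")}

# Annotation level: a hit needs the right sentence AND the right criterion.
ann_p, ann_r = precision_recall(predicted, gold)  # 2/3 and 0.5

# Sentence level (assumed definition): a hit only needs the right sentence.
sent_p, sent_r = precision_recall({s for s, _ in predicted},
                                  {s for s, _ in gold})  # 2/3 and 2/3

print(f"annotation level: P={ann_p:.2f} R={ann_r:.2f}")
print(f"sentence level:   P={sent_p:.2f} R={sent_r:.2f}")
```

The toy example shows why the two levels can differ: sentence 1 counts as found at the sentence level even though its A3 annotation was missed.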

The study addressed a gap in electronic health record use in mental health, where behaviors that meet DSM criteria are frequently buried in free text. Because children with ASD show widely varying behaviors that can satisfy the same DSM criterion, diagnosing them is complex and may be delayed. Integrated into a user-friendly interface, the algorithms could help clinicians with limited ASD expertise diagnose children. This work has the potential to support earlier diagnosis and treatment of children with ASD and to enhance ASD research.