Enabling Large-Scale Research on Autism Spectrum Disorders Through Automated Processing of EHR Using Natural Language Understanding (Arizona)

Project Details - Ongoing

The prevalence of autism spectrum disorders (ASD) has increased dramatically in the last two decades. While this increase is not well understood, hypotheses range from changing diagnostic criteria to environmental factors. With new research focusing on neural, genetic, and environmental causes, there is a need to extract new types of data from patient records. Much of this data, when it does exist, is contained in free-text notes and is not readily available for research unless manually extracted. Natural language processing (NLP) can transform unstructured information into computable discrete data elements. NLP algorithms designed specifically for the ASD population can make data analysis and integration with other sources possible.

This study will develop NLP algorithms that annotate free text with ASD criteria from the Diagnostic and Statistical Manual of Mental Disorders (DSM) so that the criteria can be extracted automatically. The investigators will create models to extract relevant structured data from text in the electronic health records (EHRs) of children who have been assessed for ASD. The project addresses a gap in EHR use in mental health, where free text is of enormous importance because diagnosis and treatment are complex: children with ASD demonstrate drastically variable behaviors that qualify for the same DSM criteria. It offers a potential means of improving early detection and treatment, which may improve outcomes, and of providing large-scale phenotypic data to help understand, prevent, and cure ASD.

The specific aims of the project are as follows:

  • Design NLP algorithms to create human-interpretable models that automatically annotate free text in electronic records and match it to the DSM criteria for ASD.
  • Demonstrate the feasibility and usefulness of the models for new research projects. 

Researchers will use NLP algorithm outputs in combination with machine learning to create automated methods for extracting ASD diagnostic patterns. The algorithms will be designed and evaluated using EHR data available through the Arizona Developmental Disabilities Surveillance Program. In addition to evaluating model efficiency, investigators will assess the models' usefulness for creating data and testing other hypotheses, as well as their potential to further research in the ASD field.
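
As a rough illustration of what annotating free text against diagnostic criteria and turning the result into machine learning input could look like, the sketch below uses simple pattern matching. It is not the investigators' method. The criterion labels, the trigger phrases, the function names `annotate` and `criterion_counts`, and the sample note are all hypothetical and chosen only for demonstration; a real system would need clinically validated rules and would have to handle negation (for example, "no hand flapping observed"), which this sketch does not.

```python
import re
from collections import Counter

# Hypothetical criterion labels and trigger phrases, for illustration only.
# They are NOT taken from the project and are not clinically validated.
PATTERNS = {
    "A1_social_reciprocity": [
        r"\bdoes not respond to (?:his|her|their) name\b",
        r"\blimited back-and-forth conversation\b",
    ],
    "A2_nonverbal_communication": [
        r"\b(?:poor|limited|no) eye contact\b",
    ],
    "B1_repetitive_behavior": [
        r"\bhand[- ]flapping\b",
        r"\becholalia\b",
    ],
}

# Compile once so repeated calls do not re-parse the patterns.
COMPILED = {
    label: [re.compile(p, re.IGNORECASE) for p in pats]
    for label, pats in PATTERNS.items()
}


def annotate(note):
    """Return (criterion, start, end, matched_text) tuples sorted by position."""
    spans = []
    for label, regexes in COMPILED.items():
        for rx in regexes:
            for m in rx.finditer(note):
                spans.append((label, m.start(), m.end(), m.group(0)))
    return sorted(spans, key=lambda s: s[1])


def criterion_counts(note):
    """Count annotations per criterion; a simple feature vector for a classifier."""
    return Counter(label for label, _, _, _ in annotate(note))


# Invented example note, not real patient data.
note = (
    "Mother reports limited eye contact and frequent hand flapping. "
    "He does not respond to his name."
)

for label, start, end, text in annotate(note):
    print(f"{label}: '{text}' at [{start}, {end})")
print(dict(criterion_counts(note)))
```

Because each annotation keeps the matched phrase and its character offsets, a reviewer can trace every extracted criterion back to the sentence that triggered it, which is one simple way to keep such a model human-interpretable.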

This project has the potential to shift the field away from the current paradigm, in which ASD is studied using small-scale data from individual interventions with little integration between data sources, and toward leveraging information from existing large-scale data sources to propose novel analyses and hypotheses. While the project focuses on ASD, the algorithms can be modified for other mental health diagnostic rules.
