NLP Methods for Extraction of Symptoms from Unstructured Data

This blog article summarizes a study co-authored by Q-rounds cofounder John Sartori, Ph.D., and a team of researchers.

The COVID-19 pandemic introduced unprecedented challenges for the healthcare industry. Striking a balance between providing top-notch care and efficiently allocating vital resources became more necessary than ever.

Clinicians needed accurate tools to allocate essential medical resources to the patients who needed them most. This prompted a study of whether patient symptoms, combined with statistical modeling, could enhance clinical decision making. Its results showed promise for both in-hospital and telehealth settings.

Deciphering Patient Notes

Because patient symptoms are typically recorded in unstructured notes, they're not readily available for clinical decision making. To fill this gap, the researchers compared two methods for extracting symptoms from Emergency Department (ED) admission notes.

Both methods used a lexicon built by expanding the Centers for Disease Control and Prevention's (CDC) Symptoms of Coronavirus list. The first approach used a word2vec model to enrich the lexicon through dictionary mapping to the Unified Medical Language System (UMLS). The second used the expanded lexicon as a rule-based gazetteer (a dictionary of terms derived from a given lexicon) in tandem with UMLS.
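To make the gazetteer idea concrete, here is a minimal sketch of rule-based symptom matching in Python. It assumes a tiny, hand-made lexicon (`LEXICON`, hypothetical, not the study's) that maps surface terms to canonical symptom names, and it leaves out both the word2vec-based expansion and the UMLS mapping the study describes. The names `build_gazetteer` and `extract_symptoms` are illustrative only.

```python
from __future__ import annotations

import re

# Hypothetical toy lexicon: lowercase surface term -> canonical symptom.
# It only hints at what an expanded CDC-based lexicon might contain;
# it is NOT the study's lexicon and includes no UMLS codes.
LEXICON: dict[str, str] = {
    "shortness of breath": "dyspnea",
    "difficulty breathing": "dyspnea",
    "dyspnea": "dyspnea",
    "sob": "dyspnea",
    "fever": "fever",
    "febrile": "fever",
    "cough": "cough",
    "fatigue": "fatigue",
    "tiredness": "fatigue",
}


def build_gazetteer(lexicon: dict[str, str]) -> re.Pattern[str]:
    """Compile the lexicon into one case-insensitive, whole-word regex."""
    # Longest terms first: regex alternation takes the first alternative that
    # matches, so multi-word terms win over shorter overlapping ones.
    terms = sorted(lexicon, key=len, reverse=True)
    alternation = "|".join(re.escape(term) for term in terms)
    # \b boundaries keep "sob" from matching inside a word such as "sober".
    return re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)


def extract_symptoms(note: str, lexicon: dict[str, str],
                     gazetteer: re.Pattern[str] | None = None) -> set[str]:
    """Return the canonical symptoms whose lexicon terms appear in a note."""
    if gazetteer is None:
        gazetteer = build_gazetteer(lexicon)
    # Lowercase each hit so it can be looked up in the lowercase lexicon.
    return {lexicon[m.group(0).lower()] for m in gazetteer.finditer(note)}


if __name__ == "__main__":
    note = "58M with shortness of breath, fever, and fatigue for 3 days. Sober, alert."
    print(sorted(extract_symptoms(note, LEXICON)))  # ['dyspnea', 'fatigue', 'fever']
```

Because matching here is purely lexical, the sketch would also count negated mentions such as "denies cough"; a production extractor would need to handle that, along with normalizing hits to UMLS concepts.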

Identifying Salient Risks

The study found significant risks associated with specific symptoms among COVID-19 patients. Most notably, patients presenting with dyspnea (shortness of breath) had an increased risk of in-hospital mortality (odds ratio [OR] 1.85, p-value < 0.001).
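For readers less familiar with odds ratios, the textbook definition below may help. In the simplest, unadjusted case, take a 2×2 table where a and b are patients with the symptom who died or survived, and c and d are patients without it who died or survived. The study's reported ORs may instead come from a model that adjusts for other variables; in a logistic regression, the OR for a binary predictor is the exponential of its coefficient β. The symbols here are generic and do not reflect the study's data.

```latex
\mathrm{OR}_{\text{unadjusted}} = \frac{a/b}{c/d} = \frac{a\,d}{b\,c},
\qquad
\mathrm{OR}_{\text{logistic}} = e^{\beta}
```

Read this way, an OR of 1.85 for dyspnea means the odds of in-hospital mortality were about 1.85 times as high for patients presenting with dyspnea as for those without it, while an OR below 1 points to lower odds.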

Uncovering Language Disparities

The study also uncovered a concerning pattern: fatigue carried opposing risk signals for in-hospital mortality depending on the patient's language (non-English: OR 1.95, p-value = 0.02; English: OR 0.63, p-value = 0.01). Among non-English-speaking patients, fatigue was associated with higher risk, while among English-speaking patients it was associated with a protective effect, underscoring the need to tailor care to a patient's linguistic background.
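Opposing signals like these mean the same symptom has an odds ratio above 1 in one subgroup and below 1 in another. The sketch below computes unadjusted, per-subgroup (stratified) odds ratios from 2×2 counts. The counts in `TOY_STRATA` are hypothetical, chosen only to show opposite directions, and are not the study's data; the study's own estimates may have come from adjusted models rather than raw tables. `TwoByTwo` and `stratified_odds_ratios` are illustrative names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts for one subgroup: in-hospital outcome by symptom presence."""
    died_with: int          # a: symptom present, died
    survived_with: int      # b: symptom present, survived
    died_without: int       # c: symptom absent, died
    survived_without: int   # d: symptom absent, survived

    def odds_ratio(self) -> float:
        """Unadjusted odds ratio (a/b) / (c/d) = (a*d) / (b*c)."""
        denominator = self.survived_with * self.died_without
        if denominator == 0:
            raise ValueError("zero in cell b or c: unadjusted odds ratio is undefined")
        return (self.died_with * self.survived_without) / denominator


def stratified_odds_ratios(strata: dict[str, TwoByTwo]) -> dict[str, float]:
    """Compute a separate odds ratio for each subgroup (stratum)."""
    return {name: table.odds_ratio() for name, table in strata.items()}


# Hypothetical toy counts, NOT study data.
TOY_STRATA = {
    "non-English": TwoByTwo(died_with=20, survived_with=30, died_without=10, survived_without=40),
    "English": TwoByTwo(died_with=10, survived_with=40, died_without=20, survived_without=30),
}

if __name__ == "__main__":
    for group, ratio in stratified_odds_ratios(TOY_STRATA).items():
        if ratio > 1:
            direction = "higher odds"
        elif ratio < 1:
            direction = "lower odds"
        else:
            direction = "no difference in odds"
        print(f"{group}: OR = {ratio:.2f} ({direction} of death with the symptom)")
```

Computing one estimate per subgroup, rather than a single pooled estimate, is one common way to surface effects that point in opposite directions; pooling both groups together could mask them.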

Rounding It Up

Statistical modeling of outcomes based on a patient's presenting symptoms (symptomatology) can help deliver high-quality care and allocate essential resources, which is especially important during the COVID-19 pandemic.

Unlike previous studies, this one showed that models predicting in-hospital mortality from symptoms were not significantly different from models built only on vital signs collected during an inpatient encounter.

This indicates that prognostic models based on symptomatology could help extend COVID-19 patient care through telemedicine, offering an alternative to in-person care and enhancing clinical decision support for remote healthcare providers.

Q-rounds reduces provider decision fatigue by using AI to learn provider preferences and suggest an optimal rounding order. It also lets families or providers request interpreter services, removing language barriers so everyone understands the important conversations taking place during rounds. And because families can attend rounds virtually, it is easier to have more holistic discussions about symptoms and other factors that may be contributing to the patient's condition.

Interested in learning more? Read the full study or see how Q-rounds can help.