187 Application of text mining methods to identify lupus nephritis from electronic health records
Milena A Gianfrancesco (1), Suzanne Tamang (2), Gabriela Schmajuk (3) and Jinoos Yazdany (4)

  1. Division of Rheumatology, Department of Medicine, University of California, San Francisco
  2. Center for Population Health Sciences, Stanford University
  3. University of California, San Francisco; San Francisco VA Medical Center
  4. UC San Francisco


Background Lupus nephritis (LN), or chronic inflammation of the kidneys, is a frequent complication of SLE and is associated with higher overall morbidity and mortality. Previous studies have found that LN affects individuals of Hispanic, African American and Asian descent more than those of European ancestry. Accurate estimates of LN in the population remain limited because this information cannot be captured through structured data fields in electronic health records (EHRs). We aimed to develop a text-mining pipeline to extract information on LN diagnosis from clinical notes in the EHR of a large, diverse university health system.

Methods Individuals with a single diagnosis code for SLE (ICD-9 710.0) between June 1, 2012 and November 5, 2016 in the EHR of a university health system were included (n=2,509). All clinical notes for these patients were extracted and annotated using a clinical text-mining tool, the Clinical Event Recognizer (CLEVER), and a custom-built dictionary that included lupus nephritis and associated terms. Positive and negative mentions of LN were tagged, and the tagging was evaluated for performance.
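
The dictionary-based tagging described in the Methods can be illustrated with a minimal Python sketch. This is not CLEVER's interface or the study's dictionary: the term list, the negation cues, the function name `tag_mentions` and the six-word look-back window are all assumptions chosen for illustration. The sketch finds dictionary terms in a note and labels each mention negative if a negation cue appears shortly before it in the same sentence, and positive otherwise.

```python
import re

# Hypothetical dictionary entries; the study's actual term list is not given.
TARGET_TERMS = ["lupus nephritis", "lupus glomerulonephritis"]
# Hypothetical negation cues, checked in the words preceding each term.
NEGATION_CUES = ["no", "not", "denies", "without", "negative for"]

TERM_RE = re.compile(
    "|".join(re.escape(t) for t in TARGET_TERMS), re.IGNORECASE
)
CUE_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(c) for c in NEGATION_CUES) + r")\b",
    re.IGNORECASE,
)


def tag_mentions(note, window_words=6):
    """Return (term, polarity) pairs for each dictionary term found in a note."""
    mentions = []
    for match in TERM_RE.finditer(note):
        # Keep only the text of the current sentence that precedes the term.
        sentence_start = re.split(r"[.;\n]", note[: match.start()])[-1]
        # Restrict the search to the last few whole words before the term.
        context = " ".join(sentence_start.split()[-window_words:])
        polarity = "negative" if CUE_RE.search(context) else "positive"
        mentions.append((match.group(0).lower(), polarity))
    return mentions
```

A rule this simple will miss negations that follow the term and cues outside the window, which is one reason tagged mentions need manual review before the output is trusted.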

Results Over 1300 positive and 400 negative mentions of LN were detected from clinical notes. Manual review of 50 note mentions (25 positive and 25 negative) determined that our text-mining tool detected LN with 79% sensitivity and 86% specificity (table).

Abstract 187 Table 1

Number of positive and negative mentions of lupus nephritis identified through the text-mining tool and manual review.
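
For readers less familiar with the two performance measures reported in the Results, the sketch below shows how they are conventionally computed when tool output is compared with manual review. The function names and arguments are illustrative assumptions, and no counts from the study are used. Sensitivity is the share of manually confirmed positive mentions that the tool also tagged positive; specificity is the share of manually confirmed negative mentions that the tool also tagged negative.

```python
def sensitivity(true_positives, false_negatives):
    """Fraction of reviewer-confirmed positives that the tool tagged positive."""
    total = true_positives + false_negatives
    if total == 0:
        raise ValueError("no reviewer-confirmed positive mentions")
    return true_positives / total


def specificity(true_negatives, false_positives):
    """Fraction of reviewer-confirmed negatives that the tool tagged negative."""
    total = true_negatives + false_positives
    if total == 0:
        raise ValueError("no reviewer-confirmed negative mentions")
    return true_negatives / total
```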

Conclusions We developed the first text-mining strategy to extract LN status from clinical notes. Additional evaluation of the LN text-mining pipeline on a gold-standard chart-reviewed cohort of SLE patients (n=332) is ongoing. Further refinement of the pipeline will allow us to classify the stage of LN (Class I-V) to better phenotype SLE severity, and to conduct follow-up studies to determine factors associated with this important disease outcome.

Funding Source(s): NIH-NIAMS F32 AR070585 (Gianfrancesco)
