

1010 Development of a scoring system for accurate lupus nephritis case identification from real-world databases
Zara Izadi (1), Alfredo Aguirre (1), Christine Anastasiou (1), Julia Kay (1), Gabriela Schmajuk (1, 2) and Jinoos Yazdany (1)
(1) University of California San Francisco, San Francisco, CA
(2) San Francisco VA Medical Center, San Francisco, CA

Abstract

Background Accurate identification of prevalent cases of lupus nephritis (LN) is essential for timely patient monitoring and treatment, for advancing research, and for informing public health initiatives for LN management. However, LN-specific diagnosis codes are generally underused, which makes identifying this patient population in real-world databases challenging. We developed a scoring system that quantifies the probability that a patient is correctly identified as having LN, using structured data from electronic health records (EHRs).

Methods We used EHR data from two large health systems and included patients with ≥1 ICD-9/10 code for SLE between June 2012 and January 2022. Prevalent LN was defined as current active LN or a history of LN. In a training set consisting of a balanced sample of 2038 patients from the larger health system, we used regular expressions with negation handling to loosely tag LN in EHR notes. The testing sets comprised 100 randomly selected patients from each health system; their charts were manually reviewed to classify each patient as ‘no LN’, ‘definite LN’ (biopsy report of Class III, IV or V LN), ‘potential LN’ (no biopsy report, but physician-diagnosed LN), or ‘diagnostic uncertainty’ (physician states LN is possible). A gradient boosting model (GBM) with 42 candidate predictors covering demographics, encounters, diagnosis and procedure codes, comorbidities, medications, and laboratory test results (e.g., serologies, urine studies, chemistries) was used for predictor selection. We then evaluated the predictive performance of a logit regression model (LRM) containing the key GBM predictors for identifying LN under a ‘strict’ definition (definite LN) and an ‘inclusive’ definition (definite LN, potential LN, or diagnostic uncertainty). An LRM-based scoring system was then developed and calibrated.
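The abstract does not publish its regular expressions. As a minimal sketch of what negation-aware tagging can look like, the Python below flags a note when it mentions LN without a preceding negation cue in the same clause. The patterns `LN_MENTION` and `NEGATION_CUE`, the 40-character `WINDOW`, and the function `tag_ln` are hypothetical assumptions, loosely in the style of NegEx, and are not the authors' implementation.

```python
import re

# HYPOTHETICAL patterns for illustration; the study's expressions are not published.
LN_MENTION = re.compile(
    r"\b(?:lupus nephritis|class\s+(?:III|IV|V)\s+(?:lupus\s+)?nephritis)\b",
    re.IGNORECASE,
)
# Pre-negation cues (NegEx-style): a cue only counts if no '.' or ';'
# separates it from the LN mention, i.e. it sits in the same clause.
NEGATION_CUE = re.compile(
    r"\b(?:no|not|denies|without|negative for)\b[^.;]*$",
    re.IGNORECASE,
)
WINDOW = 40  # hypothetical: characters of preceding context scanned for a cue


def tag_ln(note: str) -> bool:
    """Return True if any LN mention in the note lacks a preceding negation cue."""
    for match in LN_MENTION.finditer(note):
        preceding = note[max(0, match.start() - WINDOW):match.start()]
        if not NEGATION_CUE.search(preceding):
            return True  # at least one affirmed mention is enough for a loose tag
    return False
```

For example, "No evidence of lupus nephritis." is not tagged, whereas "Patient has no proteinuria. Biopsy showed class IV lupus nephritis." is tagged, because the sentence boundary separates the cue "no" from the mention. A tagger this coarse favors recall over precision, which suits a loose pre-labeling step rather than final case determination.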

Results Table 1 shows the demographics of the 4,522 patients from both health systems who met the eligibility criteria. In addition to more specific LN diagnosis codes, the GBM identified the following key predictors: diagnosis codes for acute or chronic kidney disease or proteinuria, younger age at the first SLE diagnosis code, and use of mycophenolate mofetil or mycophenolic acid. Urine protein-to-creatinine ratio (UPCR) >0.5, abnormal complement component 3 (C3) levels, any use of hydroxychloroquine, azathioprine, or rituximab, and glucocorticoid dose were also identified as important predictors, but they were omitted from the final LRM because including them did not further improve performance.
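To make the predictor types concrete, the following is a minimal sketch of deriving patient-level candidate predictors of the kind listed above from structured EHR records. The `Event` schema, the `features` function, the example ICD-10-CM prefixes, and the lab name `UPCR` are illustrative assumptions; the study's actual code lists and feature definitions are not given in the abstract.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional


@dataclass
class Event:
    """One structured EHR record (HYPOTHETICAL schema)."""
    kind: str                      # "dx", "med", or "lab"
    code: str                      # ICD-10-CM code, drug name, or lab name
    on: date
    value: Optional[float] = None  # numeric result, used for labs only


# Illustrative ICD-10-CM prefixes and drug names; not the study's code lists.
SLE_DX = ("M32",)                  # systemic lupus erythematosus
LN_DX = ("M32.14",)                # glomerular disease in SLE
KIDNEY_DX = ("N17", "N18", "R80")  # acute kidney failure, CKD, proteinuria
MPA_DRUGS = {"mycophenolate mofetil", "mycophenolic acid"}


def features(events: List[Event], birth_date: date) -> Dict[str, object]:
    """Collapse a patient's event history into candidate predictors."""
    dx = [e for e in events if e.kind == "dx"]
    sle_dates = [e.on for e in dx if e.code.startswith(SLE_DX)]
    return {
        "ln_dx_code": any(e.code.startswith(LN_DX) for e in dx),
        "kidney_dx_code": any(e.code.startswith(KIDNEY_DX) for e in dx),
        "mpa_use": any(e.kind == "med" and e.code.lower() in MPA_DRUGS for e in events),
        "upcr_gt_0_5": any(
            e.kind == "lab" and e.code == "UPCR" and e.value is not None and e.value > 0.5
            for e in events
        ),
        # Age in years at the earliest SLE code; None if no SLE code is present.
        "age_first_sle_dx": (min(sle_dates) - birth_date).days / 365.25 if sle_dates else None,
    }
```

Binary "ever present" flags like these are a simple choice; time windows or counts would be equally plausible encodings, and the abstract does not specify which the authors used.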

Abstract 1010 Table 1

Characteristics of the underlying population.

The final LRM had an area under the receiver operating characteristic curve (AUC) of 0.93, a sensitivity of 0.88, and a positive predictive value (PPV) of 0.84 for identifying LN under the inclusive definition. It performed similarly under the strict definition and showed good external validity when tested in the second health system (table 2). Predicted and observed probabilities were well calibrated (table 2). The scoring system was derived from this model (table 3).
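As an illustration of how the reported metrics relate to the two case definitions, the sketch below maps chart-review categories to binary labels under the strict or inclusive definition, then computes sensitivity and PPV at a probability threshold, a rank-based AUC, and a simple binned calibration summary. The function names, the 0.5 threshold, the five equal-width bins, and the toy data are assumptions for illustration only and are not the study's results.

```python
from typing import List, Sequence, Tuple

# Chart-review categories named in the abstract.
STRICT_POSITIVE = {"definite LN"}
INCLUSIVE_POSITIVE = {"definite LN", "potential LN", "diagnostic uncertainty"}


def to_label(category: str, definition: str) -> int:
    """1 if the category counts as LN under the chosen definition, else 0."""
    if definition == "strict":
        return int(category in STRICT_POSITIVE)
    if definition == "inclusive":
        return int(category in INCLUSIVE_POSITIVE)
    raise ValueError(f"unknown definition: {definition}")


def sensitivity_ppv(y: Sequence[int], p: Sequence[float], threshold: float) -> Tuple[float, float]:
    tp = sum(1 for yi, pi in zip(y, p) if yi == 1 and pi >= threshold)
    fn = sum(1 for yi, pi in zip(y, p) if yi == 1 and pi < threshold)
    fp = sum(1 for yi, pi in zip(y, p) if yi == 0 and pi >= threshold)
    sens = tp / (tp + fn) if tp + fn else float("nan")
    ppv = tp / (tp + fp) if tp + fp else float("nan")
    return sens, ppv


def auc(y: Sequence[int], p: Sequence[float]) -> float:
    """Rank-based AUC: chance a random positive outscores a random negative (ties = 1/2)."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def calibration_bins(y: Sequence[int], p: Sequence[float], n_bins: int = 5) -> List[Tuple[float, float]]:
    """(mean predicted, observed proportion) for each non-empty equal-width bin."""
    bins: List[List[Tuple[float, int]]] = [[] for _ in range(n_bins)]
    for yi, pi in zip(y, p):
        bins[min(int(pi * n_bins), n_bins - 1)].append((pi, yi))
    return [
        (sum(pi for pi, _ in b) / len(b), sum(yi for _, yi in b) / len(b))
        for b in bins if b
    ]


# Toy data (made up, not from the study).
categories = ["definite LN", "potential LN", "no LN", "diagnostic uncertainty", "no LN", "definite LN"]
probs = [0.95, 0.70, 0.10, 0.40, 0.55, 0.85]
y_inclusive = [to_label(c, "inclusive") for c in categories]
y_strict = [to_label(c, "strict") for c in categories]
```

On this toy data the same predictions give an AUC of 0.875 under the inclusive labels and 1.0 under the strict labels, which shows why the abstract reports performance separately for each definition.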

Abstract 1010 Table 2

Performance of the final logit regression model including key predictors.

Abstract 1010 Table 3

The scoring system and interpretation.
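Table 3 is not reproduced here, and the abstract does not state how points were assigned. One common way to turn a logistic regression into an integer score is to divide each coefficient by a reference unit (here the smallest absolute coefficient), round to whole points, and map a point total back to an approximate probability through the logistic function. The sketch below shows that approach with entirely hypothetical coefficients, predictor names, dichotomized age cut-off, and intercept; it is not the published scoring system.

```python
import math
from typing import Dict, Optional, Tuple

# HYPOTHETICAL log-odds coefficients and intercept, for illustration only.
INTERCEPT = -3.0
COEFFICIENTS = {
    "ln_specific_dx_code": 2.4,
    "kidney_disease_or_proteinuria_dx_code": 1.2,
    "age_lt_30_at_first_sle_code": 0.6,  # hypothetical dichotomization of "younger age"
    "mycophenolate_use": 1.8,
}


def points_table(coefs: Dict[str, float], unit: Optional[float] = None) -> Tuple[Dict[str, int], float]:
    """Divide each coefficient by a reference unit and round to integer points."""
    if unit is None:
        unit = min(abs(b) for b in coefs.values())
    return {name: round(b / unit) for name, b in coefs.items()}, unit


def score(patient: Dict[str, bool], points: Dict[str, int]) -> int:
    """Sum the points for every predictor present in the patient."""
    return sum(points[name] for name, present in patient.items() if present)


def approx_probability(total_points: int, unit: float, intercept: float = INTERCEPT) -> float:
    """Map a point total back to a probability; approximate because points were rounded."""
    return 1.0 / (1.0 + math.exp(-(intercept + total_points * unit)))


points, unit = points_table(COEFFICIENTS)
```

With these made-up values, a patient with an LN-specific code and mycophenolate use scores 4 + 3 = 7 points, corresponding to a probability of about 0.77. Rounding keeps the score easy to tally by hand at the cost of small discrepancies from the full model's prediction, which is one reason a points-based score is usually re-checked for calibration.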

Conclusions Predicting prevalent LN from data elements available in EHR or claims data was feasible, achieved good accuracy, and was validated externally. With further validation, the scoring system could identify prevalent LN accurately across health systems, addressing the current challenge of identifying LN cases with ICD-10 codes.

Disclaimer: Aurinia Pharmaceuticals provided an unrestricted grant for this work.


This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.
