Abstract
Background Systemic lupus erythematosus (SLE) is a systemic autoimmune disease with diverse manifestations that can develop over a long period of time. Electronic health record (EHR) data are a rich source of information for understanding the varied presentation of SLE. We examined whether three sets of clinical classification criteria for SLE could serve as a foundation for phenotype-based detection of patients with SLE in EHR data.
Methods We assessed algorithm performance on 600 medical records from the Northwestern Medicine Electronic Data Warehouse. On chart review, 472 of these patients had definite SLE and 128 did not. We developed algorithms based on the American College of Rheumatology (ACR), Systemic Lupus International Collaborating Clinics (SLICC), and proposed European League Against Rheumatism/American College of Rheumatology (EULAR/ACR) classification criteria to determine whether patients met the classification criteria for definite SLE. The algorithms used only structured data elements: diagnosis codes (ICD-9/ICD-10) and laboratory results.
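The three rule structures can be sketched in Python. This is a minimal sketch, not the study's implementation: it assumes each patient's ICD-9/ICD-10 codes and lab results have already been mapped to a set of criterion names, and the criterion names, the function names (`meets_acr`, `meets_slicc`, `meets_eular_acr`) and that mapping step are all hypothetical. The counting rules follow the 1997 revised ACR criteria (at least 4 of 11) and the 2012 SLICC criteria (at least 4, including at least one clinical and one immunologic criterion, or biopsy-proven lupus nephritis with a positive ANA or anti-dsDNA). For EULAR/ACR it assumes the thresholds of the published 2019 version (ANA ≥1:80 entry criterion, only the highest weight per domain counts, total ≥10 with at least one clinical criterion), which may differ from the proposed version used in the study; the weights are supplied by the caller rather than hard-coded.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical criterion names; assumes structured data (ICD codes, labs)
# were mapped to these names upstream.
ACR_1997 = {
    "malar_rash", "discoid_rash", "photosensitivity", "oral_ulcers",
    "arthritis", "serositis", "renal_disorder", "neurologic_disorder",
    "hematologic_disorder", "immunologic_disorder", "ana",
}
SLICC_CLINICAL = {
    "acute_cutaneous_lupus", "chronic_cutaneous_lupus", "oral_nasal_ulcers",
    "nonscarring_alopecia", "synovitis", "serositis", "renal", "neurologic",
    "hemolytic_anemia", "leukopenia_or_lymphopenia", "thrombocytopenia",
}
SLICC_IMMUNOLOGIC = {
    "ana", "anti_dsdna", "anti_sm", "antiphospholipid",
    "low_complement", "direct_coombs",
}


def meets_acr(criteria: Set[str]) -> bool:
    """ACR 1997: at least 4 of the 11 criteria."""
    return len(criteria & ACR_1997) >= 4


def meets_slicc(criteria: Set[str], biopsy_nephritis: bool = False) -> bool:
    """SLICC 2012: >=4 criteria with >=1 clinical and >=1 immunologic,
    or biopsy-proven lupus nephritis with a positive ANA or anti-dsDNA."""
    if biopsy_nephritis and ({"ana", "anti_dsdna"} & criteria):
        return True
    clinical = criteria & SLICC_CLINICAL
    immunologic = criteria & SLICC_IMMUNOLOGIC
    return (len(clinical) + len(immunologic) >= 4
            and len(clinical) >= 1 and len(immunologic) >= 1)


def meets_eular_acr(ana_titer_ge_80: bool,
                    met: Iterable[Tuple[str, int, bool]]) -> bool:
    """Weighted EULAR/ACR-style rule (assumed 2019 thresholds).

    `met` holds (domain, weight, is_clinical) for each criterion met;
    weights must be taken from the published criteria by the caller.
    """
    if not ana_titer_ge_80:  # entry criterion
        return False
    best: Dict[str, int] = {}
    any_clinical = False
    for domain, weight, is_clinical in met:
        # Only the highest-weighted criterion in each domain counts.
        best[domain] = max(best.get(domain, 0), weight)
        any_clinical = any_clinical or is_clinical
    return sum(best.values()) >= 10 and any_clinical


if __name__ == "__main__":
    patient = {"oral_nasal_ulcers", "synovitis", "ana", "anti_dsdna"}
    print(meets_slicc(patient))                              # True
    print(meets_acr({"oral_ulcers", "arthritis", "ana"}))    # False
```

In this example the hypothetical patient meets SLICC with two clinical and two immunologic criteria, while three ACR criteria fall short of four. Any criterion recorded only in free-text notes never reaches the input set, which is one way a structured-data-only algorithm can miss true cases.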
Results As shown in Table 1, the overall SLE identification rate ranged from 58% to 78% across the three algorithms. All three criteria-based algorithms had specificity greater than 95% and positive predictive value (PPV) greater than 98%. Sensitivity ranged from 52% to 69%, and negative predictive value (NPV) from 35% to 55%. The SLICC-based algorithm performed best overall, detecting 78% of the patients with definite SLE as determined by chart review, with 99% PPV, 69% sensitivity, 98% specificity, and 55% NPV.
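The four performance measures come from a 2×2 table comparing each algorithm's output with the chart-review reference standard. A minimal Python sketch follows; the `Confusion` class, the `metrics` function and the counts in the example are hypothetical and are not the study's data.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # algorithm positive, definite SLE on chart review
    fp: int  # algorithm positive, not SLE on chart review
    fn: int  # algorithm negative, definite SLE on chart review
    tn: int  # algorithm negative, not SLE on chart review


def _ratio(num: int, den: int) -> float:
    # Return NaN instead of raising when a row or column is empty.
    return num / den if den else float("nan")


def metrics(c: Confusion) -> dict:
    return {
        "sensitivity": _ratio(c.tp, c.tp + c.fn),  # share of true SLE detected
        "specificity": _ratio(c.tn, c.tn + c.fp),  # share of non-SLE excluded
        "ppv": _ratio(c.tp, c.tp + c.fp),          # flagged patients who have SLE
        "npv": _ratio(c.tn, c.tn + c.fn),          # unflagged patients without SLE
    }


if __name__ == "__main__":
    m = metrics(Confusion(tp=8, fp=1, fn=2, tn=9))  # toy counts
    print({k: round(v, 2) for k, v in m.items()})
    # {'sensitivity': 0.8, 'specificity': 0.9, 'ppv': 0.89, 'npv': 0.82}
```

Sensitivity and specificity are computed within the reference-standard groups, whereas PPV and NPV depend on how many cases and non-cases the cohort contains.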
Conclusions The ACR-, SLICC-, and proposed EULAR/ACR-based EHR algorithms all detect a substantial proportion of the patients classified as having definite SLE by chart review, with high PPV and specificity. The low NPV of all three algorithms likely reflects undetected SLE cases, which result from poor detection of clinical and laboratory criteria (such as arthralgia and ANA tests) that are not consistently documented as structured data in the medical record. Using only structured data improves the portability of the algorithms to other EHR datasets, but may have reduced their ability to detect important, highly weighted classification criteria that are documented primarily in free-text notes. All three algorithms may improve with natural language processing (NLP) of physician notes for criteria that were difficult to detect using only diagnosis codes and labs, although the customization NLP requires to be effective may reduce portability.
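The link between undetected cases and low NPV can be made concrete. NPV = TN / (TN + FN); in a cohort where most patients have SLE (472 of 600 here), every missed case adds to FN, while TN can never exceed the 128 non-cases. The sketch below computes expected NPV from sensitivity and specificity for this cohort's group sizes. The function name is illustrative, and the sensitivity values in the example are hypothetical, chosen only to show the trend, not the study's results.

```python
def npv_from_rates(sensitivity: float, specificity: float,
                   n_cases: int, n_noncases: int) -> float:
    """Expected NPV given per-group rates and reference-standard group sizes."""
    fn = (1 - sensitivity) * n_cases  # true SLE cases the algorithm misses
    tn = specificity * n_noncases     # non-SLE patients correctly excluded
    return tn / (tn + fn) if (tn + fn) else float("nan")


if __name__ == "__main__":
    # Cohort sizes from the Methods; sensitivities are hypothetical.
    print(round(npv_from_rates(0.5, 0.98, 472, 128), 2))  # 0.35
    print(round(npv_from_rates(0.9, 0.98, 472, 128), 2))  # 0.73
```

Holding specificity fixed, raising sensitivity (for example by recovering criteria from free-text notes) shrinks FN and is the main lever for improving NPV in a case-enriched cohort.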
Funding Source(s): National Institute of Arthritis and Musculoskeletal and Skin Diseases (R21 AR072263) and National Human Genome Research Institute (NHGRI; U01 HG006388)