Abstract CE-50 Table 1

Performance of traditional structured definitions and supervised machine learning algorithms in case identification of SLE in the electronic health record

Structured Definitions

                                      Recall                       Precision
Definition                            (Sensitivity)  Specificity   (PPV)
Single ICD9 710.0                     0.99           0.97          0.79

Single ICD9 710.0                     0.96           0.98          0.86
  + any lupus medication

Single ICD9 710.0                     0.93           0.98          0.87
  + any lupus medication
  + any positive lupus-serology

Supervised Machine Learning

                                      Recall                       Precision
Features                              (Sensitivity)  Specificity   (PPV)
All ICD-9 codes and counts [1]        0.89           0.99          0.89

All ICD-9 codes and counts            0.90           0.99          0.89
  + NLP of clinical notes [2]

All ICD-9 codes and counts            0.91           0.99          0.92
  + NLP of clinical notes
  + all serologic data [3]
  + all medication data

All ICD-9 codes and counts            0.85           0.99          0.96
  + NLP of clinical notes
  + all serologic data
  + all medication data
  + demographics [4]

  • 1 Supervised machine learning algorithms included all available ICD-9 codes for each patient, as well as their counts and the locations in the medical record where they were found (e.g., clinical encounters, problem lists, medication orders).

  • 2 All text data from clinical notes associated with a patient’s medical record were included in the ML algorithm

  • 3 Serologic data included ANA, double-stranded DNA, anti-Smith antibody, anti-RNP, SSA, and SSB

  • 4 Demographic information included age, gender, race/ethnicity, insurance status, and employment status
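
As a rough illustration of how the three reported metrics relate to a case definition, the sketch below applies the third structured definition (ICD-9 710.0 plus any lupus medication plus any positive lupus serology) to a toy cohort and computes recall, specificity, and precision from the resulting confusion counts. The `Patient` record, its field names, the helper functions, and the cohort values are all hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    """Hypothetical record; field names are illustrative only."""
    icd9_codes: set
    on_lupus_medication: bool
    positive_lupus_serology: bool
    has_sle: bool  # reference-standard label, e.g. from chart review


def structured_definition(p):
    """ICD-9 710.0 + any lupus medication + any positive lupus serology."""
    return ("710.0" in p.icd9_codes
            and p.on_lupus_medication
            and p.positive_lupus_serology)


def ratio(numerator, denominator):
    """Return numerator/denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def evaluate(patients, classify):
    """Compute recall, specificity, and precision for a classifier."""
    tp = fp = tn = fn = 0
    for p in patients:
        predicted = classify(p)
        if predicted and p.has_sle:
            tp += 1
        elif predicted:
            fp += 1
        elif p.has_sle:
            fn += 1
        else:
            tn += 1
    return {
        "recall": ratio(tp, tp + fn),        # sensitivity
        "specificity": ratio(tn, tn + fp),
        "precision": ratio(tp, tp + fp),     # PPV
    }


# Toy cohort (made-up values, not data from the study).
cohort = [
    Patient({"710.0"}, True, True, True),            # true positive
    Patient({"710.0"}, True, False, True),           # false negative
    Patient({"710.0", "714.0"}, True, True, False),  # false positive
    Patient({"714.0"}, False, False, False),         # true negative
    Patient(set(), False, True, False),              # true negative
]

metrics = evaluate(cohort, structured_definition)
print(metrics)
```

Passing a different `classify` function to `evaluate`, for example a wrapper around a trained model's predictions, would score a machine learning variant in the same way.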