Abstract
Objective Renal flares in patients with systemic lupus erythematosus (SLE) result in significant nephron loss. Identifying reliable early signals of impending renal flares is therefore expected to improve the prognosis of these patients. In this study, we applied machine learning (ML) algorithms within two different approaches to identify baseline clinical determinants of renal flare occurrence in a large cohort of patients with SLE.
Methods We analysed data from five phase III trials (BLISS-52, BLISS-76, BLISS-NEA, BLISS-SC, EMBRACE) after excluding patients with renal British Isles Lupus Assessment Group (BILAG) A or B at baseline (N=3169). Renal flare was defined as a change from C, D, or E to A or B in the renal domain of the classic BILAG index within 52 weeks of follow-up. After constructing panels of variables using either (i) prior knowledge (knowledge-driven approach) or (ii) feature selection methods (data-driven approach), we developed ML classifiers including extreme gradient boosting (XGBoost), least absolute shrinkage and selection operator (LASSO), random forest (RF), and logistic regression. A stratified split was applied to partition the study population into a training set (90%; N=2853) and a test set (10%; N=316). The training set was used for model development, with internal validation performed by 10-fold cross-validation, and the test set was used to validate the final models. Both approaches yielded final models that used the minimal number of features while maintaining optimal performance.
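The outcome definition and the partitioning scheme described in the Methods can be sketched as follows. This is a minimal, standard-library-only illustration, not the study's code (the abstract does not state which software was used). It assumes outcomes are coded 1 (renal flare) and 0 (no flare) and that each patient has a list of renal BILAG grades recorded during the 52-week follow-up; the function names `is_renal_flare`, `stratified_split`, and `stratified_kfold` are hypothetical. In practice a library routine such as scikit-learn's `StratifiedKFold` would typically be used instead.

```python
import random
from collections import defaultdict

# Classic BILAG renal domain grades: A/B = active, C/D/E = inactive or mild.
ACTIVE = {"A", "B"}
INACTIVE = {"C", "D", "E"}


def is_renal_flare(baseline_grade, followup_grades):
    """Return True if a patient with baseline renal BILAG C, D or E reaches
    A or B at any follow-up visit (assumed to lie within the 52-week window)."""
    if baseline_grade not in INACTIVE:
        # Patients with baseline renal BILAG A or B are excluded from the cohort.
        raise ValueError("baseline renal BILAG A or B is excluded")
    return any(grade in ACTIVE for grade in followup_grades)


def stratified_split(labels, test_fraction=0.1, seed=0):
    """Split indices into (train, test) while preserving the flare/no-flare ratio."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train, test = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)  # per-class test size
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


def stratified_kfold(labels, indices, k=10, seed=0):
    """Yield (fit_idx, val_idx) pairs for stratified k-fold CV on `indices`."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i in indices:
        by_class[labels[i]].append(i)
    folds = [[] for _ in range(k)]
    for idx in by_class.values():
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)  # deal each class round-robin across folds
    for f in range(k):
        val = sorted(folds[f])
        val_set = set(val)
        fit = [i for i in indices if i not in val_set]
        yield fit, val


# Synthetic labels only: 899 flares among 3169 patients, matching the cohort counts.
labels = [1] * 899 + [0] * (3169 - 899)
train_idx, test_idx = stratified_split(labels, test_fraction=0.1, seed=42)
print(len(train_idx), len(test_idx))
for fit_idx, val_idx in stratified_kfold(labels, train_idx, k=10, seed=42):
    pass  # a classifier would be fitted on fit_idx and evaluated on val_idx
```

Stratifying keeps the flare proportion of each partition close to the overall rate. The exact test-set size depends on the rounding rule, so this sketch need not reproduce the N=316 reported above.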
Results Of 3169 patients, 899 (28.3%) developed a renal flare during follow-up. XGBoost yielded the greatest accuracy in both the knowledge-driven (0.97) and the data-driven (0.88) approach, as well as the highest performance metrics (AUC 0.97 and 0.91; sensitivity 1.00 and 0.82; specificity 0.94 and 0.94, respectively) and adequate calibration on the test set. The final model reduced the number of features to five: renal BILAG C or D, urine protein-to-creatinine ratio, serum albumin, blood urea nitrogen, and C3 levels.
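The discrimination metrics reported in the Results can be illustrated with a small, dependency-free sketch. It assumes binary labels (1 = renal flare), predicted flare probabilities from any of the classifiers, and a 0.5 decision threshold; the threshold, the function names, and the toy numbers are hypothetical and not taken from the study. AUC is computed with the Mann-Whitney pairwise formulation, which equals the area under the empirical ROC curve.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels where 1 = renal flare."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def threshold_metrics(y_true, y_prob, threshold=0.5):
    """Accuracy, sensitivity and specificity after thresholding probabilities."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),  # share of true flares detected
        "specificity": tn / (tn + fp),  # share of non-flares correctly ruled out
    }


def roc_auc(y_true, y_prob):
    """Probability that a random flare case scores higher than a random
    non-flare case, counting ties as 1/2 (Mann-Whitney statistic)."""
    pos = [s for t, s in zip(y_true, y_prob) if t == 1]
    neg = [s for t, s in zip(y_true, y_prob) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example with hypothetical predictions for 3 flares and 5 non-flares.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_prob = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.05]
print(threshold_metrics(y_true, y_prob))  # accuracy 0.75, sensitivity 2/3, specificity 0.8
print(roc_auc(y_true, y_prob))            # 14/15, about 0.933
```

Calibration, also mentioned above, is a separate property (agreement between predicted probabilities and observed flare rates) and is not captured by these discrimination metrics.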
Conclusions The knowledge-driven approach outperformed the data-driven approach, which relied solely on feature selection methods. Our data suggest that five routine clinical parameters (renal BILAG, proteinuria, albuminaemia, urea, and C3) could be combined into an accurate tool for predicting renal flares in patients with SLE.
Conflicts of interest IP has received research funding and/or honoraria from Amgen, AstraZeneca, Aurinia, Bristol Myers Squibb, Eli Lilly, Gilead, GlaxoSmithKline, Janssen, Novartis, Otsuka, and Roche. The other authors declare that they have no conflicts of interest related to this work. The funders had no role in the design of the study, the analyses or interpretation of data, or the writing of the manuscript.
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/ .