Abstract
Background Lupus low disease activity state (LLDAS) has been validated as a treat-to-target endpoint for patients with systemic lupus erythematosus (SLE), and its attainment is associated with improved outcomes. This study assesses whether a machine learning model can accurately predict whether a patient will achieve LLDAS at their next assessment.
Methods A total of 42,355 patient records were retrieved from the APLC longitudinal study database. Three machine learning models – XGBoost, Random Forest, and Naive Bayes – were tested for their predictive power. Eighty percent of the data was used to train the models while thirty percent was used for validation. The data were normalized, and all models were subjected to 10-fold cross-validation to reduce the risk of overfitting. Additionally, we compared the ten most important features of each model.
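The 10-fold cross-validation step can be illustrated with a minimal, library-free sketch. It is not the authors' code: the function name `k_fold_indices`, the shuffling seed, and the choice to split at the record level are assumptions made for illustration only (the text does not say whether records were grouped by patient before splitting).

```python
import random


def k_fold_indices(n_records, k=10, seed=0):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation.

    Hypothetical helper: splits record indices, not patients.
    """
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)  # assumed: shuffle before splitting

    # Spread the remainder so that fold sizes differ by at most one.
    base, extra = divmod(n_records, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(indices[start:start + size])
        start += size

    # Each fold serves as the validation set exactly once.
    for i in range(k):
        val_idx = folds[i]
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, val_idx


# 42,355 is the record count reported in the Methods.
splits = list(k_fold_indices(42355, k=10))
```

Each model would be fitted on `train_idx` and scored on `val_idx` in turn, with the ten scores averaged. For longitudinal data, grouping all records of one patient into the same fold is a common precaution against leakage; whether that was done here is not stated.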
Results Several metrics were used to measure each model's predictive power. The Random Forest model scored highest for specificity, positive predictive value (PPV), and accuracy, with 0.8450, 0.8182, and 0.8338, respectively. The XGBoost model had the highest negative predictive value (NPV) at 0.8559, while the Naive Bayes model had the highest sensitivity at 0.8986. Notably, the Random Forest model trailed the top sensitivity and NPV scores by only 0.0629 and 0.0085, respectively.
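For reference, the five metrics reported above follow from the standard confusion-matrix definitions, sketched below. The function name `binary_metrics` and the example counts are hypothetical and are not taken from the study; "positive" is assumed to mean that LLDAS is achieved at the next visit.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute standard binary classification metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Made-up counts for illustration only; these are not study data.
example = binary_metrics(tp=40, fp=10, tn=45, fn=5)
```

Subtracting the reported gaps from the top scores implies a Random Forest sensitivity of about 0.8357 (0.8986 − 0.0629) and an NPV of about 0.8474 (0.8559 − 0.0085).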
Among the most important features, only two appeared in the top ten of all three models: current LLDAS status and proteinuria level. Three additional features were important in two models: whether the patient is taking prednisolone, the time-adjusted mean (TAM) SLEDAI score, and the SLEDAI score.
Conclusions The study compared the power of several machine learning models to predict whether a patient will achieve LLDAS at their next visit. The results determined that the current LLDAS, proteinuria levels, SLEDAI score (and TAM SLEDAI),
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.