Laboratory analysis
Laboratory values were obtained as part of standard patient care. Tests for 24-hour urine protein, protein-to-creatinine ratio, serum creatinine, albumin, haemoglobin, white blood cell count, platelet count, anti-dsDNA, C3 and C4 were performed by Clinical Laboratory Improvement Amendments-certified central laboratories at MUSC, LabCorp or external hospital laboratories. The estimated glomerular filtration rate (eGFR) was determined by using the Chronic Kidney Disease Epidemiology Collaboration (CKD-EPI) equation.6
Pathology analysis
Renal biopsies were read by one of two renal pathologists at MUSC (SES or ETB) using the 2018 revised ISN/RPS activity (0–24) and chronicity (0–12) index elements (each scored 0–3 for 0%, <25%, 25%–50% and >50% involvement).5 Included in the activity index are endocapillary hypercellularity, karyorrhexis, fibrinoid necrosis, hyaline deposits, cellular or fibrocellular crescents and interstitial inflammation. Scores were doubled for crescents and necrosis. Included in the chronicity index were total glomerulosclerosis score, fibrous crescents, tubular atrophy and interstitial fibrosis.
Statistical analysis
The outcome variable was the failure to completely respond to therapy at approximately 1 year. This time point was chosen, as variables of response at 1 year are predictive of long-term response years in the MAINTAIN and Euro-Lupus Nephritis Trials.8 Response was defined by a modification of the ACR response criteria.9 The modifications to these criteria were described previously by Wofsy et al.10 Briefly, this modified complete response includes attaining a urine protein-to-creatinine ratio of <0.5 at approximately 1 year and achieving an eGFR of 90 or an improvement of at least 15% from baseline. The suboptimal response outcome was defined by lack of achieving complete response as defined above. Thus, the outcome includes non-responders and partial responders. Variables collected in the data included patient sex, age at time of biopsy, proliferative disease (ISN/RPS classes III or IV, Y/N), mesangial disease (ISN/RPS class I or II, Y/N), membranous disease (ISN/RPS class V, Y/N), activity score (0–3), chronicity score (0–3), interstitial fibrosis (0–3), interstitial inflammation (0–3), number of glomeruli evaluated, crescents (number), crescent-to-glomeruli ratio (0–3×2), necrosis (0–3×2), urine protein-to-creatinine ratio, eGFR by the CKD-EPI formula (eGFR, mL/min/1.73 m²),6 serum creatinine (mg/dL), dsDNA (IU), C3 (mg/dL), C4 (mg/dL), white blood cells count (k/µL), platelet count (k/µL), haemoglobin (g/dL), serum albumin (mg/dL), prednisone (Y/N), hydroxychloroquine (Y/N), mycophenolate mofetil/mycophenolic acid (Y/N), cyclophosphamide (Y/N), rituximab (Y/N), azathioprine (Y/N) and number of medications. Since the data were retrospective and not prospectively randomised, immunosuppressants used for induction are subject to bias by indication and were not considered predictive. They were excluded from consideration during model development for clinical use, as their presence might imply that choice of induction therapy based in the modelling might affect the outcomes. Descriptive statistics were calculated for all participant characteristics by treatment response category. Univariate associations between all baseline characteristics and treatment response were evaluated using a series of logistic regression models.
The goal for this study was to identify a parsimonious subset of predictors from patient demographics and baseline laboratory and biopsy data that yielded good prediction performance characteristics for a set of multivariable prediction models of suboptimal response at approximately 1 year. Multivariable classification models considered in this study included logistic regression (LR), classification and regression trees (CART), random forest (RF), support vector machines with linear, polynomial and Gaussian kernels (SVML, SVMP and SVMR, respectively), naïve Bayes (NB) and artificial neural networks (ANN). RF models were fit using the ‘randomForest’ package; LR models were fit using the ‘stats‘ packages; SVMs and NB models were fit using the ‘e1071’ package; ANNs models were fit using the ‘nnet’ available in R.11 12 Tuning parameters for the different models considered were selected before developing the models. An initial exhaustive search of all combinations of up to 20 variables was considered. However, the results from this search found the best average performance occurred when models included eight variables (online supplemental figure 2). Thus, variable selection was conducted using an exhaustive examination of all subsets of eight or fewer predictors. This threshold number was considered for all modelling approaches to make the models more useful in a busy clinical setting. Specifically, prediction performance for each model’s subset of predictors was conducted using a 10-fold cross-validation (CV) approach. Ten-fold CV divides the data into 10 subsets. Models are trained using 9/10 of the data and tested on the remaining 1/10 of the data, and this is repeated for each subset. The cross-validated area under the curve (cvAUC) is the average AUC calculated for each subset of 1/10 of the data excluded during model development and has been shown to be more robust than use of a single training-test set approach.13 The goal was to identify a small subset of predictors with good prediction performance across the models.14 Prediction performance was measured by 10-fold cvAUC and the best subset of eight variables was selected as the subset that resulted in the highest average cvAUC across all models. Sensitivity, specificity, positive predictive value and negative predictive values were determined for select thresholds for the predicted probability of non-response returned by each model. All analyses were conducted in R V.4.0.2. An R-Shiny, web-based tool was created based on the resulting models that were selected.