Systemic lupus in the era of machine learning medicine
  1. Kevin Zhan1,
  2. Katherine A Buhler1,
  3. Irene Y Chen2,3,
  4. Marvin J Fritzler1 and
  5. May Y Choi1,4
  1. 1University of Calgary Cumming School of Medicine, Calgary, Alberta, Canada
  2. 2Computational Precision Health, University of California Berkeley and University of California San Francisco, Berkeley, California, USA
  3. 3Electrical Engineering and Computer Science, University of California Berkeley, Berkeley, California, USA
  4. 4McCaig Institute for Bone and Joint Health, Calgary, Alberta, Canada
  1. Correspondence to Dr May Y Choi; may.choi{at}ucalgary.ca

Abstract

Artificial intelligence and machine learning applications are emerging as transformative technologies in medicine. With greater access to a diverse range of big datasets, researchers are turning to these powerful techniques for data analysis. Machine learning can reveal patterns and interactions between variables in large and complex datasets more accurately and efficiently than traditional statistical methods. Machine learning approaches open new possibilities for studying SLE, a multifactorial, highly heterogeneous and complex disease. Here, we discuss how machine learning methods are rapidly being integrated into the field of SLE research. Recent reports have focused on building prediction models and/or identifying novel biomarkers using both supervised and unsupervised techniques for understanding disease pathogenesis, early diagnosis and prognosis of disease. In this review, we will provide an overview of machine learning techniques to discuss current gaps, challenges and opportunities for SLE studies. External validation of most prediction models is still needed before clinical adoption. Utilisation of deep learning models, access to alternative sources of health data and increased awareness of the ethics, governance and regulations surrounding the use of artificial intelligence in medicine will help propel this exciting field forward.

  • Lupus Erythematosus, Systemic
  • Risk Factors
  • Epidemiology
  • Lupus Nephritis
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.

WHAT IS ALREADY KNOWN ON THIS TOPIC

  • Most machine learning models developed for SLE to date have been directed towards elucidating disease pathogenesis, improving diagnosis, and predicting disease-related outcomes.

WHAT THIS STUDY ADDS

  • This study provides an overview of machine learning techniques to discuss current gaps, challenges, and opportunities for SLE research.

  • Most SLE machine learning studies under-report key details of the model development and/or have not been externally validated to ensure they are effective, reliable, and safe to adopt into clinical practice.

  • The application of more advanced machine learning algorithms such as deep learning and the utilisation of complex, alternative datasets including images, are increasing among SLE studies.

HOW THIS STUDY MIGHT AFFECT RESEARCH, PRACTICE OR POLICY

  • As machine learning continues to provide unprecedented opportunities to deliver transformative discoveries in SLE research and practice, researchers need to stay informed of the ethical, governance, and regulatory considerations around their use.

Introduction

Tremendous progress in our understanding of SLE pathogenesis, diagnosis and management has been made over the past 75 years, with most studies relying on traditional statistical techniques to evaluate and test hypotheses. While these approaches are still widely used, many researchers are turning to machine learning (ML) as a complementary method for assessing patterns that are not readily tested using traditional statistics. In the last 5 years alone, there has been an explosion of studies that have leveraged the power of ML to study SLE patient identification, risk prediction, diagnosis, disease subtype classification, progression, outcomes, monitoring and management. While it may seem that ML is the new shiny toy of the 21st century, the term ‘artificial intelligence’ (AI) was first described in 1955, the same year that antimalarial drugs were approved by the US Food and Drug Administration. The impact of AI on medicine has recently re-emerged as a valuable approach because of the enormous growth in computing power and increasing availability of extensive and comprehensive ‘big data’ for analysis. As SLE researchers continue to amass more data on SLE, a complex, multifactorial and heterogeneous disease, traditional statistical techniques may no longer be the most effective or efficient methods, particularly in this era focused on precision medicine. In this review, we will provide an overview of ML and its current and future potential applications to SLE research.

Why ML in SLE?

Although ML and AI are often used interchangeably, ML is a subset of AI (figure 1). AI is the development of machines and systems that can imitate tasks that normally require intelligent human behaviour. ML algorithms allow computers to perform specific tasks by learning from the data rather than by following explicitly programmed instructions, as traditional statistical tests do. Some other important differences between ML and traditional statistics are described in table 1. Understanding the advantages and disadvantages of both approaches may help inform one’s decision on which methods to use. In general, if the purpose of a project is to create an algorithm that can make predictions for a particular outcome and a large dataset is available, an ML approach may be a better option. If the purpose is to examine a relationship between variables or make inferences from a smaller dataset, then a traditional statistical model may be the better approach.

Figure 1

Categories of machine learning. Machine learning is a type of artificial intelligence. Within machine learning, there are three main categories: supervised, unsupervised and reinforcement learning. Deep learning is a subtype of machine learning that can involve supervised, unsupervised or reinforcement learning. Within each category, there are many different types of machine learning algorithms. Many factors can influence the choice of a specific algorithm. These include amount and type of data (eg, if images or videos are included in the data, a neural network will probably be preferred); how important interpretability is to your context (eg, decision trees or regression models are typically more interpretable, although this is an active area of research); and any computer memory or computational restriction. As no particular model consistently performs better than the others, it is typical to develop several models using multiple algorithms and then compare their performance using different metrics. CNN, convolutional neural network; CVD, cardiovascular disease; LASSO, least absolute shrinkage and selection operator; RNN, recurrent neural network.

Table 1

Key differences in machine learning and traditional statistical approaches

In this technological age, researchers have greater access to large datasets of different types of information on patients with SLE. Types of datasets in SLE include demographic, clinical, histological, genetic and immune-related biomarkers (eg, autoantibodies, immune cell types, cytokines) in biological fluids, electronic medical records (EMR), images (eg, MRI, ultrasound) and other ‘omics’ (eg, proteomics, metabolomics). While this presents an important opportunity to study a remarkably heterogeneous and complex disease like SLE, the volume and density of data can also make it challenging to draw statistical inferences from large datasets, especially given the potential to identify false positive associations. Hence, ML is a more efficient and accurate approach to understanding the patterns in complex datasets.

The more technical aspects of ML as they apply to systemic autoimmune rheumatic diseases are reviewed in greater detail elsewhere.1–5 In brief, the ML categories that are often applied to study medical data are supervised and unsupervised. In supervised ML, or a task-driven approach, a ‘training dataset’ is used to develop an algorithm to recognise patterns that are associated with ‘labels’. This algorithm is then tested in a ‘test dataset’ to see how well it performs. In unsupervised ML or a data-driven approach, the training data are ‘unlabeled’, and the algorithm attempts to identify patterns within the dataset. In addition to supervised and unsupervised ML, another less commonly applied type of ML is called reinforcement learning. This type of ML is based on trial and error, with ‘reward’ or ‘punishment’ driving the learning process and skills acquisition. Within the three ML categories, a variety of ML algorithms exist, such as deep learning algorithms based on artificial neural networks (ANN), a modality that involves multiple layers of connected data, which can recognise complex patterns across different types of data, including images, video, and acoustic data. As we will discuss later, most ML studies in SLE employ both supervised and unsupervised models.

To determine which ML model to use, researchers consider several important factors including the characteristics of the input data (labelled vs unlabelled), the desired outcome (predicting a category or quantity), the modality of the data (eg, text, image) and volume of input data (figure 1). It is common to employ several algorithms and then compare their performance using different metrics to select the best model. For supervised models, it is ideal to assess the sensitivity, specificity, positive predictive value, negative predictive value, accuracy, F-score and area under the receiver operating characteristic curve (AUC), although particular emphasis may be placed on a subset of metrics depending on the context. The F-score is a single metric that combines the sensitivity and the positive predictive value of a model, and a high F-score requires good performance by both of those metrics. In traditional statistics, this is often referred to as the accuracy or line of best fit, but with ML, the F-score may be better suited to assess the training of a model. For unsupervised clusters, several techniques exist to ensure the number of identified clusters accurately reflects the data. These include the elbow method,6 the Bayesian information criterion7 or a gap statistic.8 Once satisfied, statistical differences between clusters can be assessed using traditional methods, such as χ2 tests or analysis of variance.
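As a minimal sketch of how the supervised metrics listed above relate to one another, the following computes them from the four cells of a confusion matrix. The labels and predictions are hypothetical values chosen only for illustration, and the function name is our own; the F-score shown is the common F1 form (the harmonic mean of sensitivity and positive predictive value).

```python
def classification_metrics(y_true, y_pred):
    """Compute common binary-classification metrics from 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    def safe_div(num, den):
        # Return NaN instead of raising when a denominator is zero
        return num / den if den else float("nan")

    sensitivity = safe_div(tp, tp + fn)
    ppv = safe_div(tp, tp + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": safe_div(tn, tn + fp),
        "ppv": ppv,
        "npv": safe_div(tn, tn + fn),
        "accuracy": safe_div(tp + tn, len(pairs)),
        # F1: harmonic mean of sensitivity (recall) and PPV (precision)
        "f_score": safe_div(2 * ppv * sensitivity, ppv + sensitivity),
    }


# Hypothetical example: 4 patients with SLE (1) and 6 controls (0)
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because the F-score depends only on sensitivity and positive predictive value, it ignores true negatives, which is one reason it can be more informative than accuracy when controls greatly outnumber cases.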

Building and evaluating the ML models are the final steps of an established ML pipeline (figure 2). After the data are collected, they are preprocessed (data cleaning, filling in missing data, etc), followed by data splitting, feature importance evaluation and selection, and finally model building and evaluation. Feature selection is a process that allows the researcher to identify the best set of features that will help build optimised ML models (reviewed in ref 9). Feature selection is typically used with supervised algorithms, while dimensionality reduction is used in unsupervised clustering. Reports often use multiple supervised and unsupervised feature selection methods together. Examples of feature selection include recursive feature elimination,10 least absolute shrinkage and selection operator (LASSO)11, and support vector machines (SVM).12 These methods help identify covariables that are of greatest clinical and statistical importance.

Figure 2

Machine learning pipeline with consideration of the ethical, governance and regulation issues at every stage before clinical adoption of the model.

ML reports in SLE

A scoping review was performed to summarise the major ML reports of SLE to date. A PubMed search of ‘lupus’ and ‘machine learning’ Medical Subject Heading terms was performed on 24 November 2023 (figure 3). One hundred and ninety-one publications from 1992 to 2023 were identified, of which 133 were original reports. The remaining publications were review articles or unrelated topics (eg, not SLE, non-human, not ML). Over the last 31 years, there has been an exponential increase in the number of ML and SLE-related publications, similar to trends reported in other autoimmune rheumatic diseases.1 5 13 As this was not a systematic review, we acknowledge that we may have omitted some studies related to ML and SLE. However, we believe that we have captured most publications allowing for an accurate representation of the field and an in-depth discussion in our paper.

Figure 3

Number of SLE-related studies using machine learning methods. There has been an exponential growth of reports over the past 31 years based on PubMed database of publications when we searched ‘machine learning’ and ‘lupus’. The majority of reports were related to diagnosis (including neuropsychiatric and dermatological manifestations), followed by disease activity (including renal flares, extrarenal flares and treatment response), complications, pathogenesis, and mixed reports.

As ML research becomes increasingly recognised and valued in SLE, it is imperative that it is conducted in a methodologically rigorous manner to yield meaningful and useful results to relevant stakeholders and end users. Since ML methods are relatively new to the field, assessing the quality or technical aspects of these reports may be challenging to most non-ML researchers. A recent systematic review by Munguía-Realpozo et al14 assessed 45 SLE reports that used ML to build diagnostic and/or predictive algorithms and determined whether they adhered to the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) reporting standards.15 The review concluded that most reports were deficient in multiple domains of the TRIPOD recommendations, often under-reporting relevant details about their data preprocessing, model-building process, model specification and model performance.

In this scoping review, we will discuss ML approaches used in SLE reports following the outline of an ML pipeline (figure 2). While the aim of the study was not to systematically evaluate the reporting adherence of these reports, in general, we found limitations similar to those identified by Munguía-Realpozo et al.14 This highlights that there is a need to improve transparency and reporting of prediction models in future ML SLE studies.

Data collection

Given that SLE is an uncommon disease, it was not unexpected that the sample sizes for most reports (median 158 patients with SLE (IQR 61–681)) were relatively small. Overfitting and inappropriate generalisation from a small training dataset are important limitations of ML.16 Twenty-five (18.7%) reports evaluated greater than 1000 patients and seven (5.2%) reports assessed greater than 5000 patients. Most of these larger reports used EMRs and administrative databases to identify patients with SLE, recognising that these types of data may be limited by diagnostic misclassification.17–21 Many reports experienced ‘class imbalance’, where the SLE group sample size was considerably smaller compared with healthy controls, potentially biasing ML in favour of the more prevalent class. To address this, some reports used generative adversarial networks22 23 and Synthetic Minority Oversampling TEchnique (SMOTE)20 24 to generate synthetic data.
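The idea behind SMOTE-style oversampling can be sketched as follows: each synthetic minority sample is placed at a random position on the line segment between an existing minority sample and one of its nearest minority neighbours. This is a simplified illustration under our own assumptions (two hypothetical numeric features, a function name of our choosing, and a fixed random seed), not the implementation used in any of the cited reports.

```python
import math
import random


def smote_sketch(minority, n_synthetic, k=3, seed=0):
    """Create synthetic minority samples by interpolating towards near neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # The k nearest minority neighbours of the base sample (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical two-feature data: 4 SLE cases versus 12 controls
sle_cases = [(1.0, 2.0), (1.5, 1.8), (1.2, 2.4), (0.9, 2.1)]
n_controls = 12
new_cases = smote_sketch(sle_cases, n_controls - len(sle_cases))
balanced_cases = sle_cases + new_cases
print(len(balanced_cases), "cases after oversampling")
```

Synthetic samples of this kind are usually generated from the training split only, so that no information from the test data leaks into model development.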

The data density for most SLE reports does not derive from the patient cohort size but from the large number of variables on each patient considered for the ML models. Types of data used included demographic (n=43 reports) and clinical (n=51) data from cohort registries and several using EMRs (n=13). Data from biopsies included renal (n=6) and lymph node tissue (n=1). Biomarker data included autoantibodies (n=37), immune cell subtypes (eg, CD4+ and CD8+ T cells) (n=26) and other immune markers (eg, complement levels, platelet counts) (n=24), cytokines (n=8), genetics and transcriptomics (n=47), urinary markers (n=9), proteomics (n=5) and lipidomics/metabolomics (n=11). The application of ML to genetics and transcriptomics (eg, RNA sequencing (RNA-seq)) is particularly popular, largely due to the flexibility of ML for managing the vast amount of data obtained from each patient. The feature selection and dimensionality reduction techniques of ML offer a means to handle the large number of potentially relevant covariables. Alternative datasets included images (brain MRI for neuropsychiatric SLE (NPSLE) (n=9), clinical images of cutaneous lupus erythematosus (LE) (n=1) and funduscopic images for lupus retinopathy (n=1)), EKG abnormalities (n=1) and meteorological/environmental indicators (eg, air humidity, air pressure, sulfur dioxide, nitrogen dioxide, particle pollution from fine particulates) (n=2).

Data preprocessing and splitting

As identified by Munguía-Realpozo et al,14 handling of missing data was a major limitation of SLE reports. Median imputation and removal of data to use complete cases were common methods. Four reports used multiple imputation by chained equations,25 a more advanced imputation methodology, and six reports used SMOTE24 to address class imbalance with respect to missing data. Some ML models such as extreme gradient boosting (XGBoost)26 are able to address missing data due to built-in imputation functions.
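The two most common approaches mentioned above, median imputation and complete-case analysis, can be contrasted in a short sketch. The column names and values are hypothetical, and missing entries are represented by None purely for illustration.

```python
from statistics import median


def median_impute(rows):
    """Replace None in each column with the median of that column's observed values."""
    n_cols = len(rows[0])
    medians = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        medians.append(median(observed))
    return [
        [medians[j] if value is None else value for j, value in enumerate(row)]
        for row in rows
    ]


def complete_cases(rows):
    """Keep only the rows that have no missing values."""
    return [row for row in rows if all(value is not None for value in row)]


# Hypothetical columns: C3 (g/L), C4 (g/L), age (years)
patients = [
    [0.9, 0.20, 34],
    [None, 0.10, 41],
    [0.6, None, 29],
    [1.1, 0.30, None],
]
imputed = median_impute(patients)
complete = complete_cases(patients)
print(f"{len(imputed)} rows after imputation, {len(complete)} complete cases")
```

In this toy example, complete-case analysis discards three of four patients, whereas median imputation retains all of them at the cost of shrinking the variability of each imputed column; multiple imputation is designed to address that second limitation.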

Accurate data labelling is particularly important for diseases that are heterogeneous with a fluctuating and variable disease course, such as SLE. Identification of SLE cases using EMRs may be inaccurate and inefficient as it relies on coding systems such as the International Classification of Diseases (ICD), which historically has poor diagnostic specificity.27 Similarly, identification of SLE-related manifestations may be challenging given the wide range of features and lack of specific administrative codes for different phenotypes of presentation. For some manifestations, it is difficult to distinguish features attributable to SLE from those secondary to other conditions, for example, NPSLE versus neuropsychiatric symptoms caused by infections or metabolic disturbances. Regardless of whether a model was developed through traditional means or by ML, any errors in data labelling in the preprocessing stage that are then used to train the model will continue to mislabel future cases. To overcome this, one ML SLE study used a technique called ‘noisy labeling’, where the training labels were created using EMR data based on a threshold of multiple ICD-9 codes, followed by model testing against expert clinician-labelled data with good performance metrics.28

Most reports split a single dataset into three groups: training, validation, and a testing set. While this is an acceptable approach to internally validate a model, an external validation dataset with an independent cohort of patients is needed to ensure replicability and generalisability of the model before clinical adoption and to assess the degree of potential model overfitting.29 We discuss external validation separately below.
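A three-way split of a single dataset can be sketched as below. The 70/15/15 proportions, the random seed and the use of patient identifiers as records are assumptions made for illustration; the reports reviewed here used a variety of proportions.

```python
import random


def three_way_split(records, train_frac=0.70, val_frac=0.15, seed=42):
    """Shuffle the records once, then cut them into training, validation and test sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains is held out for testing
    return train, val, test


# Hypothetical cohort of 200 patients, identified by integer IDs
patient_ids = list(range(200))
train, val, test = three_way_split(patient_ids)
print(len(train), len(val), len(test))
```

Splitting at the level of the patient, as here, keeps repeated visits from the same individual within a single set. All three sets still come from the same cohort, so this remains internal validation.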

Feature selection and dimension reduction

Feature selection methods were primarily random forest (RF) (n=41), followed by LASSO (n=21), and SVM (n=16). Several reports also used filter methods such as relief-based feature selection (n=7) and mutual information (n=2), which were often performed in reports that used genetic datasets. Dimensionality reduction techniques were applied (n=32), which included principal component analysis (PCA) (n=19), t-distributed stochastic neighbour embedding30 (n=9), and Uniform Manifold Approximation and Projection31 (n=7).
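As an illustration of a filter method, the sketch below ranks candidate features by their mutual information with the class label. It assumes discrete (here binary) features and uses hypothetical marker names and values; continuous data such as gene expression would first need to be discretised or handled with a different estimator.

```python
import math
from collections import Counter


def mutual_information(feature, labels):
    """Mutual information (in bits) between two discrete sequences of equal length."""
    n = len(labels)
    joint = Counter(zip(feature, labels))
    p_feature = Counter(feature)
    p_label = Counter(labels)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log2(p_xy / ((p_feature[x] / n) * (p_label[y] / n)))
    return mi


# Hypothetical binary markers for 4 patients with SLE (1) and 4 controls (0)
labels = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "marker_a": [1, 1, 1, 1, 0, 0, 0, 0],  # tracks the label perfectly
    "marker_b": [1, 1, 1, 0, 1, 0, 0, 0],  # partially informative
    "marker_c": [1, 0, 1, 0, 1, 0, 1, 0],  # unrelated to the label
}
scores = {name: mutual_information(values, labels) for name, values in features.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Filter methods such as this score each feature independently of any model, which makes them fast on very wide datasets but blind to interactions between features.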

Model development

Most reports (n=102) developed one or more prediction algorithms. The remaining reports (n=31) focused only on the identification of SLE clusters or features, for example, biomarkers. For supervised models, the most common technique was RF (n=49), followed by SVM (n=42), logistic regression (LR) (n=42), ANNs32 (n=24), XGBoost (n=20), LASSO (n=17), decision trees (n=16), Naïve Bayes33 (n=14), and k-nearest neighbour (n=13). A few reports used a gradient-boosted tree,34 classification and regression tree35 and light gradient-boosting machine.36 For unsupervised models, primarily clustering and dimensionality reduction were performed, for example, hierarchical (n=9) and k-means clustering (n=9).
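A minimal sketch of k-means clustering is given below, together with the within-cluster sum of squares (WCSS) that the elbow method inspects when choosing the number of clusters. The data points are hypothetical, and the deterministic farthest-first initialisation is a simplification chosen here for reproducibility; library implementations typically use randomised initialisation with several restarts.

```python
import math


def farthest_first(points, k):
    """Deterministic initialisation: start at the first point, then add the farthest remaining."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [
        min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
        for p in points
    ]


def kmeans(points, k, n_iter=50):
    """Lloyd's algorithm; returns (centroids, labels, within-cluster sum of squares)."""
    centroids = farthest_first(points, k)
    for _ in range(n_iter):
        labels = assign(points, centroids)
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                # Move the centroid to the mean of its members
                updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                updated.append(centroids[c])  # keep the centroid of an empty cluster
        if updated == centroids:
            break  # converged: no centroid moved
        centroids = updated
    labels = assign(points, centroids)
    wcss = sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
    return centroids, labels, wcss


# Hypothetical two-feature data forming three visually separate groups
points = [
    (1.0, 1.0), (1.2, 0.8), (0.8, 1.1),
    (5.0, 5.0), (5.2, 4.9), (4.8, 5.3),
    (9.0, 1.0), (9.1, 1.2), (8.8, 0.9),
]
wcss_by_k = {k: kmeans(points, k)[2] for k in (1, 2, 3, 4)}
centroids, labels, _ = kmeans(points, 3)
for k, wcss in wcss_by_k.items():
    print(f"k={k}: WCSS={wcss:.2f}")
```

In the elbow method, WCSS is plotted against k and the number of clusters is chosen at the point where further increases in k yield only small reductions in WCSS.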

There was an increasing number of SLE reports using deep learning methods over time; in this review, 34 such reports were identified. Most of the reports (n=23) included a simple neural network with one or two hidden layers as a comparison between other techniques. As little hyperparameter optimisation was done, these ANNs often were outperformed by models such as RF, SVM and XGBoost. Even with tasks such as natural language processing which commonly use deep learning models like recurrent neural networks (RNN),32 one study found that RF outperformed deep learning models when proper preprocessing and feature selection were performed.37

RNN and its derivatives (long short-term memory (LSTM)38 and gated recurrent unit)39 are typically used in natural language processing, time series data and large image data. In terms of SLE reports, these models were used to analyse EMR data for hospitalisation risk20 40–42 and image data for SLE diagnosis.43 44 However, no report in our review used large language models and attention to text data was uncommon, highlighting the need for more complex models in analysing text data from electronic health records.

Five reports used convolutional neural networks (CNN)45 on image data with topics ranging from NPSLE diagnosis from MRI images,46 diagnosis of SLE retinopathy from funduscopic images,23 diagnosis of cutaneous lupus from lesion images,47 segmentation of staining from lupus nephritis (LN) pathology images48 and segmentation of glomeruli on LN biopsy images.43 Three of the reports used a deep learning technique called Grad-CAM49 that identifies the region of an image that will contribute the most to the final model. As SLE imaging data can be challenging to obtain with large enough numbers for robust ML reports, an ML technique called transfer learning was used to create powerful discriminative models, even with sparse data. Four reports in this review used this method to work with the smaller datasets.23 43 47 48 Liu et al23 posited that transfer learning using diabetic retinopathy funduscopic images could serve as a strong base model for lupus retinopathy prediction as the base model would be more ‘accustomed’ to pathological fundus images. This principle could be applied to other areas of SLE research such as the diagnosis of cutaneous LE through images of other skin lesions from similar but more common diseases including psoriasis.

Model evaluation

The approach taken by most reports was to develop multiple ML models and then select the best model, usually based on the AUC. Other performance metrics including the F-score were not always used or even reported. This is similar to the findings by Munguía-Realpozo et al, where only 21 (46.7%) reported AUC as their main performance metric, seven (15.6%) reported accuracy as their performance metric and the remaining used a combination of performance statistics.14 Five (11.1%) reports in their review did not report any performance metrics.
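Model selection by AUC can be sketched using the rank interpretation of the AUC: the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case, with ties counted as one half. The two sets of model scores below are hypothetical.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of positive-negative pairs ranked correctly (ties count 0.5)."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical predicted probabilities from two candidate models
y_true = [1, 1, 1, 0, 0, 0, 0]
candidate_scores = {
    "model_a": [0.9, 0.8, 0.4, 0.5, 0.3, 0.2, 0.1],
    "model_b": [0.6, 0.4, 0.3, 0.5, 0.7, 0.2, 0.1],
}
aucs = {name: roc_auc(y_true, s) for name, s in candidate_scores.items()}
best_model = max(aucs, key=aucs.get)
print(aucs, "best:", best_model)
```

The AUC summarises ranking performance across all thresholds, so two models with similar AUCs can still differ considerably in sensitivity or positive predictive value at the threshold used in practice, which is one argument for reporting several metrics together.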

In our review, RF (n=25), SVM (n=16), XGBoost (n=10), LR (n=10) and LASSO (n=7) models were often reported as the best performing models, compared with more complex models like ANNs (n=4), LSTM (n=3) and CNN (n=2). As many of the datasets of the reports included in this review have features on the scale of 100s, we expect that simpler models would perform better compared with gradient boosting and neural networks that require larger datasets, and where performance is enhanced with multiple layers of data. Additionally, models such as RF and LASSO have capabilities for feature importance, which helps with explainability such as identifying important clinical and genetic biomarkers for future research.

External validation

Overfitting of ML models to the training datasets should be evaluated using optimism-adjusted measures. Although these can be approximated using internal validation (eg, data splitting), they are more robustly assessed using external validation from a separate data cohort. This step ensures that the developed ML model is generalisable beyond the collected data alone. Only 15 reports in our review specified that they evaluated their model using an external cohort. External validation is particularly relevant for complex ‘black box’ ML models such as deep learning. In deep learning, the internal processes of the model are usually unknown or ‘hidden’. This makes it difficult to assess whether certain model features could be subject to selection or other biases that may affect the generalisability of the model.50 51

Key SLE findings by ML reports

In our scoping review, ML models were used to elucidate disease pathogenesis (n=31),48 52–81 predict SLE diagnosis and identify cases (n=61),23 28 37 43 44 46 47 53 63 70 73 79 82–130 disease activity and treatment response (n=33),56 59 63 66 69 74 77 78 101 106 113 131–152 complications (n=22)18 21 40 53 83 147 153–168 and healthcare utilisation (n=6)17 20 41 42 142 169 (table 2). Refer to online supplemental table 1 for a glossary of key terms.

Table 2

Key SLE findings in machine learning studies

Pathogenesis

Among the reports that examined SLE pathogenesis, many used genetic and RNA-seq datasets. Novel markers identified by these reports include ST8SIA4,57 CMTM4,57 C2CD4B,57 LCK,69 cuproptosis-related genes,72 TNFSF13B,79 OAS1,79 ABCB1,81 CD247,81 DSC1,81 KIR2DL381 and MX2.81 Immune-related biomarkers including autoantibodies, immune cell subtypes and cytokines were also analysed. These were often combined with other clinical features to reveal unique SLE endotypes via cluster analysis.62 63 70 74 75 80 Important immune pathways were identified including extrafollicular B cell involvement,54 DNA methylation,65 expansion of major helper T cell subsets and unique proliferating (Ki-67+) immune cell subsets,76 and signalling lymphocytic activation molecule family receptors on peripheral blood mononuclear cells.64

Diagnostic models

SLE diagnostic models were used to distinguish patients with SLE from healthy controls and from patients with other autoimmune rheumatic diseases (eg, rheumatoid arthritis, Sjögren disease, systemic sclerosis, multiple sclerosis), Kikuchi disease and, in LN reports, other forms of nephropathy23 28 37 43 44 47 53 63 70 79 82–121 (AUC 0.70–0.99). A validated diagnostic algorithm called the SLE Risk Probability Index (SLERPI) was developed using LASSO-LR based on 14 SLE clinical and serological features.84 A SLERPI score of greater than 7 was highly accurate (94.2%) and sensitive for detecting early disease (93.8%) and severe manifestations including kidney (97.9%) and neuropsychiatric involvement (91.8%). There were also specific diagnostic algorithms for LN,85 107 112 117 118 122 NPSLE44 46 100 102 109–111 121 123–130 and cutaneous LE.47 73 Cases of SLE28 98 114 and births from mothers with SLE87 could also be derived from EMR using ML.

Reports using genomic and genetic expression datasets identified several important biomarkers for LN, including C1QA, C1QB, MX1, RORC, CD177, DEFA4, and HERC5.118 For non-renal SLE, FOXP3,88 MX2,106 HLA-DQA1,90 HLA-DQB1,90 HLA-DRB1,90 neutrophil extracellular trap-related genes (HMGB1, ITGB2 and CREB5),70 ABCB1,120 IFI27120 and PLSCR1120 have been reported. Other types of biomarkers included proteomics (IFIT3, MX1, TOMM40, STAT1, STAT2 and OAS3),101 metabolomics,91 lipidomics94 and microRNA profiles.108

For the detection of LN, novel serum biomarkers as a form of ‘liquid biopsy’ included circulating cell-free methylated DNA.117 For NPSLE, different T and B cell subsets predicted depression in patients with SLE.127 128 Proteomics using cerebrospinal fluid demonstrated that CST6, L-selectin, Trappin-2, KLK5 and TCN2 could distinguish NPSLE from SLE controls (non-NPSLE).124 Other reports using single-cell RNA sequencing data compared biomarkers for NPSLE to multiple sclerosis103 and vascular dementia.83 To differentiate cutaneous LE from other dermatological disorders such as psoriasis, eczema, atopic dermatitis and systemic sclerosis (RF model, AUC 0.774–0.990), interferon gene signature, tumour necrosis factor, interleukin-23 (IL-23), interferon (IFN), IL-12, and immune cell-related genetic signatures were selected as important biomarkers.73

A variety of images were analysed using ML including brain MRI (functional MRI, cerebral perfusion, multivoxel proton magnetic resonance spectroscopy) for the detection of NPSLE,44 46 100 102 109–111 121 129 130 funduscopic images for lupus retinopathy23 and clinical images of the skin for acute cutaneous LE, subcutaneous LE and discoid LE.47

Disease activity and treatment response

For predicting renal flares,131–135 152 the best performing models contained both traditional clinical data and novel urine biomarkers, including cytokines, chemokines and/or markers of kidney damage. The best models for predicting renal flares from these studies included XGBoost and ANN (AUCs 0.70–0.94). Quantitative data extracted from renal ultrasound based on features such as texture, shape and wavelength could also detect LN activity.133 Novel biomarkers for LN activity include renal IFI16135 and V-set immunoglobulin domain-containing protein 4.136

For extrarenal flares,59 63 66 101 106 113 137–148 approximately half of the reports used genetic or genetic expression datasets. Novel biomarkers predicting SLE flares or increased disease activity include MX2106 and M1143 gene expression and a nine-protein combination (PHACTR2, GOT2, L-selectin, CMC4, MAP2K1, CMPK2, ECPAS, SRA1 and STAT2).101 The AUC of the best models in these reports ranged from 0.70 to 0.99. One study also demonstrated that Systemic Lupus Erythematosus Disease Activity Index score can be estimated from unstructured clinical notes.137

Treatment response was predicted with a high degree of accuracy in some reports with the outcome of renal flares being the most commonly evaluated.69 131 132 134 149 152 Clinical factors identified using feature importance ML models included C3, C4, age, race, sex, anti-dsDNA, baseline estimated glomerular filtration rate, urine protein-to-creatinine ratio as well as cytokine/protein factors such as CXCL8, pentraxin, adiponectin, MCP1, IL-8, IL-1α, IL-12, IL-6, IFNα2 and IFNγ.131 152 The top performing predictive models for treatment response used a simple neural network (AUC 0.9735)134 and an RF model (AUC 0.92).131 Predictors of disease remission (SVM, AUC 0.713)139 and response to B cell therapies (RF, AUC 0.88)77 were examined as well. Lastly, cluster analysis by k-means and consensus clustering to identify different SLE endotypes based on treatment response revealed a wide range of results, for example, the number of reported clusters ranged from 3 to 39.56 74 78 113 142 150 In our own study of 805 patients with SLE from the Systemic Lupus International Collaborating Clinics (SLICC) cohort, k-means clustering on PCA-transformed longitudinal autoantibody profiles over the first 5 years of disease revealed four distinct endotypes that were predictive of long-term disease activity, organ involvement, treatment requirements, and mortality risk.56

Prognostic models

For the prediction of SLE outcomes, ML has been used to predict disease damage (RNN, AUC 0.77).40 Prediction of cardiovascular disease (atherosclerosis, cardiovascular events, arrhythmia and heart failure), a major cause of mortality in SLE, has also been evaluated.153–159 168 Novel lipoprotein metabolites and vitamin D deficiency were associated with atherosclerosis.153 157 158 Several candidate hub genes (SPI1, MMP9, C1QA, CX3CR1, MNDA) could predict the risk of atherosclerosis in SLE, and expression of the CCR7, RNASE2, RNASE3 and CXCL10 genes could predict heart failure. The AUCs ranged from 0.81 to 0.98 for the various models.154 A prediction score called SLE-venous thromboembolism (VTE) could predict VTE risk in patients with SLE (LR, AUC 0.808) based on 11 variables: sex, age, body mass index, hyperlipidaemia, hypoalbuminaemia, C reactive protein, anti-β2-glycoprotein I antibodies, lupus anticoagulant, renal involvement, nervous system involvement and hydroxychloroquine use.21 A prediction model for 3-year allograft survival in kidney transplant recipients with SLE has also been developed (LR and ANN, AUC 0.73) using recipient age, race, maintenance regimen including prednisone, predominant renal replacement modality in the pretransplant period and whether dialysis was required during the first post-transplant week.163

Adverse pregnancy outcomes in patients with SLE were examined using different datasets. SLE activity was predicted in pregnant women (ElasticNet, AUC 0.978) using serum metabolites (glucose, alanine, acetoacetic acid and alpha-ketoisovalerate levels).147 Potential genetic biomarkers identified with ML for predicting adverse pregnancy outcomes during early and mid-pregnancy in patients with SLE are SEZ6, NRAD1 and LPAR4.160 There were also prediction models that used only routinely available clinical variables (eg, levels of alanine transaminase, alkaline phosphatase, lactate dehydrogenase, gamma-glutamyl transferase, erythrocytes, C3, C4, autoantibodies as well as maternal age, smoking status, hydroxychloroquine use and disease duration) (super learning, AUC 0.78; RF, AUC 0.917).161 162

Other outcomes for patients with SLE included reduced risk of breast cancer with the presence of prognostic genetic biomarkers (ie, IRF7, IFI35 and EIF2AK2 gene expression) identified with LASSO.165 Models for the prediction of joint erosions (LR, AUC 0.806),164 herpes infection (RF, AUC 0.942)167 and hypothyroidism (RF, AUC 0.772)166 have also been developed using clinical and serological data. Among the selected features for these models, autoantibodies were found to be important predictors, for example, anti-carbamylated protein and anti-citrullinated protein antibodies for joint erosion164 and anti-dsDNA and anti-SSB/La for hypothyroidism.166 ML models showed promise in predicting the risk of hospitalisation and length of stay from EMR data (best performing models LSTM and XGBoost, AUC 0.88)20 41 42 142 169 and associated healthcare costs from administrative databases.17 142

Future considerations

AI applications have become ubiquitous in medicine, and SLE care and research are no exception.170 The range of AI applications and their utilisation in SLE is expected to grow. Thus far, considerable work in SLE has focused on developing ML models to predict disease diagnosis and prognosis. Other applications of AI in SLE, including drug discovery, clinical trial design and interpretation,171 diagnostic imaging analysis, personalised medicine and medical devices and technologies, are just beginning. Increased availability and access to other types of data in the future will provide even more opportunities for SLE research. ML approaches in SLE may even make use of health data collected from mobile phones, wearable devices, social media and environmental datasets, which are becoming more popular in health research. Integration of more advanced ML methods in future reports will also allow for more efficient analysis of increasingly large and complex datasets. As discussed, there is already evidence of this trend with increased utilisation of deep learning and natural language processing approaches in SLE.

While AI facilitates discoveries that may improve patient outcomes and processes in the healthcare system, researchers should also be aware of the ethical, governance and regulatory considerations, including patient consent, confidentiality, transparency and privacy172 173 (figure 2). In 2019, the European League Against Rheumatism published recommendations that guide researchers on the collection, analysis, interpretation and implementation of big data through AI/ML.174 While these are not discussed in detail in this review, we emphasise that these issues can arise at any step of the ML pipeline. For instance, during data collection, diverse data sources increasingly used by ML approaches (eg, EMR, administrative databases, social media, genetic or other multi-omics datasets, clinical trials and microbiome) are prone to sampling biases. These can exacerbate existing disparities in marginalised and underserved populations and violate the bioethical principle of justice. SLE is a disease that disproportionately affects racial and ethnic minorities and is therefore more sensitive to these issues (reviewed in ref 175). The lack of representation by minority populations in clinical research, genetic reports176 and clinical trials177 is a real concern. More work is needed to study how we can address these issues, minimise harm and promote ethical ML models in the future.
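One practical way to surface the effect of sampling bias is to report model performance separately for each demographic subgroup rather than only in aggregate. The sketch below computes sensitivity (the true positive rate) per subgroup at a fixed decision threshold. The group names, the threshold of 0.5 and all predicted probabilities are hypothetical and chosen only to illustrate the calculation; they do not come from any SLE dataset.

```python
from collections import defaultdict


def sensitivity_by_group(records, threshold=0.5):
    """Per-group sensitivity from (group, true_label, predicted_probability).

    Only true cases (label 1) contribute; a case is counted as detected
    when its predicted probability reaches the threshold.
    """
    detected = defaultdict(int)
    missed = defaultdict(int)
    for group, y_true, prob in records:
        if y_true != 1:
            continue
        if prob >= threshold:
            detected[group] += 1
        else:
            missed[group] += 1
    groups = set(detected) | set(missed)
    return {g: detected[g] / (detected[g] + missed[g]) for g in groups}


# Hypothetical predictions: group_b is assumed to be under-represented
# in training, so the model scores its true cases lower.
toy_records = [
    ("group_a", 1, 0.9), ("group_a", 1, 0.8), ("group_a", 1, 0.7),
    ("group_a", 1, 0.6), ("group_a", 0, 0.2), ("group_a", 0, 0.4),
    ("group_b", 1, 0.7), ("group_b", 1, 0.4), ("group_b", 1, 0.3),
    ("group_b", 1, 0.2), ("group_b", 0, 0.1), ("group_b", 0, 0.3),
]
toy_sensitivity = sensitivity_by_group(toy_records)
for group in sorted(toy_sensitivity):
    print(group, toy_sensitivity[group])
```

A large gap between subgroups, as in this toy example, would not be visible in a single pooled metric and is the kind of signal that should prompt a closer look at how the training data were sampled.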

Footnotes

  • Contributors All authors were involved in the concept and design, data analysis and interpretation, and editing for intellectual content. KZ, KAB, IYC, MJF and MYC were involved in manuscript drafting.

  • Funding Support for this study also came from the Lupus Foundation of America.

  • Competing interests MJF is a consultant to and has received honoraria and/or travel support from Werfen (Barcelona, Spain; San Diego, California). MJF is also Medical Director of Mitogen Diagnostics. MYC has received consulting fees from Celltrion, Mallinckrodt Pharmaceuticals, Werfen, Organon, AstraZeneca, and MitogenDx.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.