Original research

Scoring systemic lupus erythematosus (SLE) disease activity with simple, rapid outcome measures

Abstract

Objective Existing methods for grading lupus flares or improvement require definition-based thresholds as increments of change. Visual analogue scales (VAS) allow rapid, continuous scaling of disease severity. We analysed the performance of the SELENA SLEDAI Physician’s Global Assessment (SSPGA) and the Lupus Foundation of America-Rapid Evaluation of Activity in Lupus (LFA-REAL) as measures of improvement or worsening in SLE.

Methods We evaluated the agreement between prospectively collected measures of lupus disease activity [SLE Disease Activity Index (SLEDAI), British Isles Lupus Assessment Group Index 2004 (BILAG 2004), Cutaneous Lupus Area and Severity Index (CLASI), SSPGA and LFA-REAL] and response [SLE Responder Index (SRI)-4 and BILAG-Based Combined Lupus Assessment (BICLA)] in a clinical trial.

Results Fifty patients (47 females, mean age 45 (±11.6) years) were assessed at 528 consecutive visits (average 10.6 (±4.1) visits/patient). Changes in disease activity compared with baseline were examined in 478 visit pairs. SSPGA and LFA-REAL correlated with each other (r=0.936), and with SLEDAI and BILAG (SSPGA: r=0.742 (SLEDAI), r=0.776 (BILAG); LFA-REAL: r=0.778 (SLEDAI), r=0.813 (BILAG); all p<0.0001). Changes (∆) in SSPGA and LFA-REAL compared with screening correlated with each other (r=0.857) and with changes in SLEDAI and BILAG (∆SSPGA: r=0.678 (∆SLEDAI), r=0.624 (∆BILAG); ∆LFA-REAL: r=0.686 (∆SLEDAI), r=0.700 (∆BILAG); all p<0.0001). Changes in SSPGA and LFA-REAL strongly correlated with SRI-4 and BICLA by receiver operating characteristic analysis (p<0.0001 for all). Additionally, LFA-REAL correlated with individual BILAG organ scores (musculoskeletal: r=0.842, mucocutaneous: r=0.826 (p<0.0001 for both)).

Conclusion SSPGA and LFA-REAL are reliable surrogates of common SLE trial end points and could be used as continuous or dichotomous response measures. Additionally, LFA-REAL can provide individualised scoring at the symptom or organ level.

Trial registration number NCT02270957.

Introduction

SLE is a heterogeneous, multisystem autoimmune disease characterised by waxing and waning disease activity over time.1 Accurately measuring lupus disease activity and the changes in disease activity has proven to be a difficult task. This is highlighted by failure of over 20 late phase therapeutic trials to produce interpretable results,2 though recent positive studies of belimumab, anifrolumab, ustekinumab and baricitinib have allowed guarded optimism. Multiple clinical assessment tools attempt to distill the myriad of symptoms with different levels of severity and risk to vital organs. It follows that the consistent and successful application of these measures as end points in clinical trials remains elusive.3

The most widely used disease activity measures in international, multicentre trials are the SLE Disease Activity Index (SLEDAI)4 5 and the British Isles Lupus Assessment Group Index (BILAG 2004).6 Beyond their individual strengths and weaknesses (reviewed elsewhere3), both instruments were developed through a consensus approach to derive thresholds for changes in disease activity.7 The SLEDAI is less sensitive to change and sets a high bar for improvement; it is scored on the ‘typical’ severity of a symptom, regardless of current severity in an individual patient, and cannot record worsening or partial improvement. The BILAG accommodates gradations in severity, but predefined thresholds for change impede its accuracy. Moreover, the BILAG compresses different descriptors within each organ: scoring does not increase when ≥2 descriptors within an organ are equally severe. To address the shortcomings of each disease activity instrument, composite indices have been developed, such as the SLE Responder Index (SRI)8 and BILAG-Based Combined Lupus Assessment (BICLA),9 both used in large registrational studies. These end points are dominated and limited by the instruments that gauge improvement: the SLEDAI and BILAG, respectively.3

Visual analogue scales (VAS) allow continuous scaling of disease severity, directly grounded in clinical observation at the time of scoring. Even the best glossary-based instrument cannot describe appropriate scoring increments for every clinical observation; VAS have the potential to bypass that problem. Furthermore, VAS provide an opportunity for studies to determine clinically significant changes, rather than relying on predetermined glossary-based definitions as landmarks for disease severity. Unfortunately, past studies of VAS in SLE have given inconsistent results, likely due to the potential variations in how clinicians interpret these scales.10 11

The SELENA SLEDAI Physician’s Global Assessment (SSPGA) VAS has addressed the problem by adding severity anchors at mild, moderate and severe disease and a simple but specific protocol for scoring designed to improve interrater and intrarater consistency.4 5 The SSPGA was originally developed as a 3 inch scale,4 5 but was later adapted to a 100 mm scale in many clinical trials, where it was found to provide data consistent with directional changes in BILAG and SLEDAI.12–15 The Lupus Foundation of America-Rapid Evaluation of Activity in Lupus (LFA-REAL) modifies and extends the SSPGA structure by providing subscales for individual symptoms, allowing the separate scoring of symptoms within the same organ (eg, rash and vasculitis), as well as scoring of ‘other’ less common symptoms of SLE, such as gastrointestinal and ophthalmic involvement (online supplementary figure 1).7 The structure of LFA-REAL reflects its conception as an integration of elements of the SSPGA VAS and the organ-based scoring system of the BILAG to allow the clinician’s evaluation of patient progress at the level of individual symptoms, organs or total disease activity. The instrument was designed to remain versatile and broad, yet simple enough for scoring by both clinicians and clinical trialists. While the LFA-REAL includes both a clinician’s version and a similarly minded patient-reported outcome, the current paper only discusses the clinician instrument.

As the clinician’s version of the LFA-REAL evolved, scaling increments were more clearly defined and additional innovations differentiated it from the SSPGA and previous VAS scores.7 In particular, the LFA-REAL scoring instructions specify that: 1) disease activity is scored without regard for the medications being used (ie, mild arthritis in a patient on 20 mg of prednisone is not rated as higher disease activity than the same mild arthritis in a patient on no medication); 2) at consecutive visits, the previous VAS must be examined before scoring the current one, so that progress up to the current visit is considered; 3) the landmarks of 1, 2 and 3 correspond to each level of disease severity, with 0 signifying complete remission and 3 reflecting the worst disease possible in a patient with SLE, not the worst seen in the current patient. Methods to gauge disease grade cutoffs between and around the intervening landmarks have been inconsistent in clinical trials using SSPGA, largely due to the lack of consistent guidance provided in instructions. The LFA-REAL specifically assigns equal lengths of the scale to mild, moderate and severe disease. Thus, mild disease is scored between 0 and 1, moderate between 1 and 2 and severe between 2 and 3.
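The equal-length scaling can be illustrated with a minimal sketch. The function name `severity_band` is hypothetical, and the choice to assign a score that falls exactly on a landmark (1 or 2) to the lower band is an assumption made for illustration; the scoring instructions described here do not state how such boundary values are handled.

```python
def severity_band(score: float) -> str:
    """Map a VAS score in landmark units (0 to 3) to a severity band.

    Assumption (not from the scoring instructions): a score exactly on a
    landmark (1 or 2) is assigned to the lower of the two adjacent bands.
    """
    if not 0.0 <= score <= 3.0:
        raise ValueError("score must lie between 0 and 3")
    if score == 0.0:
        return "remission"  # 0 signifies complete remission
    if score <= 1.0:
        return "mild"       # mild disease: between 0 and 1
    if score <= 2.0:
        return "moderate"   # moderate disease: between 1 and 2
    return "severe"         # severe disease: between 2 and 3


if __name__ == "__main__":
    for example_score in (0.0, 0.4, 1.5, 2.7):
        print(example_score, severity_band(example_score))
```

Because each band spans one landmark unit, a change of a given size means the same thing at the mild and at the severe end of the scale.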

A previous study evaluated the LFA-REAL in relation to SLEDAI, BILAG and SSPGA in routine clinical care of patients with SLE, demonstrating significant correlations to those instruments (r=0.58–0.88, p<0.001).16 In the current study, we compare the performance of SSPGA and LFA-REAL with other SLE trial outcome measures using blinded patient data from a clinical trial in SLE.

Materials and methods

Clarification of Abatacept Effects in SLE with Integrated Biologic and Clinical Approaches (The ABC Study) is a recently completed investigator-initiated clinical trial conducted at the Oklahoma Medical Research Foundation, with funding from Bristol-Myers Squibb. The data evaluated here were analysed prior to unblinding and compare outcome measures regardless of treatment assignment. Patients provided informed consent prior to the initiation of any study procedures. Disease activity (hybrid SELENA-SLEDAI, BILAG 2004, SSPGA and LFA-REAL, Cutaneous Lupus Area and Severity Index (CLASI) and 28 tender and swollen joint counts) was prospectively scored at consecutive visits.17 18 The following visits were included in analysis: screening visit, up to 12 monthly visits and up to 2 follow-up visits at 2 and 4 months after completion of the study. Baseline visits were excluded, since these often occurred too close to screening to allow any meaningful change in disease activity. The hybrid SELENA-SLEDAI used in this study (referred to here as the SLEDAI) is identical to the SELENA-SLEDAI4 5 except for the proteinuria definition from SLEDAI 2K.19 20 All instruments were scored by trained rheumatology clinicians with experience in scoring SLE disease activity measures according to the standard protocols. Data quality was monitored by two investigators.

We followed the same VAS scoring protocol for both the SSPGA and LFA-REAL,4–6 as described under ‘Introduction’, in order to support accurate comparisons. For the current project, we used the 100 mm SSPGA. The SRI-4 and BICLA were computed as previously described,8 9 but without consideration of whether or not there were changes in medications. However, it is acknowledged that if any of these measures (including a VAS-based response) are used as outcome measures in trials, it is usual to require no increases in medications as a component of treatment response definitions. The purpose of the current exercise was not to determine treatment response, but to focus on performance characteristics of simpler measurements of disease activity and eliminate as many extraneous variables as possible.

SSPGA and LFA-REAL scores across visits were compared by Wilcoxon matched-pairs signed rank test. Changes in SSPGA and LFA-REAL from the initial (screening) visit were similarly compared. Correlations between SSPGA and LFA-REAL as well as changes in SSPGA and LFA-REAL and SLEDAI and BILAG were examined by non-parametric Spearman’s rank test. Receiver operating characteristic (ROC) curve analysis was applied to compare changes in SSPGA and LFA-REAL with SRI-4 and BICLA. SigmaPlot V.12 (Systat Software) was used for all statistical analyses.
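For readers unfamiliar with the rank-based correlation used here, the following is a minimal sketch of Spearman’s rank correlation in plain Python. It is not the SigmaPlot procedure used in the study; the function names and the five paired scores are invented for illustration only.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one input is constant")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Invented paired scores (mm) at five visits, for illustration only.
    sspga_mm = [20, 35, 50, 65, 80]
    lfa_real_mm = [25, 60, 55, 90, 120]
    print(round(spearman_rho(sspga_mm, lfa_real_mm), 3))
```

Because only the ordering of scores matters, the coefficient is unaffected by the two instruments having different numeric ranges.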

Results

Fifty patients (47 female, mean age (±SD) 44.6 (±11.6) years) were evaluated at 528 visits, with an average of 10.6 (±4.1) consecutive visits per patient. Twenty-six subjects were Caucasian, 11 African-American, 7 Native American and 6 Hispanic. At the initial visit, average SLEDAI was 6.8 (±2.8), BILAG 10.5 (±3.9), SSPGA 53 mm (±8 mm) and LFA-REAL 71 mm (±22 mm). All patients entered the study with at least moderately active arthritis (at least one BILAG A (severe) or B (moderate) organ score with ≥3 swollen and ≥3 tender joints on 28 joint count). Twenty-nine individuals also had active mucocutaneous features. However, none had active cardiopulmonary, renal or significant haematological involvement.

By design, the LFA-REAL has a wider range of values compared with SSPGA. This was consistent at all visits (Wilcoxon matched-pairs signed rank test, p<0.0001), representing the expanded scoring scale that could capture disease activity with better detection of subtle differences (table 1A, online supplementary figure 2). For example, if a person had mild arthritis and moderate rash the expanded scaling would allow a higher numeric score than a person with only moderate rash, a distinction not captured on the SSPGA. The greater scores for LFA-REAL were also evident when patients with SSPGA<33 mm or SSPGA≥33 mm were separately examined. Similarly, changes in LFA-REAL compared with the initial visit had a wider range of values compared with changes in SSPGA (Wilcoxon matched-pairs signed rank test, p<0.0001). This might provide an increased discriminatory potential capturing improvement or worsening (table 1B, online supplementary figure 3). The distinction remained evident whether patients were improving at the time of the visit (p<0.0001) or not (p=0.0002) by SRI-4 (Wilcoxon matched-pairs signed rank test).

Despite a significant difference in range, total SSPGA and LFA-REAL scores strongly correlated to each other (r=0.932) by cross-sectional analysis at all visits (figure 1A). Changes in SSPGA and LFA-REAL compared with screening were also strongly correlated (r=0.857) (figure 1B).

Figure 1

(A, B) Correlations between Lupus Foundation of America-Rapid Evaluation of Activity in Lupus (LFA-REAL) (mm) and SELENA SLEDAI Physician’s Global Assessment (SSPGA) (mm).

Both the SSPGA and LFA-REAL scores correlated well to SLEDAI and BILAG 2004 at all visits (table 2, p<0.0001 for all), with LFA-REAL performing marginally better than SSPGA. Absolute changes in SSPGA and LFA-REAL compared with the initial visit were examined in 478 visit pairs, with both VAS-based scales correlating well to changes in SLEDAI and BILAG (table 2, all p<0.0001).

Table 2

(A) Spearman’s rank correlation coefficients between SSPGA and LFA-REAL and SLEDAI and BILAG. (B) Spearman’s rank correlation coefficients between changes (∆) in SSPGA and LFA-REAL and changes (∆) in SLEDAI and BILAG. (C) Area under the curve (AUC) of SSPGA and LFA-REAL in discrimination of SRI or BICLA response

Table 1

Disease activity at all visits (A) and change (∆) in disease activity compared with initial visit (B)

Changes in SSPGA and LFA-REAL were very strongly associated with the dichotomous SRI-4 and BICLA end points by ROC analysis (p<0.0001 for all) (table 2, online supplementary figures 4A-D). A range of thresholds for improvement by SSPGA and LFA-REAL was examined to identify an optimal trade-off of sensitivity and specificity in calibrating these VAS to an SRI-4 response (online supplementary figure 5AB). Obtaining 75% sensitivity and 84%–86% specificity for detecting an SRI-4 response required an improvement of at least 27.2 mm in SSPGA and 36.3 mm in LFA-REAL. A similar preliminary exploration was done for the BICLA response (online supplementary figure 6AB).
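The threshold search can be sketched as follows. This is an illustrative outline only, not the analysis performed in the study: the function names are hypothetical and the eight visit pairs are invented; only the 27.2 mm SSPGA threshold is taken from the text above.

```python
def sensitivity_specificity(improvements_mm, sri_responders, threshold_mm):
    """Treat a visit as a VAS responder when improvement >= threshold_mm,
    then compare that call with the reference (here, SRI-4) response."""
    tp = fn = tn = fp = 0
    for improvement, is_responder in zip(improvements_mm, sri_responders):
        predicted = improvement >= threshold_mm
        if is_responder and predicted:
            tp += 1
        elif is_responder:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one responder and one non-responder")
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(improvements_mm, sri_responders):
    """Area under the ROC curve: the probability that a randomly chosen
    responder improved more than a randomly chosen non-responder
    (ties count as one half)."""
    pos = [v for v, r in zip(improvements_mm, sri_responders) if r]
    neg = [v for v, r in zip(improvements_mm, sri_responders) if not r]
    if not pos or not neg:
        raise ValueError("need at least one responder and one non-responder")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Invented data: VAS improvement (mm) and reference response per visit pair.
    improvements = [40, 35, 30, 10, 28, 5, 15, 0]
    responders = [True, True, True, True, False, False, False, False]
    print(sensitivity_specificity(improvements, responders, 27.2))
    print(roc_auc(improvements, responders))
```

Sweeping `threshold_mm` over the observed range of improvements and recording each sensitivity and specificity pair traces out the ROC curve from which a working cutoff can be chosen.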

Individual LFA-REAL components correlated with BILAG 2004 organ scores, with r=0.842 for musculoskeletal scores and r=0.826 for mucocutaneous scores (p<0.0001 for both) (figure 2A and B, table 3). High correlations were also observed between musculoskeletal LFA-REAL and SLEDAI arthritis, tender joint count and swollen joint count. Correlations were also high between mucocutaneous LFA-REAL, CLASI activity score and the SLEDAI combined scoring of rash, mucosal ulcers and alopecia (table 3).

Figure 2

(A, B) Correlations between Lupus Foundation of America-Rapid Evaluation of Activity in Lupus (LFA-REAL) domains (mm) and British Isles Lupus Assessment Group Index (BILAG) organ scores.

Table 3

Spearman’s rank correlation coefficients of LFA-REAL musculoskeletal and LFA-REAL mucocutaneous with other disease activity scoring instruments

Discussion

We evaluated the performance of SSPGA and LFA-REAL (both assessed using the LFA-REAL modified SSPGA scoring rules) in a population of patients with SLE with predominantly musculoskeletal and mucocutaneous disease, participating in an ongoing clinical trial. SSPGA and LFA-REAL were reliable surrogates of commonly used lupus clinical trial end points. Compared with SSPGA, LFA-REAL had a broader scoring range for both absolute scores and score changes. This might provide an opportunity for increased discrimination between gradations of active disease and changes in disease activity, especially in individuals with multiple organ involvement, where some organs may improve/worsen more than others. This hypothesis remains to be tested against the gold standard of clinically significant change, based on real time raters’ clinical assessments. The current paper only examined equivalency between SSPGA and LFA-REAL, not superiority of one instrument over the other. The potential advantages of the LFA-REAL over other instruments are hypothesised and remain to be proven/validated.

Setting increments for clinically important change in disease activity is an elusive brass ring for SLE outcome measures.21 Cutoffs in lupus disease activity instruments have been previously determined by consensus.22–24 We examined the range of SSPGA and LFA-REAL changes reflecting accepted standards for clinically significant improvement used in clinical trials (SRI-4 and BICLA). When using our modified scoring rules, the LFA-REAL and the original SSPGA performed well against the SRI-4 and BICLA. The optimum balance of sensitivity and specificity for SRI-4 and BICLA could be narrowed to some degree by using these VAS-based instruments. Therefore, as a proof of concept, data obtained through simple VAS scoring can be calibrated to accepted outcome measures. A prospective validation study with ROC analysis could further determine the changes in LFA-REAL that reflect the gold standard of physicians’ opinion, while assessing patients in real-time in the clinic. Baseline disease activity or significance of organ involvement may also be important to optimal definitions of response, and should be tested to determine if these dimensions should be integrated in a VAS response algorithm.

The SELENA SLEDAI PGA was purposefully designed and has evolved over time with specific, widely accepted instructions. These instructions have been modified for clarity and precision and applied to the SSPGA and LFA-REAL in this study. This simple protocol is likely the prerequisite for consistent VAS scoring.10 The consistency with which any scoring rules are followed around the world may be in some doubt; one reason for this could be the enormous burden on SLE clinical trial investigators to pass comprehensive disease assessment tests for complicated instruments. The more material covered, the less likely it will all be retained. The LFA-REAL modified SSPGA scoring rules are quite simple to learn and apply. We submit that these simple, but versatile VAS instruments work well in clinic.12–15

A prior disease assessment tool that was published over two decades ago as part of a more extensive SLE evaluation instrument has some similarity to the LFA-REAL in that it uses VAS to score multiple symptoms or organs.25 The LFA-REAL differs from this other instrument in several ways. First, the scaling has been clarified to ensure equal space for mild, moderate and severe disease. Second, every active symptom/sign in any individual patient is scored on a separate VAS. Third, the total disease activity score is the sum of each active component, which is identical to the sum of each organ. Additionally, the LFA-REAL is different from the BILAG, which scores organs based only on the degree of activity in the most active component and not by incremental summing of the physician-weighted observations within that organ. Furthermore, LFA-REAL scores reflect the real-world grading of current disease activity in each active feature without the impact of underlying disease severity, such as aggressiveness of medications, weighting by organ, usual severity in a population or other prognostic factors. Finally, the clinician’s LFA-REAL is being developed with a complementary patient-reported outcome measure for tandem assessments of common features that clinicians and patients score concurrently using the same instructions.26
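The difference between summing components and scoring only the most active component can be shown with a small sketch. The component names and millimetre values are invented, the function names are hypothetical, and `most_active_per_organ` is only a simplified stand-in for a "most active component" rule; it does not reproduce actual BILAG scoring.

```python
from collections import defaultdict


def organ_totals(components):
    """Sum the VAS scores of all active components within each organ."""
    totals = defaultdict(float)
    for (organ, _symptom), score_mm in components.items():
        totals[organ] += score_mm
    return dict(totals)


def total_activity(components):
    """Total disease activity as the sum of every active component."""
    return sum(components.values())


def most_active_per_organ(components):
    """Simplified contrast: keep only the highest component in each organ."""
    highest = {}
    for (organ, _symptom), score_mm in components.items():
        highest[organ] = max(highest.get(organ, 0.0), score_mm)
    return highest


if __name__ == "__main__":
    # Invented scores (mm), keyed by (organ, symptom).
    scores = {
        ("mucocutaneous", "rash"): 40.0,
        ("mucocutaneous", "vasculitis"): 25.0,
        ("musculoskeletal", "arthritis"): 20.0,
    }
    print(organ_totals(scores))           # per-organ sums
    print(total_activity(scores))         # equals the sum of the organ totals
    print(most_active_per_organ(scores))  # second mucocutaneous symptom is not counted
```

In this invented case the summed approach registers the vasculitis in addition to the rash, whereas the most-active-component rule would score the mucocutaneous organ the same with or without it.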

Correlations of both SSPGA and LFA-REAL with total SLEDAI and BILAG were high, and a similar relationship was demonstrated between musculoskeletal and mucocutaneous LFA-REAL subscales and their corresponding BILAG organ scores as well as SLEDAI arthritis and mucocutaneous descriptors in this study. Thorough evaluations of each patient by trained lupus investigators, with concurrent scoring of SLEDAI and BILAG, could explain the superior performance of VAS scoring. However, similar results were observed when those instruments were scored by clinicians with no prior training in disease activity instruments, with good intraclass correlations of LFA-REAL scoring between clinicians and trained lupus investigators.16 We did not examine associations of LFA-REAL subscales with BILAG organ scores or SLEDAI descriptors in the small number of patients with systemic or cardiorespiratory involvement in our study. Those correlations might have been less consistent, both because of the small number of cases and because of less consistent scoring of systemic features under the LFA-REAL ‘other’ subscale. Associations of SSPGA and LFA-REAL with serologic surrogates of inflammation (eg, complement levels, serum cytokines and gene expression signatures) remain to be examined and may provide important insights on how these instruments compare with each other and with other clinical outcome measures.

Conclusion

When scored using the LFA-REAL modification of the SSPGA scoring rules, both the SSPGA and LFA-REAL are reliable surrogates for SLE trial end points. Both instruments are easy to score and understand, and this preliminary evidence suggests that either could be used as continuous or dichotomous trial end points. The LFA-REAL incorporates individual scoring at the symptom or organ level and expands the range of analyses that can be obtained from a rapid, efficient and easy to understand outcome measure. Defining the psychometric properties of LFA-REAL and SSPGA will help determine their role in clinical trials, population studies and patient care.