Review

Lupus community panel proposals for optimising clinical trials: 2018

Abstract

Formidable impediments stand in the way of treatment development for lupus. These include the unwieldy size of current trials, international competition for scarce patients, complex outcome measures and a poor understanding of these outcomes in the world at large. The heterogeneity of the disease itself coupled to superimposition of variegated background polypharmacy has created enough immunological noise to virtually ensure the failure of lupus treatment trials, leaving an understandable suspicion that at least some of the results in testing failed drugs over the years may not have been negative, but merely uninterpretable. The authors have consulted with many clinical trial investigators, biopharmaceutical developers and stakeholders from government and voluntary sectors. This paper examines the available evidence that supports workable trial designs and proposes approaches to improve the odds of completing interpretable treatment development programs for lupus.

Introduction

During the past several decades, more than 30 promising, strategically targeted biologics have entered early development for the treatment of lupus.1 So far, only belimumab has successfully completed a Phase III programme to obtain regulatory approvals worldwide. However, even this treatment faced significant challenges in executing pivotal Phase III trials, requiring major investment in international trial sites and involving more than 2400 patients.2–4 Since the first approval of belimumab in 2011, seven novel immune modulators have managed to move into Phase III development,5–12 most on the basis of marginal early phase efficacy.6–11 Development of three of these treatments has already been stopped because of failure to meet pivotal endpoints.13–15 Due to the widespread community belief, likely based on the pioneering belimumab experience, that late phase lupus programmes must be very large to succeed, these three recently cancelled programmes for lupus represent more than a billion dollar loss in research and development costs. Meanwhile, many other theoretically promising investigational treatments were put aside earlier in the process.16–23 Given the rarity with which Phase II or III lupus trials have met their primary endpoints, it could be assumed that most of the mechanistically promising treatments tested for lupus in the past 25 years are either ineffective or barely effective. Alternatively, it might be suspected that the testing process itself has flaws. 

The objective of this paper is to address the formidable impediments that stand in the way of treatment development for lupus. These include the unwieldy size of current template designs for Phase II and III trials coupled to worldwide competition for patients who meet the stringent entry criteria at limited numbers of adequately trained trial sites. It has also been difficult to arrive at reliable study endpoints given the complexity and user-unfriendliness of accepted, but often misunderstood, outcome measures. The heterogeneity of the disease itself and the superimposition of background polypharmacy on this immunological complexity have created enough noise to ensure the failure of lupus treatment trials, leaving an understandable suspicion that at least some of the results observed may not have been negative, but merely uninterpretable.

The authors have consulted with many clinical trial investigators, biopharmaceutical developers and stakeholders from government and voluntary sectors. To pursue solutions, we have considered strategies for increasing the numbers of patients with lupus who have access to and knowledge about clinical trials. We review data to support approaches to simplifying polypharmacy, tailored for patients with or without organ-threatening disease. We have also evaluated the feasibility of designing trials with more discriminatory endpoints, including a focus on adaptive trials, strategic populations, organ-specific endpoints and outcomes measuring sustainable low disease activity. Any of these innovations might support greater clinical discrimination with effective treatments. As will be discussed below, all of these approaches have been subjected to preliminary testing and seem to increase differences between effective treatments and placebo, paving the way so that smaller trials can succeed. Our recommendations result from a comprehensive analysis of what has and has not worked well in lupus trials over the past several decades, and it is hoped that this analysis will be useful for clinical scientists, clinical trial designers and regulatory agencies.

State of the art care for lupus is insufficient, based on insufficient evidence

Lupus is a complex, multifaceted autoimmune disease24 25 characterised by unpredictable flares of mild, moderate or organ-threatening inflammation and inexorable organ damage which progresses over many decades, resulting in a high rate of disability and early death.26–35 Advances in treatment have been accomplished by the largely empiric use of a variety of immune suppressants developed and optimised for completely different diseases.36 In addition to the toxicity associated with long-term steroid use, there are adverse effects related to the poorly studied combinations of immune suppressant medications that patients with lupus also ubiquitously receive.

Since moderately ill patients with lupus who have chronic, smouldering disease activity over many years are known to develop progressive disability and have a high risk for premature atherosclerosis and early mortality,32 33 long-term management with a safe, targeted biologic would be a defensible position to take if it could be demonstrated that biological therapies are effective. Surveys of patients with lupus were conducted by the Lupus Foundation of America (LFA) in collaboration with UCB in 2010 and Eli Lilly in 2014.37 38 When 531 patients answered the question, ‘Are you satisfied with your current medications?’, only 44% replied that they were either satisfied or very satisfied, and 45% indicated that the effects of medication impair their daily activities and work.37 In the 2014 survey of 827 patients,38 respondents reported taking an average of eight prescription medications. Despite this fact (or perhaps because of it), 87% said that the disease affects their work life, and 55% indicated that they can only work part-time or intermittently because of lupus. Seventy-six per cent reported that chronic fatigue limits their social activities. Therefore, if you ask patients with lupus, the current standard of care is unacceptable.

For more than 50 years, the only approved treatments for lupus in the USA were aspirin, corticosteroids or steroid-inducing agents and antimalarials, which were grandfathered in for approval during the mid-20th century with minimal scientific stringency.

In 2011, belimumab was added to that short list, having survived an arduous Phase III international development programme.2 3 The disconnect, however, is that passing a demanding regulatory process does not ensure patient access. Belimumab is expensive, and this has led to some confusion along well-worn tracks of insurance approval policies. The medical world has come to accept the stipulation that new, expensive treatments should only be prescribed when cheaper standard alternatives, which have longer safety and efficacy track records, are ineffective. In the case of lupus, this makes it necessary for patients to fail unproven treatments that have many known toxicities before the only proven treatment, which has an excellent 10-year safety and tolerability profile, can be tried. In addition, even though belimumab is known to work for cutaneous lupus,39 it was only tested in patients who meet criteria for SLE. Therefore, it is not approved for patients with refractory cutaneous disease who don’t also meet classification criteria for SLE, despite the fact that the intrinsic pathology is the same.40

Addressing the severe shortage of patients for trials

A significant impediment to treatment development for lupus is the lack of qualified trial sites and qualified patients to participate in studies. To expand the population of suitable patients with lupus, an important first step might be to reach a more rational consensus about how to define the disease. Lupus can have myriad manifestations, with unpredictable impacts on various organs of the body. Two different classification criteria are currently in wide use, promulgated by projects from the American College of Rheumatology41 and the Systemic Lupus International Collaborating Clinics.42 These criteria overlap, but each includes certain subsets of patients that the other does not.43 Each includes some patients with very severe disease, and each includes some with minimal disease, the latter being questionable candidates for clinical trials. Both classification criteria exclude many patients with single organ lupus who may have moderate to severe manifestations. These patients might benefit from new treatments, but would be excluded from clinical trials and any subsequent treatment approvals.

Although the science of prognostic markers is not well advanced, there is evidence that certain autoantibodies or immunological pathways may be more relevant to some manifestations such as nephritis.44 Because of this, many consider lupus to be multiple different diseases, distinguished by the organs that become involved. Although an acknowledgement that ‘one immunologic intervention does not fit all’ has merit in optimising advanced treatment development, attempts to distinguish patients and choose treatments based simply on which organs are involved have not been very useful. For example, different pathologically distinct rashes can occur in patients who have nephritis,41 and not all kidney disease is identical.45 46 Thus, it seems most likely that lupus is a complex spectrum disorder where clinical manifestations and pathology may vary along a three-dimensional continuum from patient to patient.47

Why the organs and/or pathologies vary from patient to patient remains obscure, but once focus is drawn to specific immunological disorders, there is some predictability across patients regardless of the organs involved. There is no evidence to suggest that discoid lesions in a patient who also has nephritis are at all different from a discoid rash in a person who only has cutaneous involvement.48 The availability of patients for clinical trials would be significantly increased by including those with lupus spectrum manifestations who may not meet classification criteria for SLE. This is particularly true for patients with cutaneous lupus erythematosus (CLE), since, in the age of the ubiquitous punch biopsy, they represent a large population of patients with objectively confirmable lupus and measurable clinical findings.

Another way to increase the availability of lupus patients for trials is to develop a better way of finding them. Fundamental to this process will be to develop a greater number of trained, geographically dispersed trial sites.49 Support for the recruitment, training and infrastructure of interested clinical sites would make a significant difference in reaching more patients with the feasibility of trial participation. People of African, Hispanic, Native American or Asian origin have increased risk for lupus and potentially more severe manifestations,50–52 but people of minority descent, poor socioeconomic status and rural residence are generally known to have low representation in clinical trials.53 Insufficient clinical trial participation by minority patients with a serious chronic disease such as lupus may significantly increase already profound healthcare disparities by causing confusion about whether advanced treatments are appropriate for them. Indeed, belimumab, the first biologic that was approved for SLE in 2011, was not initially recommended for patients of African descent because isolated data from Phase III trials suggested that the treatment might not work in that population.54 Nevertheless, African Americans were very poorly represented in Phase III trials, and belimumab appeared to be very effective for an equally underpowered group of African American patients in Phase II.55 Furthermore, patients with SLE of African descent are known to be more likely than other racial groups to have the serological markers that define patients most likely to respond to belimumab.56 Of course, racial groupings (especially the mixed races that characterise culturally Hispanic people) provide a weak substitute for the advanced pharmacogenetics that would be optimal for predicting response to treatments, but there is no current genetic guidance to rationalise treatment selection for lupus, underscoring the importance of ensuring that all lupus groups are adequately represented in clinical trials.

Unless poor and minority patients live in larger urban areas near a university, accessibility to clinical trial participation may not even be feasible. Besides geographical location, barriers to recruitment of these populations for trials may be complex,57 58 but likely include weak relationships between patients and providers, provider attitudes, methods of presenting information about clinical trials and mistrust of research by vulnerable populations.53

Primary care providers may provide an important key to solving issues in minority recruitment. Although less than 1% of the US population participates in clinical trials, a recent poll found that 72% believed that they would participate in a clinical trial if recommended by their own doctor.59 Some data suggest that a focus on local communities and minority-dominant medical institutions may improve clinical trial participation.60 Provision of adequate translation and culturally competent communication, including community and faith-based input into educational materials, and improved attitudes and/or listening skills on the part of trial recruiters have been proposed as meaningful solutions.60–67

In worldwide trials with strong representation by South American and Asian sites, participation by some Hispanic subgroups and Asians has been adequate, but patients of African descent and North American Indians remain poorly represented, even in the largest international trials2 3 9 10 14 (see table 1). Initial steps to address this problem have been undertaken by the LFA with a grant from Health and Human Services, Office of Minority Health. They have developed a project called Improving Minority Participation and Awareness of Clinical Trials for Lupus and have developed a website for education about clinical trials (http://www.lupusfoundation.org/clinicaltrials/). A follow-up pilot project is being undertaken at the Oklahoma Medical Research Foundation, which will capitalise on LFA insights to create a CME-certified programme for primary care providers to increase awareness of lupus research, to identify roadblocks to participation in minority populations and to encourage appropriate referrals to trial centres.

Table 1. Enrolment of minority groups in Phase III trials of SLE2 3 9 10 14

We recommend that the entire lupus clinical research community mobilise to develop a comprehensive approach to recruiting, training and supporting the infrastructure for new trial sites in wider geographical areas, coupled with a thoughtful patient outreach initiative using culturally competent information. This may improve the access to clinical trials by a larger population, including underserved populations, and enable empowered, educated decision making about participation.

Addressing the inefficient size of clinical trials for lupus

The problem: disease complexity

Lupus is not only clinically and immunologically heterogenous; the compound outcome measures devised to evaluate patients with disparate clinical manifestations are also fraught with pitfalls.47 68–70 Global trials present more than an administrative nightmare: they have also been difficult to design to ensure interpretability of data. Because of the paucity of evidence-based treatments and globally accepted treatment pathways, physicians in practice are empirically prescribing drugs based on habit and clinical lore rather than well-defined, immunologically based phenotypes. International standards for treatment of lupus are not universally applied, and availability of immunosuppressants varies around the world; together, these factors have added to the difficulty of designing trials to evaluate biological therapies as add-ons to varying background immune modulators.

This increasingly unstructured polypharmacy approach seems archaic in a world that is rapidly moving towards sophisticated and strategic approaches to precision medicine. For example, it is possible that adherence to background therapy being prescribed before baseline may suddenly improve after patients enter a clinical trial, once patients receive increased attention from study coordinators, who studiously record every change and missed dose the patient can report. Since non-adherence to medications is known to be high in SLE,71–73 some have advocated measuring levels of immunosuppressive therapy at the time of screening to ensure that patients truly have active disease on ‘standard of care’ and not because of non-compliance.

These factors might well account for high placebo response rates in trials, once patients who have been non-adherent are suddenly being reminded to take their medications by dedicated trial coordinators. If efficacy rates could approach 80%–100%, a placebo response rate of 35%–40% would not pose an insurmountable problem. Unfortunately, in a heterogenous disease such as lupus, any finely targeted treatment is extremely unlikely to be effective in the majority of patients. If there is a ceiling, as seems likely, with an efficacy for most single biologics of 40%–60%, it is clear that trial conditions that support placebo group response rates of 35%–40% are untenable.
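The arithmetic behind this point can be sketched with a standard normal-approximation sample size formula for comparing two response rates. This is an illustrative calculation only, not a method taken from any of the trials discussed: the function name `patients_per_arm`, the two-sided alpha of 0.05 and the 80% power are assumptions, the 50% active and 40% placebo rates are picked from within the ranges quoted above, and the 20% placebo rate is a purely hypothetical comparison.

```python
from math import ceil, sqrt
from statistics import NormalDist


def patients_per_arm(p_active, p_placebo, alpha=0.05, power=0.80):
    """Approximate patients needed per arm to detect p_active vs p_placebo.

    Uses the usual normal approximation for a two-sided test of two
    independent proportions with equal allocation (an assumed design).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    p_bar = (p_active + p_placebo) / 2             # pooled rate under the null
    spread_null = z_alpha * sqrt(2 * p_bar * (1 - p_bar))
    spread_alt = z_beta * sqrt(
        p_active * (1 - p_active) + p_placebo * (1 - p_placebo)
    )
    return ceil((spread_null + spread_alt) ** 2 / (p_active - p_placebo) ** 2)


# Rates within the ranges quoted in the text: 50% active vs 40% placebo.
high_placebo = patients_per_arm(0.50, 0.40)
# Hypothetical scenario in which trial design lowers the placebo rate to 20%.
low_placebo = patients_per_arm(0.50, 0.20)
print(high_placebo, low_placebo)
```

Under these assumptions, the first scenario calls for several hundred patients per arm and the second for a few dozen, which is the sense in which a 35%–40% placebo response rate is untenable for a treatment with a 40%–60% efficacy ceiling.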

It may not yet be possible to fully unravel the tangled threads of confusing outcome measurements and an alphabet soup of background treatments in heterogenous patients who may or may not have been adherent to medications prior to entering the trial. Nevertheless, some evidence from secondary and exploratory analyses of past trials, now prospectively validated by some recent Phase II studies, suggests simple and effective trial design strategies which juxtapose the degree of illness of patients, the requirement of different patient subsets for background therapy, the permissiveness of medication adjustments during a trial and the stringency of outcome measures (tables 2–4). Adjusting each of these components to adapt to the others will help to decrease data ‘noise’ and make smaller trials more feasible.

Table 2. Recommended lupus trial designs

Table 3. Performance of outcome measures for lupus studies

Table 4. Recommendations for more efficient lupus trials

Evidence that effective lupus treatments can be distinguished from placebo

Trial design

Exploratory analysis of a number of disappointing trials and confirmation using results of prespecified endpoints from a few more recent trials have produced two consistent potential solutions to improve trial designs. Table 2 illustrates that there is now substantial evidence to support improved discrimination of treatments from placebo or lower placebo group response rates in analyses restricted to more severe subsets of patients7–9 12 15 18 21 74–76 or patients receiving less background medications.6 11–13 15 18 21 22 75 77 The definition of ‘less background medication’ has varied, including the exclusion of some or all steroids and immune suppressants,18 22 77 restrictions on baseline rescue treatments11 75 or the encouragement or requirement for steroid tapering during the trial.13 15 21 The recent anifrolumab Phase II trial met its primary endpoint and prospectively demonstrated increased discrimination between treatment and placebo when a steroid-tapering milestone was met. This result was predominantly due to lower placebo response rates when steroid lowering was part of the endpoint as compared with when it was not.12

When the more severe populations have been analysed separately, data from a number of trials confirm that a polypharmacy trial might be more feasible in the sicker subsets of patients.7–9 12 21 75 This concept is best confirmed using data from a treatment that has been proven to be effective in the overall population. Examination of the more ill subset in the belimumab studies demonstrated markedly increased treatment effect compared with the overall population, primarily due to decreased placebo group response rates.76 By mixing the more severe patients in with moderately ill patients, belimumab produced a far more modest treatment effect in its pivotal trials.2 3 Disappointing primary endpoints, using trial designs combining patients of different degrees of illness and allowing background medications sufficient for the most ill among them, confirm the inherent challenge in a complex trial design that allows aggressive background immunosuppressants to be used.6 9 10 14 This may be especially true today, since in the decade since those first successful belimumab trials were designed, ‘standard of care’ has become, if anything, more aggressive.78 Adaptive trial designs might provide robust solutions to some of the complexity encountered in lupus populations.79 80 We suggest that pilot studies using adaptive, serial exclusions of patients based on earlier biomarker or clinical changes should be considered.

Improving pharmacodynamic understanding

The more ill patients may require background treatment, and experience from clinical trials has shown that this works. However, it seems logical that, as the field progresses, we could be more strategic in limiting the choice of background treatments by specifically considering their potential to add to, synergise with, antagonise or simply duplicate the mechanism of action of the treatment under investigation. To date, most trials have been embarked on with little knowledge of how the background and rescue therapy positively or negatively impacts the efficacy of an investigational agent. Given the lack of standard practice in prescribing these agents17 and the heterogeneity of the patients receiving them, the impact of one background therapy, even in the context of being added to only one finely targeted investigational agent, may not be the same in all patients, and some of the patients in trials are taking more than one background treatment. Has science advanced to the point that we can ever perfectly define and characterise patients and their treatments as they enter into lupus trials? Has our technology advanced to the point that it is possible to do better than we are doing now? Perhaps not, but preliminary data suggest that this may be a critical consideration in effective trial designs.12 16 21 77 81–84 Application of run-in periods and adaptive trial designs may help to advance the field in this direction.

Endpoints

The most common multiorgan outcome measures in current usage include versions of the Systemic Lupus Erythematosus Disease Activity Index (SLEDAI)85 86 and the British Isles Lupus Assessment Group 2004 (BILAG 2004)87 Index. These are validated, widely accepted instruments, which, although limited in a number of ways,68–70 88 are well understood and capable of generating interpretable data from standard industry trials when applied in a discerning trial design. Combinations of these instruments have been used to develop categorical endpoints in recent years that attempt to particulate clinically meaningful change in a disease burdened by complexity. These telescoped response definitions are the SLEDAI-based Systemic Lupus Erythematosus Responder Index (SRI) and the BILAG-based Combined Lupus Assessment (BICLA).11 89

The SRI response definition compares disease activity at a specified time point to the baseline condition of the patient, requiring a ≥4-point reduction in the SLEDAI score, no development of a new BILAG A (severe) or more than one new BILAG B (moderate) organ score, and no deterioration from baseline in the physician’s global assessment (PGA) by ≥0.3 points (or 10% of the scale).89 The BICLA is similar in concept, but the lynchpin is improvement in baseline BILAG disease activity with no worsening in different BILAG organ domains, SLEDAI or PGA.11 Although the SRI and BICLA have both detected treatment effects in trials,51 54 they are not necessarily in agreement with each other when compared directly in studies.11 75 Work to evaluate these two composite endpoints in a clinical setting has illuminated their strengths and weaknesses.70 The BICLA can demonstrate response more reliably than the SRI when there is at least partial response in all organs active at baseline, and the SRI demonstrates response more reliably than the BICLA when there is an expectation of complete response in at least one organ, but not necessarily in every organ. Both of these scenarios occur with some frequency in real-world practice.70
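The SRI definition above is a conjunction of three conditions, which can be expressed as a short sketch. The function name `is_sri_responder` and its parameters are hypothetical and not taken from any trial protocol; a 0–3 PGA scale is assumed because the text equates 0.3 points with 10% of the scale, and the counting of new BILAG A and B scores is assumed to have been done upstream.

```python
def is_sri_responder(
    baseline_sledai,
    followup_sledai,
    new_bilag_a,
    new_bilag_b,
    baseline_pga,
    followup_pga,
):
    """Return True if a patient meets the SRI criteria as described in the text.

    new_bilag_a / new_bilag_b are counts of organ domains with a new
    BILAG A or B score relative to baseline (assumed precomputed).
    PGA is assumed to be recorded on a 0-3 scale.
    """
    # Criterion 1: at least a 4-point reduction in SLEDAI from baseline.
    sledai_improved = (baseline_sledai - followup_sledai) >= 4

    # Criterion 2: no new BILAG A and no more than one new BILAG B.
    no_bilag_worsening = new_bilag_a == 0 and new_bilag_b <= 1

    # Criterion 3: PGA has not deteriorated by 0.3 points or more.
    # Rounding guards against floating point noise near the threshold.
    pga_change = round(followup_pga - baseline_pga, 2)
    no_pga_worsening = pga_change < 0.3

    return sledai_improved and no_bilag_worsening and no_pga_worsening


# Example: SLEDAI falls from 10 to 4, one new BILAG B, PGA improves.
print(is_sri_responder(10, 4, 0, 1, 1.5, 1.0))
```

Writing the endpoint this way makes visible that the SLEDAI term drives the response, while the BILAG and PGA terms can only remove responders; this is consistent with the observation in the text that the worsening components add stringency only to a small degree.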

The advantage of using combined indices as opposed to either the SLEDAI or the BILAG alone may be largely in the face validity of tracking concomitant improvement and worsening in different organs. However, when either the SLEDAI or BILAG is used to reflect worsening, this has had minimal to no impact on the combined outcome in trials. It turns out that it is quite rare for patients who improve in one organ to worsen in another. Nevertheless, the SRI and BICLA have occasionally been shown to increase the stringency of the endpoints to a small degree.2 3 9 10 In trials where the effect size is not marginal, these complex endpoints are probably not necessary, and either a four-point drop in SLEDAI or incremental decreases in BILAG severity scoring should suffice.

Organ-specific endpoints

A major advantage of considering organ-specific trials is the availability of focused, objective and interpretable endpoints. Renal, musculoskeletal and cutaneous involvement are common in lupus, making organ-specific trials most feasible for these manifestations. Haematological features could also be feasibly studied in a focused manner, although these might be more difficult to recruit. Renal disease represents a very large unmet need for treatment development, but will only be covered briefly here, since data-driven recommendations are still evolving.90–93 Combination endpoints in nephritis induction trials have usually included improvements in proteinuria, creatinine and haematuria. The optimisation of endpoints, from the point of view of clinical utility and discriminatory capacity, has been challenging.90 The Lupus Nephritis Trial Network (LNTN) has published a preliminary look at patients who participated in the Euro-Lupus Nephritis trial evaluating low dose, bimonthly cyclophosphamide.91 Seventy-six patients were followed ≥7 years so that endpoints used in earlier stages of treatment could be evaluated for their impact on longer-term outcomes. Proteinuria <0.8 g/day was the single best short-term predictor of good outcome; adding creatinine to the endpoint did not improve the specificity or sensitivity, and adding red cells in a composite endpoint decreased the sensitivity of the measurement.
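Why adding a component to a composite endpoint can lower sensitivity may be easier to see in a small sketch. The patient records below are entirely synthetic and invented for illustration; they do not reproduce the Euro-Lupus data or its results. Only the 0.8 g/day proteinuria threshold comes from the text, and the field layout and the helper `sensitivity_specificity` are hypothetical.

```python
def sensitivity_specificity(predicted, actual):
    """Sensitivity and specificity of a predicted good outcome vs the actual one."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    tn = sum(1 for p, a in zip(predicted, actual) if not p and not a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    return tp / (tp + fn), tn / (tn + fp)


# SYNTHETIC records: (proteinuria in g/day, haematuria resolved, good long-term outcome)
patients = [
    (0.3, True, True),
    (0.5, False, True),
    (0.7, True, True),
    (0.6, False, True),
    (1.2, True, False),
    (2.0, False, False),
    (0.4, True, False),
    (1.5, True, True),
]
actual = [good for _, _, good in patients]

# Single-component endpoint: proteinuria below 0.8 g/day.
single = [prot < 0.8 for prot, _, _ in patients]

# Composite endpoint: proteinuria below 0.8 g/day AND haematuria resolved.
composite = [prot < 0.8 and haem for prot, haem, _ in patients]

sens_single, spec_single = sensitivity_specificity(single, actual)
sens_composite, spec_composite = sensitivity_specificity(composite, actual)
print(sens_single, spec_single)
print(sens_composite, spec_composite)
```

Because every patient who satisfies the composite also satisfies the single criterion, a composite built with AND can never be more sensitive than its components. Whether the extra criterion buys any specificity in return is an empirical question, and the analysis described in the text suggests that for red cells it does not.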

Nephritis induction trials are, by necessity, studies of patients with immediate organ-threatening disease, requiring aggressive increase of background medication or a head to head comparison of active treatments. However, maintenance patients could be combined in trials with the patients with more severe non-nephritis, assuming that earlier study of interactions between the investigational agent and preferred background treatments has been considered. Renal endpoints using composite disease activity indices such as SLEDAI or BILAG are not recommended.

Joint counts may fluctuate from day to day or even during the course of a day and are not perfectly consistent, given that there is a subjective component to the determination of tenderness (which may vary by the degree of compression) and swelling (depending on the observer). Nevertheless, they are a reliable outcome measure as used in trials of rheumatoid arthritis,94 are widely accepted by the community95 and have also been successfully used in lupus studies.12 77 93 96 97 This is true even though arthritis manifestations are generally more subtle in lupus than in rheumatoid arthritis.98 One promising avenue could be the use of MRI or ultrasound in the evaluation of joint inflammation.99 100

CLE presents a particularly compelling therapeutic need for a single organ approach, since there may be as many patients with primary CLE (without other systemic lupus manifestations) as there are with SLE, and patients with CLE are usually excluded from SLE trials. It follows that they will be similarly disenfranchised from new treatment approvals, as has already occurred with belimumab, which is known to be effective for cutaneous manifestations of lupus.39 93

The current standard of care for CLE is inadequate. Approximately 10% of patients with CLE are refractory to all therapies, and about 50% require escalation beyond topical and antimalarial therapy despite the fact that immunosuppressive or biological treatments have never been optimised for their condition.101–104 Thus, a regulatory pathway allowing the evaluation of therapies directed specifically at lupus skin manifestations is greatly needed. Since there is very little detail about skin involvement captured by either the BILAG or SLEDAI, and a review of the few papers available about measuring skin manifestations in SLE suggests that much data about cutaneous lupus is lost when relying on these instruments,105–108 the Cutaneous Lupus Erythematosus Disease Area and Severity Index (CLASI) was developed.109

This instrument was developed with input from American and European dermatologists and patients and has been validated in studies with dermatologists, rheumatologists and trainees, correlating with physician-assessed cutaneous activity and damage, physician’s global skin assessments and pain scores.86–91 109–114 A minimal clinically significant improvement in the CLASI has been determined using several gold standards, and a four-point decrease in CLASI activity correlates with highly improved quality of life (QoL) in patients.91 114 115 Clinical responsiveness has been demonstrated in interventional studies of hundreds of patients, including anifrolumab,12 hydroxychloroquine,104 belimumab,116 thalidomide,117 lenalidomide,118 apremilast,119 CC-220120 and IVIG.121 Biomarker studies also demonstrate a correlation between CLASI and immunological manifestations of inflammation.122 123

One strength of the CLASI is that it combines signs that are important to patients, as independently validated by the correlation of QoL with CLASI activity score, a finding not seen with the CLASI damage score.124 125 Discussion with patients has determined that they are concerned with erythema and scale as an indication of disease activity, but have often come to terms with the scarring and dyspigmentation that do not signify ongoing clinical activity. The CLASI breaks down the elements of activity, such as erythema and scale, by body surface area, so it is possible to determine the type of activity and areas improved in a trial, although studies have shown that the total activity score correlates with other measures of improvement.87 88 91 While interpretation of erythema may vary according to skin colour and expertise, that would be the case for any assessment of skin activity. Studies in other skin diseases have found that skin type has not altered the perception of colour in trained raters.126 Severe erythema can rarely result in haemorrhagic crusting, and this is captured as an indication of severe activity. Evanescent erythema is a problem with any type of skin assessment. Oral lesions can be hidden, so oral involvement is assessed only if the patient is aware, so as to minimise inter-rater variability related to the different extent of the exam and lighting used in oral exams.

We conclude that the CLASI is now well-validated and functioning robustly in clinical trials. This instrument will provide a consistent, interpretable and meaningful endpoint for organ-specific trials of cutaneous lupus, suitable for a primary or secondary endpoint in trials where patients with active skin lesions may have a diagnosis of CLE and/or SLE. Furthermore, it will be valuable in trials of general lupus manifestations where there are expected to be substantial numbers of patients with skin involvement. If CLASI improvement is strong enough when nested in a study of wider SLE manifestations, the treatment should be made available for patients with CLE whether or not they meet classification criteria for SLE. Patients with CLE have the same skin disease as patients with SLE and may have as many unmet treatment needs as many patients with SLE, or more.

In 2003, the Food and Drug Administration (FDA) reacted to the dearth of new drugs licensed for SLE with a guidance document, suggesting that organ-specific therapies may be submitted to the FDA for approval. We agree that this is an excellent idea, and it is not clear why this has not already happened. Not only could organ-specific outcomes be useful for the study of patients with lupus who do not meet criteria for SLE, but they might also provide a less confusing approach to measuring the most common features seen in SLE, such as skin and joints, as compared with multiorgan, multifaceted outcome measures. Since it is difficult to assign CLE subtype early on in disease and because 20% of patients with CLE have more than one subtype of CLE, it is important to have studies that include the spectrum of CLE and not insist on designating one specific CLE subtype for studies. The CLASI is able to measure all subtypes, with the exception of the rare patient with lupus panniculitis.

Stringent endpoints

Whether applying compound, multiorgan or organ-specific endpoints or whether the population under study has greater illness requiring more background treatments or less illness allowing greater reduction in polypharmacy, more stringent endpoints may increase the differentiation between effective treatments and placebo. Several recent approaches to measuring endpoints have demonstrated a significant impact on increasing treatment effect size, including dichotomous endpoints requiring greater degrees of improvement,2 3 7 9 12 15 74 attainment of low disease activity12 and demonstration of sustained improvement.12 These analyses share the downside of being more stringent and harder to achieve than classic outcome measures for lupus. Therefore, the rates of response in both active and placebo groups are lowered compared with results using the same treatment and less stringent endpoints.12 However, recent trials have confirmed that each of these approaches can support the discrimination of effective treatments in smaller trials.
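The trade-off described above can be made concrete with a standard sample-size approximation for comparing two response rates. The sketch below is illustrative only: the function name `n_per_arm` is ours, and the response rates are hypothetical values chosen to mimic the pattern described (lower response in both arms under a stringent endpoint, but a wider gap between them). They are not taken from any trial.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_arm(p_active: float, p_placebo: float,
              alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate patients per arm for a two-sided comparison of two
    response rates (normal approximation, equal allocation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    p_bar = (p_active + p_placebo) / 2             # pooled rate under the null
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p_active * (1 - p_active)
                                 + p_placebo * (1 - p_placebo))) ** 2
    return ceil(numerator / (p_active - p_placebo) ** 2)


# Hypothetical response rates, for illustration only.
lenient = n_per_arm(p_active=0.55, p_placebo=0.45)    # high rates, narrow gap
stringent = n_per_arm(p_active=0.35, p_placebo=0.15)  # lower rates, wider gap
print(lenient, stringent)
```

Under these assumed rates, the stringent endpoint needs far fewer patients per arm, because the required sample size falls with the square of the difference between the response rates.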

Severe flares have also demonstrated discriminatory capacity as endpoints,2 3 despite ongoing issues with the way in which they have been defined in currently used indices.69 There are few enough severe flares in trials that this endpoint would only be useful in larger trials. However, the lack of clinical significance of many flares that meet endpoint definitions for mild–moderate flare has reduced the interpretability of mild–moderate flare data.18 23 Some of this issue has been addressed by a revised SELENA SLEDAI Flare Index.127 This instrument separates moderate flare from mild flare, providing a compromise that does not require restriction of analysis to only the less common severe flares, while optimising the probability that those flares counted are likely to be clinically significant. However, this instrument follows the understandable but problematic pattern of prior flare definitions by integrating a change of treatment into the flare definition. This means that if certain treatment changes are made, a flare is defined even if disease activity has not increased, and, unfortunately, treatment changes are common in clinical practice in the absence of disease flare.69 Furthermore, many moderate and even some severe flares are not immediately treated for various reasons, including infection, patient resistance, drug toxicities and lack of access to care. Thus, although the concept of intention to treat helps to anchor the clinical significance of a flare, it is clear that too close a linkage between treatment and a flare definition is impractical and can produce misleading data.

A Phase II trial of abatacept published in 2010 did not meet any of its primary or secondary endpoints, but a real-world evaluation was collected from a simple assessment at each visit, in which clinicians were asked, ‘Did this patient flare?’ By this measure, more flares were detected in the placebo group than in the treatment group.23 Moreover, flares were captured more frequently in this analysis than index-defined ‘severe flares’ but less frequently than flares meeting the index ‘mild–moderate’ definitions, suggesting that the clinicians, without much guidance, were capturing clinically significant flares in a more discriminatory manner than they did when using glossary-driven indices.

In 2011, the LFA published the results of an international consensus group which had been convened with the goal of reaching a simple but precise, real-world, clinically meaningful definition of flare. After an extensive Delphi process and deconstruction of many group discussions, this definition now reads, “a flare is a measurable increase in disease activity in one or more organ systems involving new or worse clinical signs and symptoms and/or laboratory measurements. It must be considered clinically significant by the assessor and usually there would be at least consideration of a change or an increase in treatment.”128 This approach relies on and benefits from practical clinical judgement, is devoid of the pitfalls of complex algorithms for severity definitions and provides a suitable compromise between the two issues that have limited past flare analyses: the infrequency of severe flares and the need to exclude flares that are not clinically meaningful. Including a consideration of treatment enhances the likely clinical significance of defined flares without unnecessarily restricting flares to those that are treated or eliminating those that are not. We recommend this definition as a potential outcome measurement for clinical trials.
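To show how this definition differs from treatment-linked flare rules, the following minimal sketch encodes its two required elements as a simple check on a visit record. The `Visit` structure and its field names are hypothetical conveniences for illustration and are not part of the published definition; consideration of a treatment change is recorded but, following the wording "usually", is not required.

```python
from dataclasses import dataclass


@dataclass
class Visit:
    # Hypothetical visit record; field names are assumptions for this sketch.
    organs_with_increased_activity: int  # organ systems with new or worse signs, symptoms or labs
    clinically_significant: bool         # the assessor's clinical judgement
    treatment_change_considered: bool    # recorded for context, not required


def is_flare(visit: Visit) -> bool:
    """Flare per the consensus wording: a measurable increase in activity in
    at least one organ system that the assessor judges clinically significant."""
    return (visit.organs_with_increased_activity >= 1
            and visit.clinically_significant)


# An untreated but clinically significant increase in activity still counts.
untreated_flare = Visit(1, True, False)
# A treatment change without increased activity does not count.
treatment_change_only = Visit(0, False, True)
print(is_flare(untreated_flare), is_flare(treatment_change_only))
```

Keeping treatment out of the decision rule is the design choice of interest here: flares that go untreated are retained, and treatment changes made for other reasons are not counted as flares.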

The choice of endpoints and other aspects of trial design might depend on the phase of treatment development. For example, aggressive withdrawal of background medications may be more practical in early phase trials. Since this approach requires a population limited to those with significant, but non-organ threatening disease, smaller-sized trials are more feasible. The advantage of these trials is that pharmacodynamic impact of agents can be studied in the absence of confusing polypharmacy, and this can inform later phase trials to limit the confounding effects of combining synergistic or mutually antagonistic agents. Optimal trial designs and endpoints at different phases are described (table 4).

Patient-reported endpoints

Patient-reported endpoints (PROs) have had some success in following disease activity in SLE,129–136 and the importance of including the patient’s perspective in outcome measures for trials is widely appreciated. The LupusQoL, LupusPRO, L-QoL and SLEQOL were developed specifically for SLE and involved patients with lupus in concept elicitation and development of items.129 The FACIT fatigue and HAQ have also been used in a number of lupus trials.2 3 12 18 21 23 75 Unfortunately, the track record of PROs in clinical studies and clinical trials has not always been interpretable, sometimes producing opposite results to those of the clinician-evaluated endpoints.12 23 Given the failure of many trials to meet their primary and/or secondary clinical endpoints, however, a fair comparison of PRO performance in discriminating treatment from placebo is simply not possible yet.

Two well-validated PROs that were developed specifically for SLE (LupusPRO and LupusQoL) are candidates for further use in trials, but neither has had widespread use in currently published trials. The LupusQoL was used in a Phase II trial of rituximab18 and in a large, Phase III programme for epratuzumab.14 Improvements were seen comparable to SLEDAI and BILAG scores, but there was no treatment difference in any groups. The SF-36134 and FACIT fatigue scale4 136 were able to distinguish differences comparable to clinician-reported endpoints in Phase III belimumab trials. Skin-specific QoL measures such as the Skindex have correlated well with CLASI endpoints for those with CLE.115

Clinician endpoints are designed to evaluate current disease activity, and PROs, which are purposefully designed for the patient perspective, may not make a strong distinction between current, reversible disease and elements that are unrelated to the efficacy of immune-modulating interventions, such as treatment side effects, life circumstances and irreversible damage from chronic disease. Both the LupusQoL and LupusPRO include reminders to report features that are due to lupus, but there are no distinctions between active, reversible disease and indirect or chronic consequences of illness. Nevertheless, some descriptors lend themselves to active disease analysis quite well, so these instruments, with further testing in trials, may do better than generic PROs at channelling the attention of the patient towards reversible lupus disease.

Summary

In our opinion, progress in treatment development for lupus will require reliable, smaller and shorter trials. This can be achieved by one or more of the following approaches: elimination, simplification or tapering of background treatments, and use of more discriminatory endpoints based on more stringent definitions of improvement, such as low disease activity, sustained improvement or single organ improvement. If single organ assessments provide definitive results, we propose that approval should be extended to all patients with lupus with that manifestation, not just patients who meet classification criteria for SLE. If Phase II trials demonstrate convincing, statistically significant results, these should suffice for pivotal demonstrations of efficacy. Larger trials could then be conducted to amass sufficient safety data, where precise application of complex disease activity instruments at widespread international trial sites would be, although still desirable, less critical. Meanwhile, identification and training of new trial sites and the initiation of education and access to trials of patients around the world should be a priority. Further work is also needed on integration of patient perspectives into outcome measures and outreach to minority populations who are under-represented in lupus trials. Finally, further study should be given to a better practical understanding of the immunological impact of targeted immune modulators, not only in terms of their pharmacodynamic effects on distinct subsets of patients, but also their interactions with standard of care medications that might be required for use as background therapies in trials of substantially ill patients.