Background Patients with systemic lupus erythematosus (SLE) exhibit considerable clinical and molecular heterogeneity. A robust patient stratification approach could characterize individual lupus patients more effectively and aid patient care.
Methods We employed gene set variation analysis (GSVA) of informative gene modules and k-means clustering to identify molecular endotypes of SLE patients based on dysregulation of specific biologic pathways, and interrogated these endotypes for clinical utility. We used machine learning (ML) on these molecular profiles to classify each lupus patient into a single molecular subset, and used logistic regression with ridge penalization to develop a novel composite metric that estimates disease severity from lupus-related immunologic activity. SHapley Additive exPlanations (SHAP) was employed to understand the impact of specific molecular features on the patient subsetting.
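The composite-metric step can be illustrated with a minimal sketch of ridge-penalized logistic regression. This is not the authors' implementation: the synthetic module scores, the choice of five modules, the penalty strength `lam`, the gradient-descent settings, and the function names `fit_ridge_logistic` and `composite_score` are all assumptions made for illustration. The sketch fits a model separating control-like from lupus-like samples and treats the fitted log-odds as a per-sample score.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_ridge_logistic(X, y, lam=0.1, lr=0.1, epochs=500):
    """Fit logistic regression with an L2 (ridge) penalty by batch gradient descent."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(p):
                grad_w[j] += err * xi[j]
            grad_b += err
        # The ridge term lam * wj shrinks each weight; the intercept is not penalized.
        w = [wj - lr * (gw / n + lam * wj) for wj, gw in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def composite_score(x, w, b):
    # Linear predictor (log-odds); higher values mean a more lupus-like profile.
    return sum(wj * xj for wj, xj in zip(w, x)) + b


# Synthetic stand-in for GSVA module enrichment scores (hypothetical data):
# rows are samples, columns are gene modules.
rng = random.Random(0)
n_modules = 5
controls = [[rng.gauss(-0.2, 0.15) for _ in range(n_modules)] for _ in range(40)]
patients = [[rng.gauss(0.3, 0.25) for _ in range(n_modules)] for _ in range(40)]

X = controls + patients
y = [0] * len(controls) + [1] * len(patients)
w, b = fit_ridge_logistic(X, y)

control_scores = [composite_score(x, w, b) for x in controls]
patient_scores = [composite_score(x, w, b) for x in patients]
mean_control = sum(control_scores) / len(control_scores)
mean_patient = sum(patient_scores) / len(patient_scores)
print(f"mean control score: {mean_control:.2f}")
print(f"mean patient score: {mean_patient:.2f}")
```

The ridge penalty keeps the weights small when modules are correlated with one another, which is a common reason to prefer it over unpenalized logistic regression for gene-module inputs.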
Results Six molecular endotypes were identified in a proof-of-concept cohort from the ILLUMINATE trials (GSE88884) using baseline gene expression profiles. Clinical characteristics differed significantly among endotypes: the least perturbed transcriptional profile manifested the lowest disease activity, and endotypes with more perturbed transcriptional profiles exhibited more severe disease activity. The more abnormal endotypes were also more likely to have a severe flare over the 52 weeks of the trial, and specific endotypes were more likely to be clinical responders to the investigational product (tabalumab). GSVA and k-means clustering of 3166 samples in 17 datasets revealed a total of eight SLE molecular endotypes, each with unique gene enrichment patterns, but not all endotypes were observed in all datasets. ML algorithms were trained and validated on 2183 patients from GSE88884 (ILLUMINATE-1 and ILLUMINATE-2) and three additional datasets (GSE116006, GSE65391, and GSE45291) and demonstrated high degrees of accuracy (98%), precision (94%), sensitivity, and specificity in classifying patients into one of the eight endotypes. A composite molecular score, which comprised aggregate molecular scores of each GSVA gene module, was calculated for each lupus patient. A subset of patients was identified whose molecular scores were not different from those found in normal subjects, whereas other subsets of lupus patients had progressively higher scores indicative of the aggregation of molecular abnormalities. The composite molecular scores were significantly correlated with both anti-DNA titers and SLEDAI. Finally, SHAP analysis of the impact of input GSVA scores indicated that a unique array of features was influential in sorting individual samples into each of the molecular endotypes.
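For readers less familiar with how accuracy, precision, sensitivity, and specificity are defined for a multi-class classifier, the sketch below computes them one class against the rest. The labels and predictions are invented toy values, not data from the study, and the helper names `per_class_metrics` and `accuracy` are hypothetical.

```python
def per_class_metrics(y_true, y_pred, labels):
    """Compute one-vs-rest precision, sensitivity, and specificity for each label."""
    total = len(y_true)
    metrics = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        tn = total - tp - fp - fn
        metrics[label] = {
            # Of samples predicted as this class, the fraction that truly belong to it.
            "precision": tp / (tp + fp) if tp + fp else 0.0,
            # Of samples truly in this class, the fraction that were recovered.
            "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
            # Of samples not in this class, the fraction correctly kept out of it.
            "specificity": tn / (tn + fp) if tn + fp else 0.0,
        }
    return metrics


def accuracy(y_true, y_pred):
    # Fraction of samples assigned to their true class.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy example with three hypothetical endotype labels.
y_true = ["A", "A", "A", "B", "B", "B", "C", "C", "C", "C"]
y_pred = ["A", "A", "B", "B", "B", "B", "C", "C", "C", "A"]

overall_accuracy = accuracy(y_true, y_pred)
metrics = per_class_metrics(y_true, y_pred, labels=["A", "B", "C"])

print(f"accuracy: {overall_accuracy:.2f}")
for label, m in metrics.items():
    print(label, {name: round(value, 2) for name, value in m.items()})
```

A single summary figure per metric, as reported in an abstract, is typically an average of these per-class values; which averaging scheme was used here is not stated in this section.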
Conclusions Transcriptomic profiling and ML allowed for reproducible separation of lupus patients into molecular endotypes with significant differences in clinical outcomes and responsiveness to therapy.
Gene expression profiles were also reduced to a single score of lupus-related immune activity that correlated with clinical features. Implementing this score may provide a means to categorize lupus patients numerically according to the nature of each individual’s underlying molecular abnormalities.
Lay Summary Lupus patients present with arrays of symptoms that are highly variable, which we describe as heterogeneity. This heterogeneity is also present at a molecular level, which means the biological mechanisms underlying disease differ from patient to patient at a given moment in time. We have addressed the clinical challenges presented by this heterogeneity by developing a new way to identify endotypes, or subsets of patients with commonalities in these underlying mechanisms. We used data from thousands of patients in multiple datasets to ensure we are representing the likely universe of lupus patients, and used computational algorithms not only to subset the patients but also to develop machine learning models that can accurately predict subset (endotype) membership. Finally, the underlying molecular commonalities among these subsets were distilled into a single score reflecting an individual patient’s current status of immunologic perturbation. Together, these analyses should provide a new way to categorize lupus patients based on information not currently captured in clinical settings.
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/ .