Background Systemic lupus erythematosus (SLE) is a multifactorial disease with genetic and environmental risk factors that encompasses a wide range of disease severity and heterogeneous manifestations. Long-term outcomes for individual patients are difficult to predict, and little is known about why an affected individual develops a particular SLE phenotype. Identifying nuanced patterns in patients' clinical and molecular data could reveal distinct clusters of disease, which in turn could lead to more refined and personalized treatment regimens. Previous studies have used phenotype-mapping approaches to identify subtypes of SLE from genome-wide association studies and gene expression data; however, no studies have integrated genetic data with clinical data from electronic health records (EHR) to identify SLE phenotypes using bioinformatic approaches.
Methods We characterized subgroups among 416 individuals with SLE using sociodemographic and clinical EHR data together with genetic data from previously collected cohorts. Single nucleotide polymorphisms (SNPs) were genotyped on the ImmunoChip, and the analysis included 95 variants previously associated with SLE risk. Variables extracted from the EHR included age, sex, race, ethnicity, and disease-associated laboratory results: complement C3 and C4 and the autoantibodies SSA, SSB, RNP, anti-Smith, and anti-dsDNA. We first determined subtypes by clustering these variables with multi-trait finite mixture of regressions (MFMR), a new clustering method designed for large, multi-trait genome-wide datasets, which accounts for the complex structure of our multi-ethnic dataset. We then used regression analyses to examine whether clinical and genetic variables had differential effects across clusters.
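One simple way to picture a "differential effect across clusters" is to compare the slope of a phenotype on genotype dosage within each cluster: a difference in slopes corresponds to a genotype-by-cluster interaction. The sketch below is illustrative only and is not the analysis used in this study. The function names, the additive 0/1/2 genotype coding, and the toy values are all assumptions, and significance testing and multiple-testing correction are omitted.

```python
from statistics import mean


def ols_slope(x, y):
    """Slope b of the least-squares line y = a + b*x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variation; slope is undefined")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def slopes_by_cluster(genotype, phenotype, cluster):
    """Fit a separate genotype -> phenotype slope within each cluster."""
    groups = {}
    for g, p, c in zip(genotype, phenotype, cluster):
        xs, ys = groups.setdefault(c, ([], []))
        xs.append(g)
        ys.append(p)
    return {c: ols_slope(xs, ys) for c, (xs, ys) in groups.items()}


# Hypothetical toy data (not study data): allele dosage, a continuous
# phenotype, and a cluster label for six individuals.
genotype = [0, 1, 2, 0, 1, 2]
phenotype = [1.0, 1.5, 2.0, 3.0, 2.0, 1.0]
cluster = [1, 1, 1, 3, 3, 3]

slopes = slopes_by_cluster(genotype, phenotype, cluster)

# The slope difference between two clusters equals the interaction
# coefficient in a model with a genotype x cluster product term.
interaction = slopes[3] - slopes[1]
print(slopes, interaction)
```

In practice, a single regression with an explicit interaction term and covariates would usually be preferred, since it yields a standard error and p-value for the interaction directly.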
Results Approximately 90% of patients were female; 52% were white, 13% African-American, 13% Asian, and 22% other/mixed race. Clustering identified three distinct clusters (Figure). Cluster 1 (n=165) consisted predominantly of white, non-Hispanic/Latino patients with an older age at onset. Cluster 2 (n=121) had a higher percentage of other/mixed race individuals with mild disease. Cluster 3 (n=130) was characterized by more severe disease, with a higher percentage of individuals with abnormal laboratory values (C3 and C4 levels, +RNP, +anti-dsDNA, +SSA, +SSB) and a younger age at onset. Eleven SNPs demonstrated significant genotype-cluster interaction with various phenotypes after correction for multiple testing.
Conclusions We identified three distinct subgroups of SLE via unsupervised clustering of genetic data and of sociodemographic and clinical variables derived from the EHR. Future work will further define these genotype-phenotype clusters and validate them in additional cohorts. Our findings may help identify treatments for SLE using a more personalized approach.
Funding Source(s): NIH-NIAMS F32 AR070585 and UCSF PREMIER Core Usage Grant