Big Data Analyses

BD-06 Identification of systemic lupus erythematosus subgroups using electronic health record and genetic databases

Abstract

Background Systemic lupus erythematosus (SLE) is a multifactorial autoimmune disease with genetic and environmental risk factors, as well as heterogeneous manifestations that encompass a wide range of disease severity. Long-term outcomes for individual patients are therefore difficult to predict, as is the scope of organ-system involvement. Little is known about why an affected individual might develop a particular SLE phenotype. A number of studies have used phenotype-mapping approaches to identify subgroups of SLE using genome-wide association studies and gene expression data; however, studies integrating the contribution of both genetic and clinical factors to identify SLE phenotypes in large, comprehensive data sources using bioinformatics analyses remain limited.

Methods We characterized subgroups of SLE patients using genetic, sociodemographic, and clinical variables drawn from previously collected genetic cohorts and electronic health record (EHR) data for 712 individuals with SLE. Genetic data comprised 95 single nucleotide polymorphisms (SNPs) that have been associated with SLE in the literature and were genotyped on the Immunochip platform. Variables extracted from the EHR included age, sex, race/ethnicity, age at disease onset, Charlson comorbidity index, and various disease-associated laboratory measures. Preliminary clustering was conducted using multiple correspondence analysis.
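As an illustration of the clustering step named above, the following is a minimal sketch of multiple correspondence analysis computed from the indicator matrix with a singular value decomposition. It is not the authors' pipeline: the function name `mca`, the 0/1/2 genotype coding, and the two simulated groups with different genotype frequencies are all assumptions made for the example, and the data are synthetic.

```python
import numpy as np

def mca(categories, n_components=2):
    """Multiple correspondence analysis of a table of categorical variables.

    categories: 2-D integer array, rows = individuals, columns = categorical
    variables (here, hypothetical SNP genotypes coded 0/1/2).
    Returns the row principal coordinates and the principal inertias.
    """
    categories = np.asarray(categories)
    n_vars = categories.shape[1]

    # One-hot encode every variable into a single indicator matrix.
    blocks = []
    for j in range(n_vars):
        levels = np.unique(categories[:, j])
        blocks.append((categories[:, j][:, None] == levels[None, :]).astype(float))
    Z = np.hstack(blocks)

    P = Z / Z.sum()        # correspondence matrix
    r = P.sum(axis=1)      # row masses
    c = P.sum(axis=0)      # column masses

    # Standardized residuals from the independence model r * c.
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, s, _ = np.linalg.svd(S, full_matrices=False)

    # Row principal coordinates: D_r^(-1/2) U Sigma.
    row_coords = (U[:, :n_components] * s[:n_components]) / np.sqrt(r)[:, None]
    return row_coords, s[:n_components] ** 2

# Synthetic example (assumption): two groups of 50 individuals whose
# genotype frequencies differ across 20 SNPs.
rng = np.random.default_rng(0)
group_a = rng.choice(3, size=(50, 20), p=[0.7, 0.2, 0.1])
group_b = rng.choice(3, size=(50, 20), p=[0.1, 0.2, 0.7])
genotypes = np.vstack([group_a, group_b])

coords, inertia = mca(genotypes)
```

Plotting the first two columns of `coords`, colored by a grouping variable such as self-identified race/ethnicity, would give a map of the kind described in the Results, although the actual analysis may have used different software and coding choices.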

Results Approximately 90% of SLE patients represented in genetic and EHR databases were female, and 50% self-identified as Caucasian, 12% Hispanic, 10% African-American, 14% Asian, and 14% Other/Missing. Preliminary results showed distinct clustering by race/ethnicity among the 95 SNPs associated with SLE (figure 1). Further clustering and network-based analyses incorporating both genetic and clinical variables, such as laboratory measures, are ongoing and will explore whether distinct SLE subgroups can be identified.

Abstract BD-06 Figure 1

Ninety-five SNPs associated with SLE demonstrate clustering by race/ethnicity

Conclusion This project is a first step towards identifying subgroups of SLE patients through clinical and genetic databases. These findings will contribute to our understanding of SLE and illustrate how combining big data in both genetics and EHR has the potential to further define this heterogeneous disease.

Acknowledgements This work was supported by NIH/NIAMS (grant numbers F32 AR070585 to M.A.G. and K23 AR063770 to G.S.); AHRQ (grant number R01 HS024412 to J.Y.); and PREMIER, an NIH/NIAMS P30 Center for the Advancement of Precision Medicine in Rheumatology at UCSF (AR040155). Drs. Yazdany and Schmajuk are also supported by the Russell/Engleman Medical Research Center for Arthritis. The content is solely the responsibility of the authors and does not necessarily represent the official views of the AHRQ or NIH.
