Multiple polygenic risk scores can improve the prediction of systemic lupus erythematosus in Taiwan
Abstract
Objective To identify new genetic variants associated with SLE in Taiwan and establish polygenic risk score (PRS) models to improve the early diagnostic accuracy of SLE.
Methods The study enrolled 2429 patients with SLE and 48 580 controls from China Medical University Hospital in Taiwan. A genome-wide association study (GWAS) and PRS analyses of SLE and three other SLE markers, namely ANA, anti-double-stranded DNA antibody (dsDNA) and anti-Smith antibody (Sm), were conducted.
Results Genetic variants associated with SLE were identified through GWAS. Some novel genes that have not been previously reported, such as RCC1L and EGLN3, were found to be associated with SLE in Taiwan. Multiple PRS models were established, and optimal cut-off points for each PRS were determined using the Youden Index. Combining the PRSs for SLE, ANA, dsDNA and Sm yielded an area under the curve of 0.64 for the optimal cut-off points. An analysis of human leucocyte antigen (HLA) haplotypes in SLE indicated that individuals with HLA-DQA1*01:01 and HLA-DQB1*05:01 were at a higher risk of being classified into the SLE group.
Conclusions The use of PRSs to predict SLE enables the identification of high-risk patients before abnormal laboratory data are obtained or symptoms manifest. Our findings underscore the potential of using PRSs and GWAS in identifying SLE markers, offering promise for early diagnosis and prediction of SLE.
What is already known on this topic
Studies have explored SLE-associated variants and the association between polygenic risk scores (PRSs) and disease manifestations or severity. However, no study has examined the use of PRSs from multiple phenotypes to predict SLE.
What this study adds
We established multiple PRS models for SLE and three additional laboratory markers. Including these markers enhances the discriminatory power of PRSs, improving their predictive capability for SLE.
How this study might affect research, practice or policy
PRSs serve as a clinical tool for detecting SLE markers when a patient’s PRS surpasses the optimal cut-off point. This approach aims to contribute to the early diagnosis and prediction of SLE.
Introduction
SLE is a chronic autoimmune disease that predominantly affects individuals aged 15–40 years, with a female-to-male ratio of approximately 9:1.1 In SLE, the immune system generates autoantibodies targeting cells and tissues, leading to inflammation and tissue damage.2 The severity and clinical manifestations of SLE vary considerably, resulting in a spectrum of symptoms.3 The disease course of SLE is characterised by alternating phases of flares and remission. Flares are periods of increased disease activity and severity, whereas remission consists of intervals of milder disease activity. Owing to the complexity and variability of its symptoms, early diagnosis of SLE poses a considerable challenge.
The diagnosis of SLE relies on the recently introduced ‘Systemic Lupus Erythematosus Classification Criteria’ collaboratively proposed by the American College of Rheumatology (ACR) and the European Alliance of Associations for Rheumatology (EULAR) in 2019.4 These criteria include various systemic manifestations, such as fever and manifestations related to haematological, neuropsychiatric, mucocutaneous, serosal, musculoskeletal and renal aspects. Additionally, laboratory assessments cover antiphospholipid antibodies, complement proteins and lupus-specific antibodies. To confirm a diagnosis of SLE, a cumulative score of 10 or more, coupled with a positive ANA test, is necessary.
A genome-wide association study (GWAS) is a robust investigative method for identifying genetic variations associated with specific traits or diseases across the entire human genome.5–8 The first three GWASs on SLE were simultaneously published9–11 and scrutinised between 100 000 and 300 000 single nucleotide polymorphisms (SNPs). To date, more than 200 SLE susceptibility loci have been identified, predominantly in European and Asian populations.12 13 Certain genes predisposing individuals to SLE, such as STAT4, IRF5 and BLK, have been replicated in numerous studies across diverse ancestral populations.14 The human leucocyte antigen (HLA) region is undeniably associated with SLE susceptibility across all populations.15 However, determining which genetic variants drive the development of SLE is challenging due to ethnicity-specific linkage disequilibria and allelic heterogeneity, resulting in markedly inconsistent allelic associations among populations. For example, although HLA-DRB1*15:01 and HLA-DQB1*06:02 have been associated with SLE risk in populations of Asian descent,16 European SLE studies have reported an elevated SLE risk associated with HLA-DRB1*03:01, HLA-DRB1*08:01 and HLA-DQA1*01:02.17 Notably, numerous non-HLA SLE susceptibility loci have been identified in East Asian populations, emphasising the importance of conducting GWASs in populations of non-European ancestry with high SLE prevalence and severity.15 18
The polygenic risk score (PRS) is a numerical assessment that quantifies an individual’s genetic predisposition to specific traits or diseases through the analysis of multiple genetic variants across the genome. PRSs may predict the risk of developing specific conditions, making them valuable tools in the development of personalised medicine and risk assessments.19–23 Although few studies have investigated the PRS for SLE and its association with disease manifestations or severity,24–27 these investigations have revealed that individuals with a high PRS for SLE are more prone to severe SLE phenotypes, early onset and higher mortality. Despite the potential applications of PRSs in SLE prediction, these scores rely solely on genetic factors and do not fully consider the influence of environmental and lifestyle factors. Therefore, a key future direction in using PRS is enhancing the accuracy and reliability of SLE prediction. In the present study, we conducted a GWAS and computed the PRS for SLE and other laboratory markers in patients with SLE, with the aim of enhancing the accuracy of SLE prediction.
Materials and methods
Patients and database information
This study used electronic medical records obtained from China Medical University Hospital (CMUH). Individuals in the case group were identified using the International Classification of Diseases, 9th Revision (code: 710.0) or International Classification of Diseases, 10th Revision (codes: M32.0, M32.10, M32.11, M32.12, M32.13, M32.14, M32.15, M32.19, M32.8 and M32.9) codes associated with SLE. Exclusion criteria for SLE cases involved individuals with no history of SLE medication usage and those diagnosed as having autoimmune conditions other than SLE. The control group comprised individuals without a diagnosis of any autoimmune disease. The excluded diagnostic codes and SLE medications are listed in the online supplemental materials. After age and sex matching, the case group comprised 2429 patients and the control group comprised 48 580 patients.
Genotyping
After informed consent was obtained from participants, a 3 mL venous blood sample was collected and preserved in an EDTA tube. Genomic DNA was extracted from 200 µL of peripheral blood by using the MagCore Genomic DNA Whole Blood Kit (RBC Bioscience, New Taipei City, Taiwan) in accordance with the manufacturer’s instructions. Genotypes were obtained using the Affymetrix Axiom genotyping platform, specifically the Axiom Taiwan Precision Medicine (TPM)-customised SNP array (Thermo Fisher Scientific, Santa Clara, California, USA), which covers 714 457 SNPs across the entire human genome. PLINK V.1.9 was used for analysis, and samples and SNPs with high missing rates were excluded. Variants that deviated from Hardy-Weinberg equilibrium (p<1e−6) or had a minor allele frequency (MAF) of <1e−4 were also excluded. The TPM array data were phased using SHAPEIT4, and imputation was performed using Beagle V.5.2, which has been reported to be efficient and accurate compared with other imputation tools. The imputed data were filtered on the basis of an alternate allele dosage r2 of <0.3 and a genotype posterior probability of <0.9.28
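The variant-level filters described above can be illustrated with a minimal Python sketch. It is not the PLINK implementation: PLINK applies an exact Hardy-Weinberg test, whereas this sketch assumes a simpler 1-df chi-square approximation. The function names (maf, hwe_chi2_pvalue, passes_qc) and the genotype counts are hypothetical; only the two thresholds come from the text.

```python
import math

def maf(n_aa, n_ab, n_bb):
    """Minor allele frequency from genotype counts (AA, AB, BB)."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    return min(p, 1 - p)

def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """P value of a 1-df chi-square test for Hardy-Weinberg equilibrium.

    Assumption: chi-square approximation, not the exact test used by PLINK.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 df.
    return math.erfc(math.sqrt(chi2 / 2))

def passes_qc(n_aa, n_ab, n_bb, hwe_p=1e-6, min_maf=1e-4):
    """Keep a variant only if it passes both filters stated in the text."""
    return (maf(n_aa, n_ab, n_bb) >= min_maf
            and hwe_chi2_pvalue(n_aa, n_ab, n_bb) >= hwe_p)

# Hypothetical genotype counts for three variants.
print(passes_qc(250, 500, 250))  # in equilibrium, common allele
print(passes_qc(500, 0, 500))    # no heterozygotes: strong HWE deviation
print(passes_qc(1000, 0, 0))     # monomorphic: MAF below threshold
```

In practice these filters would be applied genome-wide by the genotyping software; the sketch only shows what each threshold tests.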
Genome-wide association study
To identify associated variants, PLINK V.1.9 was employed to obtain summary statistics for patients with SLE and the control group. Familial relationships were determined using PLINK V.2.0 KINSHIP, with second-degree relatives excluded. During GWAS quality control, heterozygosity outliers more than 5 SDs from the mean and principal component analysis outliers beyond 3 times the IQR were excluded. An additive genetic model was employed for the case–control GWAS. Logistic regression was performed to examine associations between variants and SLE after adjustment for multiple covariates, such as sex, age and principal components. To mitigate overestimation caused by collinearity among variants in linkage disequilibrium, the most significant variant was selected. A variant with a p value of <1×10−5 or, more strictly, a p value of <5×10−8 was considered to be significantly associated with SLE. Visualisation tools such as the R package qqman were used to generate Manhattan plots and quantile–quantile (QQ) plots. Additionally, region plots of the variants of interest were prepared using LocusZoom tools.
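As a rough illustration of an additive-model association test, the Python sketch below implements the Cochran-Armitage trend test, which corresponds to the score test of a logistic regression without covariates. This is a swapped-in simplification, not the covariate-adjusted PLINK logistic regression used in the study. The function names, the example data and the labels "suggestive" and "genome-wide" are assumptions introduced here; the two p value thresholds are those given in the text.

```python
import math

def trend_test_pvalue(dosages, is_case):
    """Cochran-Armitage trend test for an additive model (no covariates).

    dosages: risk-allele counts (0, 1 or 2) per individual.
    is_case: 1 for cases, 0 for controls.
    The statistic equals N times the squared Pearson correlation
    between dosage and case status, with 1 degree of freedom.
    """
    n = len(dosages)
    mean_g = sum(dosages) / n
    mean_y = sum(is_case) / n
    cov = sum((g - mean_g) * (y - mean_y) for g, y in zip(dosages, is_case))
    var_g = sum((g - mean_g) ** 2 for g in dosages)
    var_y = sum((y - mean_y) ** 2 for y in is_case)
    if var_g == 0 or var_y == 0:
        return 1.0  # monomorphic variant or a single outcome class
    chi2 = n * cov * cov / (var_g * var_y)
    return math.erfc(math.sqrt(chi2 / 2))

def significance_label(p):
    """Classify a p value against the two thresholds used in the text."""
    if p < 5e-8:
        return "genome-wide"
    if p < 1e-5:
        return "suggestive"
    return "not significant"

# Hypothetical data: 50 cases all homozygous for the risk allele,
# 50 controls carrying no risk allele.
dosages = [2] * 50 + [0] * 50
status = [1] * 50 + [0] * 50
print(significance_label(trend_test_pvalue(dosages, status)))
```

A covariate-adjusted analysis, as in the study, would require a full logistic regression fit per variant, which dedicated GWAS software handles far more efficiently.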
PRS analysis
To compute the PRS for SLE, we randomly divided the cohort into three datasets: base, training and testing. The base group was used to explore the association between the studied variants and SLE by using PLINK V.1.9. Subsequently, a list of PRSs was constructed using the training group and PRSice2 tools after the exclusion of variants with an MAF of <0.01.29 PRSice2 employs the C (clumping)+T (thresholding) method for variant selection. In brief, it first applies clumping to identify representative variants based on the minimum p value. Subsequently, variants with p values below specified thresholds are selected to establish logistic regression models for the disease and PRS in the training set. Finally, the model with the highest correlation (r2), together with the variants with p values below the corresponding threshold, was selected. The East Asian population of the 1000 Genomes Project phase 3 served as a reference for this process, and the PRS was computed through z-score normalisation. Furthermore, our PRS models were validated using the testing group. The accuracy of PRS classification was assessed through receiver operating characteristic (ROC) curves and area under the curve (AUC) values, which were calculated in IBM SPSS Statistics (V.22) software.
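The thresholding and scoring steps can be sketched in Python as follows. This is a simplified illustration, not the PRSice2 implementation: it assumes that clumping has already been performed, scores each individual as a plain weighted sum of risk-allele dosages, and uses made-up variant IDs, effect sizes and genotypes. All function names are hypothetical.

```python
import statistics

def select_variants(summary_stats, p_threshold):
    """Thresholding step: keep clumped variants with p below the threshold.

    summary_stats maps a variant ID to (effect size, p value) from the base GWAS.
    """
    return {snp: beta for snp, (beta, p) in summary_stats.items()
            if p < p_threshold}

def raw_prs(dosages, weights):
    """Weighted sum of risk-allele dosages (0, 1 or 2) for one individual."""
    return sum(beta * dosages.get(snp, 0) for snp, beta in weights.items())

def z_normalise(scores):
    """Convert raw scores to z-scores (mean 0, SD 1) within the cohort."""
    mean = statistics.mean(scores)
    sd = statistics.pstdev(scores)
    if sd == 0:
        raise ValueError("scores are constant; z-scores are undefined")
    return [(s - mean) / sd for s in scores]

# Hypothetical, already-clumped summary statistics (IDs and values are made up).
stats = {"snp_a": (0.5, 1e-6), "snp_b": (-0.2, 0.5), "snp_c": (0.1, 1e-4)}
weights = select_variants(stats, p_threshold=0.00095)

# Hypothetical genotype dosages for three individuals.
cohort = [{"snp_a": 2, "snp_c": 1},
          {"snp_a": 1, "snp_c": 0},
          {"snp_a": 0, "snp_c": 2}]
normalised = z_normalise([raw_prs(person, weights) for person in cohort])
print(sorted(weights), normalised)
```

In the C+T procedure, this selection and scoring would be repeated over a grid of p value thresholds, with the best-fitting threshold chosen in the training set.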
HLA imputation
HLA genotype imputation with attribute bagging (HIBAG)30 was used to impute HLA alleles at a four-digit resolution. Default settings of HIBAG were applied during the analysis, and HLA-A, HLA-B, HLA-C, HLA-DPA1, HLA-DPB1, HLA-DQA1, HLA-DQB1 and HLA-DRB1 were imputed in this study. Imputations with a posterior probability of >0.9 were considered reliable.31
Statistical analysis
The continuous and categorical variables in the genotype groups at baseline were analysed using Student’s t-test, the χ2 test or Fisher’s exact test, as appropriate. A one-way analysis of variance with Tukey’s post hoc test was conducted for between-group comparisons. A two-sided p value of <0.05 indicated statistical significance. All statistical analyses were performed using IBM SPSS Statistics (V.22) and R software (V.4.1.0).
Results
GWAS in patients with SLE
Patients with SLE were selected from the CMUH genetic biobank on the basis of diagnostic codes and SLE medication usage. Established in 2018, the CMUH genetic biobank integrates genotyping data and electronic medical records of patients in CMUH. Individuals with autoimmune diseases were excluded from the non-SLE control group, after which the control group was matched with the SLE group by sex and age. A total of 48 580 control individuals were included in the analysis. The clinical characteristics of the patients with SLE and the controls are presented in table 1. The control and SLE groups were randomly divided into a discovery group and a replication group, accounting for 80% and 20% of the sample, respectively. A GWAS was conducted in the discovery group to identify genetic variants associated with SLE. In total, 4091 SNPs passed the threshold of p<1×10−5. Among them, 1719 SNPs reached the more stringent genome-wide significance threshold of p<5×10−8, providing robust evidence for their association with SLE. The analysis pipeline is presented in figure 1. The significant SNPs are listed in online supplemental table 1. A total of 13 independent signals (r2<0.2) were identified among the 1719 SNPs associated with SLE, as presented in online supplemental table 2. The results of the SLE GWAS are visualised in a Manhattan plot (figure 2A) and a QQ plot (figure 2B). These plots revealed significant associations between genetic variants and SLE susceptibility. The region plots for chromosomes 2, 6, 7 and 12 are presented in online supplemental figure 1. The GWAS results for all patients with SLE and controls (discovery group plus replication group) were consistent with, and more significant than, those of the discovery group (online supplemental figure 2). To facilitate visualisation and analysis, the results were uploaded to and processed through LocusZoom,32 a publicly available tool for visualising GWAS results.
The top 20 significant genes, identified on the basis of the LocusZoom results, are listed in table 2. Notably, several findings, such as the associations with GTF2I and STAT4, are consistent with those of previous studies. Additionally, we identified novel genes associated with SLE in the Taiwanese population that have not been reported before, such as RCC1L, EGLN3 and MRPS23. These newly discovered SLE-associated genes are listed in online supplemental table 3. Additional details can be accessed at https://my.locuszoom.org/gwas/853304/?token=4203647efd3e4e6c8f4b431d85b636fc.
Flow chart of the genome-wide association studies (GWASs) pipeline. A total of 2429 patients with SLE and 48 580 sex-matched and age-matched control patients without autoimmune diseases were enrolled. The study population was further divided into the base, training and testing group for polygenic risk score (PRS) analysis. In the base group, a GWAS was conducted, and the results were used to train the PRS model in the training group. Subsequently, the PRS model was validated in the testing group. Additional GWASs were conducted to derive PRSs of ANA, dsDNA and Sm by using the same procedure. The inclusion cohorts for SLE, ANA, dsDNA and Sm were categorised based on predefined criteria or antibody measurements from the CMUH genetic biobank, which comprised 330 000 patients. Multiple PRSs for SLE, ANA, dsDNA and Sm were employed to predict the occurrence of SLE. CMUH, China Medical University Hospital; dsDNA, anti-double-stranded DNA antibody; Sm, anti-Smith antibody; SNPs, single nucleotide polymorphisms.
Manhattan plot and quantile–quantile plot of SLE. (A) The Manhattan plot depicts the peaks of SNPs surpassing genome-wide significance levels between the SLE and control groups. The blue line denotes the threshold of 1×10−5. The red line denotes the more stringent threshold of 5×10−8. The most significant SNPs in the gene regions of each chromosome are labelled with gene symbols. (B) Quantile–quantile plot demonstrating the consistency between observed p values and their expected values. The red line denotes the expected values. SNPs, single nucleotide polymorphisms.
Table 1. Clinical characteristics of the SLE study cohort
Table 2. Top 20 significant loci associated with SLE
PRS model: patients with SLE
Building on the insights gained from the GWAS, we constructed a PRS model to determine the cumulative effect of multiple genetic variants associated with SLE. PRS models have been demonstrated to be robust tools for estimating an individual’s genetic risk and can support disease prediction, diagnosis, personalised medication regimens and the development of novel therapeutic interventions. The replication group, constituting 20% of the study cohort, was randomly divided into a training group and a testing group. To establish the optimal PRS model, we set a p value threshold of 0.00095 and an r2 value threshold of 0.02492 for the training group, yielding a set of 168 SNPs. A list of the 168 SNPs is provided in online supplemental table 4. A comparison of the PRSs between the SLE and control groups revealed a significant difference in both the training set (normalised mean PRS±SD: 0.399±1.088 in the SLE group vs −0.020±0.991 in the control group; p<0.001; figure 3A) and the testing set (normalised mean PRS±SD: 0.386±1.046 in the SLE group vs −0.019±0.994 in the control group; p<0.001; figure 3B). To assess the classification accuracy of the PRS model, we employed ROC curves; the model could discriminate between the SLE and control samples with an AUC of 0.609 for the training group (figure 3C) and 0.612 for the testing group (figure 3D). Upon the incorporation of age and sex into the model, the AUC slightly increased to 0.610 in the training group and 0.617 in the testing group. Additionally, we established a further PRS model encompassing patients with SLE from the CMUH genetic biobank and the Japanese population (meta-analysis from BioBank Japan PheWeb). In this model, a set of 23 SNPs was identified with AUCs of 0.594 and 0.569 in the training group and testing group, respectively (online supplemental figure 3). The set of 23 SNPs in the meta-analysis is provided in online supplemental table 5.
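For readers unfamiliar with the AUC, the following Python sketch shows one standard way to compute it: as the probability that a randomly chosen case has a higher score than a randomly chosen control, with ties counted as one half. The study computed AUCs in SPSS; this function and the example scores are illustrative assumptions only.

```python
def auc(case_scores, control_scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form)."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0      # case ranked above control
            elif c == k:
                wins += 0.5      # ties count as half
    return wins / (len(case_scores) * len(control_scores))

# Hypothetical normalised PRS values.
print(auc([0.9, 0.4], [0.1, 0.4]))  # 0.875
```

An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect separation, which places the values of about 0.61 reported here in context.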
Distribution and statistical outcomes of PRSs in SLE. Distribution and statistical results of PRSs in SLE in the training group (A) and the testing group (B). AUC analyses for evaluating the accuracy of the PRS model and the PRS model combined with age and sex for predicting SLE in the training set (C) and the testing set (D). **** indicates p<0.001. AUC, area under the curve; PRS, polygenic risk score; ROC, receiver operating characteristic.
Application of multiple PRSs increases the discriminatory power of the PRS model in predicting SLE
To enhance the discriminatory power of the PRS model, laboratory examination data were incorporated. According to the Systemic Lupus Erythematosus Classification Criteria proposed by the ACR and EULAR, ANA, anti-double-stranded DNA antibodies (dsDNA) and anti-Smith antibodies (Sm) can be used as testing parameters. The participants were categorised into a case group with positive antibody results and a control group with negative antibody results. GWAS and PRS analyses were conducted for each of these test results. On the basis of the p value and r2 value thresholds, sets of 42 SNPs (p value of 0.0001005 and r2 value of 0.00955), 44 SNPs (p value of 5×10−8 and r2 value of 0.0263) and 83 SNPs (p value of 0.0008 and r2 value of 0.0137) were identified for calculating the PRSs of ANA-positive, dsDNA-positive and Sm-positive cases, respectively, with significant differences observed between the case and control groups (figure 4A–C). These SNPs are listed in online supplemental table 4. The intersection results of the identified SNPs for the PRSs of SLE, ANA, dsDNA and Sm are presented in online supplemental figure 3. Although no SNPs were found to be common to all four features, 13, 10 and 7 SNPs in the ANA, dsDNA and Sm sets, respectively, were in linkage disequilibrium (r2>0.2) with SNPs in the SNP set of the PRS of SLE. The AUCs of SLE predicted using the PRSs of ANA, dsDNA and Sm were 0.606, 0.600 and 0.554, respectively (figure 4D). Combining the PRSs of SLE, ANA, dsDNA and Sm increased the AUC of SLE to 0.651, indicating that the incorporation of multiple PRSs enhances discriminatory power.
Distribution, statistical outcomes and quantile plots of PRSs for SLE, ANA, dsDNA and Sm. Distribution and results of PRS in the training group for ANA (A), Sm (B) and dsDNA (C). (D) ROC and AUC analyses evaluating the discriminatory ability of the PRSs for SLE, ANA, Sm and dsDNA in distinguishing between SLE and control patients. Patients in the replication group were categorised into quintiles on the basis of the PRSs of SLE (E), ANA (F), dsDNA (G) and Sm (H), and ORs for the likelihood of an association between each quintile of PRSs and SLE were generated using logistic regression analysis. The values given are the ORs with 95% CIs. ** indicates p<0.01, *** indicates p<0.005, **** indicates p<0.001. AUC, area under the curve; dsDNA, anti-double stranded DNA antibody; PRS, polygenic risk score; ROC, receiver operating characteristic; Sm, anti-Smith antibody.
To evaluate the risk of SLE, we examined the risk of SLE in each PRS quintile. The patients in the top quintile for the PRSs of SLE, ANA and dsDNA had more than twice the risk of being classified into the SLE group compared with those in the bottom quintile (p<0.001, OR=2.79, 95% CI=2.10 to 3.70 for PRS of SLE; p<0.001, OR=2.55, 95% CI=1.92 to 3.39 for PRS of ANA; OR=2.34, 95% CI=1.78 to 3.07 for PRS of dsDNA; figure 4E–G). The patients in the top quintile for the PRS of Sm exhibited approximately 1.5 times the risk of being classified into the SLE group compared with those in the bottom quintile (p<0.001, OR=1.52, 95% CI=1.18 to 1.96; figure 4H). The OR for SLE increased with the quintiles of PRSs for SLE, ANA and dsDNA, indicating that an elevated risk was associated with higher PRSs.
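The ORs above were obtained by logistic regression. For a single binary comparison (for example, top versus bottom quintile) without covariates, the logistic regression OR equals the cross-product ratio of the 2×2 table, and its Wald 95% CI can be computed directly. The Python sketch below illustrates this under that assumption; the function name and the counts are hypothetical and are not taken from the study.

```python
import math

def odds_ratio_ci(cases_top, controls_top, cases_bottom, controls_bottom, z=1.96):
    """Odds ratio and Wald 95% CI for top vs bottom group from a 2x2 table."""
    odds_ratio = (cases_top * controls_bottom) / (controls_top * cases_bottom)
    # Standard error of the log odds ratio.
    se = math.sqrt(1 / cases_top + 1 / controls_top
                   + 1 / cases_bottom + 1 / controls_bottom)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper

# Hypothetical counts: 40 cases and 60 controls in the top quintile,
# 20 cases and 80 controls in the bottom quintile.
or_, lo, hi = odds_ratio_ci(40, 60, 20, 80)
print(round(or_, 2), round(lo, 2), round(hi, 2))
```

A CI whose lower bound exceeds 1, as in this example, indicates a significantly higher odds of disease in the top group at the 5% level.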
To optimise clinical applicability of PRSs, we identified the optimal cut-off point for each PRS by using the Youden Index. Subsequently, logistic regression was employed to evaluate the discriminatory power of the PRS for SLE on the basis of the optimal cut-off points. An ROC curve analysis was conducted to calculate the AUC. The optimal cut-off points for the PRSs of SLE, ANA, dsDNA and Sm are presented in figure 5A. Individuals with a PRS surpassing the optimal cut-off point exhibited a 1.7-fold to 2.1-fold higher risk of developing SLE compared with those with a PRS below the optimal cut-off point (ORs=2.09, 1.88, 1.83 and 1.65 for PRSs of SLE, ANA, dsDNA and Sm, respectively). Multiple PRSs were included to evaluate the model’s ability to discriminate between SLE and control patients, thereby enhancing the overall discriminatory power of the scores. The AUCs of the optimal cut-off points for the PRSs of SLE, ANA, dsDNA and Sm were 0.59, 0.58, 0.57 and 0.55, respectively. When all of these scores were combined, the AUC reached 0.64. Additionally, incorporating the PRS of SLE with any of the other three factors resulted in an AUC of ≥0.6 (figure 5B,C). Online supplemental figure 4 provides a more detailed ROC curve. These findings underscore the robust predictive capacity of PRSs in predicting the onset of SLE.
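The Youden Index used above is J = sensitivity + specificity − 1, and the optimal cut-off is the score that maximises J. A minimal Python sketch follows, assuming that individuals with a score at or above the cut-off are classified as positive; the function name and the example scores are hypothetical.

```python
def youden_cutoff(case_scores, control_scores):
    """Return (cut-off, J) maximising J = sensitivity + specificity - 1.

    Assumption: a score >= cut-off is classified as positive.
    If several cut-offs tie, the lowest one is returned.
    """
    best_cut, best_j = None, -1.0
    for cut in sorted(set(case_scores) | set(control_scores)):
        sensitivity = sum(s >= cut for s in case_scores) / len(case_scores)
        specificity = sum(s < cut for s in control_scores) / len(control_scores)
        j = sensitivity + specificity - 1
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut, best_j

# Hypothetical normalised PRS values.
cases = [0.2, 0.6, 0.8, 0.9]
controls = [0.1, 0.3, 0.4, 0.7]
print(youden_cutoff(cases, controls))  # (0.6, 0.5)
```

Dichotomising a continuous score at a single cut-off discards information, which is consistent with the lower AUCs reported for the cut-off-based models than for the continuous PRSs.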
Classification accuracy of the PRS model in predicting SLE based on the optimal cut-off points. (A) Optimal cut-off points for PRSs of SLE, ANA, Sm and dsDNA were determined using the Youden Index. ORs indicate the association between the optimal cut-off point of PRSs and SLE. (B,C) Results of the ROC and AUC analyses of the discriminatory ability of the optimal cut-off points for the PRSs of SLE, ANA, Sm and dsDNA in distinguishing between SLE and control patients. AUC, area under the curve; dsDNA, anti-double stranded DNA antibody; PRS, polygenic risk score; ROC, receiver operating characteristic; Sm, anti-Smith antibody.
HLA haplotypes of SLE
The HLA gene complex is located at a pivotal position in the human immune system, and it plays a vital role in immune regulation and antigen presentation. Given the dense clustering of SLE-associated variants on chromosome 6, with the most significant genetic locus identified as HLA-DQB1 in the GWAS, we further analysed HLA haplotypes in SLE. Our findings revealed significant differences in 43 HLA haplotypes (p<0.05; online supplemental figure 5), with 22 of them exhibiting particularly notable differences (p<0.001; figure 6A,B). The frequencies of HLA-DQA1*01:01 and HLA-DQB1*05:01 were determined to be 1.54-fold and 1.45-fold higher, respectively, in patients with SLE than those in the control group (OR=1.55 and 95% CI=1.27 to 1.90 for HLA-DQA1*01:01, and OR=1.47 and 95% CI=1.23 to 1.75 for HLA-DQB1*05:01). The predictive model incorporating both HLA haplotype (HLA-DQA1*01:01 and HLA-DQB1*05:01) and PRS achieves an AUC of 0.655, demonstrating a slight improvement in predictive accuracy compared with the model solely based on PRS, which has an AUC of 0.651 (online supplemental figure 7).
Proportions of SLE and control patients with different HLA haplotypes. Comparison of HLA haplotypes between the SLE and control groups with allele frequency (A) and fold change (B). The χ2 analysis revealed statistically significant differences (p<0.001) in the distributions of HLA haplotypes between the two groups. The white bars denote patients with SLE, and the black bars represent control patients. The frequency is expressed as percentages relative to total allele counts for each haplotype, and the fold change of each haplotype in SLE relative to the control group is indicated. HLA, human leucocyte antigen.
Discussion
We conducted a GWAS involving 2429 patients with SLE and 48 580 controls. Multiple PRS models were established to enhance the discriminatory power of PRSs in predicting SLE. To enable the clinical application of PRSs to SLE diagnosis, we employed the Youden Index to determine the optimal cut-off point for each PRS. Combining the PRSs for SLE, ANA, dsDNA and Sm yielded an AUC of 0.64 for the optimal cut-off points. We further analysed HLA haplotypes in SLE. Notably, individuals harbouring HLA-DQA1*01:01 and HLA-DQB1*05:01 exhibited higher risks of being classified into the SLE group.
According to our findings, the clinical utility of the PRS for SLE lies in offering guidance to test for ANA and other SLE markers when a patient’s PRS surpasses the optimal cut-off point for SLE, in conjunction with elevated PRSs for two of the other three marker categories. This, coupled with the presence of potential SLE symptoms, facilitates early detection, thereby preventing delays in diagnosis and treatment resulting from the lack of distinct early symptoms in SLE. Numerous GWASs and investigations of PRSs in SLE have focused on diverse ancestral populations, comorbidities, and the association between PRS and disease manifestations or severity.24 33–37 However, no study thus far has examined the application of PRSs to multiple phenotypes to predict SLE. Liu et al38 underscored the importance of systematically integrating PRSs with routine clinical biomarkers, marking a key step toward establishing PRSs as valuable clinical screening tools. In the current study, we conducted a GWAS; evaluated the PRSs of ANA, dsDNA and Sm; and applied multiple PRSs to enhance the discriminatory power of PRSs in predicting SLE. This approach not only involved the use of laboratory data of these clinical biomarkers but also explored the prospect of leveraging PRSs to predict the disease before its onset. Furthermore, combining PRS and HLA haplotypes could enhance the accuracy of SLE prediction, offering additional indicators for early SLE diagnosis.
We identified 168, 42, 44 and 83 SNPs specific to SLE, ANA, dsDNA and Sm, respectively, which were used for calculating PRSs. An examination of the overlap among these SNPs revealed an absence of common variants across all four features. This observation leads us to speculate that the use of multiple PRSs, each capturing different variants, may contribute to an enhanced discriminatory power of PRSs for SLE. This underscores the considerable value of employing multiple PRSs, which may also help to prevent overfitting.39
We employed genetic data and electronic medical records from the CMUH genetic biobank to conduct GWAS and PRS analyses. This data source included comprehensive clinical diagnostic information and medical examination records, mitigating potential uncertainties associated with relying solely on questionnaire-based data. However, we cannot exclude the possibility of patients seeking medical care at other hospitals, potentially introducing the risk of incomplete clinical data. This scenario could lead to the inadvertent inclusion of patients with SLE in the control group. Conversely, some individuals identified as patients with SLE based on established SLE diagnostic criteria may not actually have the condition. To address these concerns, we conducted a small validation study involving 200 individuals with SLE and revealed that they met the ACR/EULAR criteria. In summary, the limitations acknowledged in this study underscore the importance of careful interpretation of our results. In the future, addressing these limitations by using measures such as incorporating data from diverse healthcare institutions and implementing more stringent diagnostic criteria is imperative to enhance the accuracy and reliability of investigations.
Our GWAS results enabled the identification of genetic variants associated with SLE. Many of our findings are consistent with those of previous studies. For example, we identified a significant association between SLE and GTF2I,40 HLA-DQB1,41 STAT4 42 and other genes,18 43–48 as detailed in table 2. Additionally, we discovered novel associations between SLE and certain genes in the Taiwanese population that have not been previously reported, such as RCC1L, EGLN3 and MRPS23. RCC1L was reported to be deleted in Williams-Beuren syndrome. EGLN3 was reported to be involved in apoptotic processes and responses to hypoxia. Notably, hypoxia and hypoxaemia have been previously documented in patients with SLE.49 MRPS23 was implicated in protein synthesis within the mitochondrion, which is consistent with other studies that reported the involvement of mitochondria in the pathogenesis of SLE.50 These novel associations underscore the need for further investigation into the roles and implications of these SLE risk genes.
In conclusion, PRSs are a precise and effective tool for predicting individuals’ health status and susceptibility to diseases. By evaluating an individual’s genetic composition, PRSs offer valuable insights into potential treatment efficacy and the likelihood of a positive or negative prognosis, even before symptoms manifest. PRSs enable the early detection of conditions before abnormal test results are obtained, thus empowering healthcare practitioners to embrace precision medicine. The prescription of personalised preventive strategies becomes feasible based on each patient’s distinct genetic profile. In this study, we developed multiple PRSs to enhance their discriminatory power in predicting SLE, aiming to contribute to the early diagnosis and prediction of SLE.
Contributors: Y-CC, C-MH and F-JT designed the study. Y-CC, H-FL, C-CL and T-YL performed the analyses. Y-CC and T-YL wrote the manuscript in consultation with C-MH and F-JT. F-JT supervised the project. All authors read and approved the final manuscript. F-JT is responsible for the overall content as the guarantor.
Funding: Y-CC was supported by China Medical University Hospital (DMR-HHC-112-12).
Disclaimer: The funder had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.
Competing interests: None declared.
Patient and public involvement: Patients and/or the public were not involved in the design, or conduct, or reporting, or dissemination plans of this research.
Provenance and peer review: Not commissioned; externally peer reviewed.
Supplemental material: This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.
Data availability statement
Data are available in a public, open access repository. The datasets generated and analysed during the current GWAS are available in the LocusZoom repository.
Ethics statements
Patient consent for publication:
Not applicable.
Ethics approval:
This study involves human participants and was approved by the Research Ethics Committee of CMUH (CMUH110-REC3-005 and CMUH111-REC1-176). Participants gave informed consent to participate in the study before taking part.