Original research

Multiple polygenic risk scores can improve the prediction of systemic lupus erythematosus in Taiwan

Abstract

Objective To identify new genetic variants associated with SLE in Taiwan and establish polygenic risk score (PRS) models to improve the early diagnostic accuracy of SLE.

Methods The study enrolled 2429 patients with SLE and 48 580 controls from China Medical University Hospital in Taiwan. A genome-wide association study (GWAS) and PRS analyses of SLE and three other SLE markers, namely ANA, anti-double-stranded DNA antibody (dsDNA) and anti-Smith antibody (Sm), were conducted.

Results Genetic variants associated with SLE were identified through GWAS. Some novel genes that had not been previously reported, such as RCC1L and EGLN3, were found to be associated with SLE in Taiwan. Multiple PRS models were established, and optimal cut-off points for each PRS were determined using the Youden Index. Combining the PRSs for SLE, ANA, dsDNA and Sm yielded an area under the curve of 0.64 for the optimal cut-off points. An analysis of human leucocyte antigen (HLA) haplotypes in SLE indicated that individuals with HLA-DQA1*01:01 and HLA-DQB1*05:01 were at a higher risk of being classified into the SLE group.

Conclusions The use of PRSs to predict SLE enables the identification of high-risk patients before abnormal laboratory data are obtained or symptoms manifest. Our findings underscore the potential of using PRSs and GWAS in identifying SLE markers, offering promise for early diagnosis and prediction of SLE.

What is already known on this topic

  • Studies have explored SLE-associated variants and the association between polygenic risk scores (PRSs) and disease manifestations or severity. However, no study has examined the use of PRSs from multiple phenotypes to predict SLE.

What this study adds

  • We established multiple PRS models for SLE and three additional laboratory markers. Including these markers enhances the discriminatory power of PRSs, improving their predictive capability for SLE.

How this study might affect research, practice or policy

  • PRSs serve as a clinical tool for detecting SLE markers when a patient’s PRS surpasses the optimal cut-off point. This approach aims to contribute to the early diagnosis and prediction of SLE.

Introduction

SLE is a chronic autoimmune disease that predominantly affects individuals aged 15–40 years, with a female-to-male ratio of approximately 9:1.1 In SLE, the immune system generates autoantibodies targeting cells and tissues, leading to inflammation and tissue damage.2 The severity and clinical manifestations of SLE vary considerably, resulting in a spectrum of symptoms.3 The disease course of SLE is characterised by alternating phases of flares and remission. Flares are periods of increased disease activity and severity, whereas remission is constituted by intervals of milder disease activity. Due to the complexity and variability of its symptoms, early diagnosis of SLE poses a considerable challenge.

The diagnosis of SLE relies on the recently introduced ‘Systemic Lupus Erythematosus Classification Criteria’ collaboratively proposed by the American College of Rheumatology (ACR) and the European Alliance of Associations for Rheumatology (EULAR) in 2019.4 These criteria include various systemic manifestations, such as fever and manifestations related to haematological, neuropsychiatric, mucocutaneous, serosal, musculoskeletal and renal aspects. Additionally, laboratory assessments cover antiphospholipid antibodies, complement proteins and lupus-specific antibodies. To confirm a diagnosis of SLE, a cumulative score of 10 or more, coupled with a positive ANA test, is necessary.

A genome-wide association study (GWAS) is a robust investigative method for identifying genetic variations associated with specific traits or diseases across the entire human genome.5–8 The first three GWASs on SLE were simultaneously published9–11 and scrutinised between 100 000 and 300 000 single nucleotide polymorphisms (SNPs). To date, more than 200 SLE susceptibility loci have been identified, predominantly in European and Asian populations.12 13 Certain genes predisposing individuals to SLE, such as STAT4, IRF5 and BLK, have been replicated in numerous studies across diverse ancestral populations.14 The human leucocyte antigen (HLA) region is undeniably associated with SLE susceptibility across all populations.15 However, determining which genetic variants drive the development of SLE is challenging due to ethnicity-specific linkage disequilibria and allelic heterogeneity, resulting in markedly inconsistent allelic associations among populations. For example, although HLA-DRB1*15:01 and HLA-DQB1*06:02 have been associated with SLE risk in populations of Asian descent,16 European SLE studies have reported an elevated SLE risk associated with HLA-DRB1*03:01, HLA-DRB1*08:01 and HLA-DQA1*01:02.17 Notably, numerous non-HLA SLE susceptibility loci have been identified in East Asian populations, emphasising the importance of conducting GWASs in populations of non-European ancestry with high SLE prevalence and severity.15 18

The polygenic risk score (PRS) is a numerical assessment that quantifies an individual’s genetic predisposition to specific traits or diseases through the analysis of multiple genetic variants across the genome. PRSs may predict the risk of developing specific conditions, making them valuable tools in the development of personalised medicine and risk assessments.19–23 Only a few studies have investigated the PRS for SLE and its association with disease manifestations or severity;24–27 these investigations have revealed that individuals with a high PRS for SLE are more prone to severe SLE phenotypes, early onset and higher mortality. Despite the potential applications of PRSs in SLE prediction, these scores rely solely on genetic factors and do not fully consider the influence of environmental and lifestyle factors. Therefore, a key future direction in using PRSs is enhancing the accuracy and reliability of SLE prediction. In the present study, we conducted a GWAS and computed the PRS for SLE and other laboratory markers in patients with SLE, with the aim of enhancing the accuracy of SLE prediction.

Materials and methods

Patients and database information

This study used electronic medical records obtained from China Medical University Hospital (CMUH). Individuals in the case group were identified using the International Classification of Diseases, 9th Revision (code: 710.0) or International Classification of Diseases, 10th Revision (codes: M32.0, M32.10, M32.11, M32.12, M32.13, M32.14, M32.15, M32.19, M32.8 and M32.9) codes associated with SLE. Individuals with no history of SLE medication usage and those diagnosed as having autoimmune conditions other than SLE were excluded from the case group. The control group comprised individuals without a diagnosis of any autoimmune disease. The excluded diagnostic codes and SLE medications are listed in the online supplemental materials. After age and sex matching, the case group comprised 2429 patients and the control group comprised 48 580 patients.

Genotyping

After informed consent was obtained from participants, a 3 mL venous blood sample was collected and preserved in an EDTA tube. Genomic DNA was extracted from 200 µL peripheral blood samples by using the MagCore Genomic DNA Whole Blood Kit (RBC Bioscience, New Taipei City, Taiwan) in accordance with the manufacturer’s instructions. Genetic information from individuals in the Taiwanese population was obtained using the Affymetrix Axiom genotyping platform, specifically the Axiom Taiwan Precision Medicine (TPM)-customised SNP array (Thermo Fisher Scientific, Santa Clara, California, USA). This array encompasses 714 457 SNPs across the entire human genome. PLINK V.1.9 was used for analysis, and samples and SNPs with high missing rates were excluded. Variants that failed the Hardy-Weinberg equilibrium test (p<1e−6) or had a minor allele frequency (MAF) of <1e−4 were also excluded. The TPM array data were phased using SHAPEIT4, and imputation was performed using Beagle V.5.2, which is known for its efficacy and accuracy compared with other imputation tools. The imputed data were filtered on the basis of criteria such as an r2 alternate allele dosage of <0.3 and a genotype posterior probability of <0.9.28
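The variant-level filters described above can be illustrated with a minimal sketch. It assumes genotype counts per variant are already available, and it uses a simple 1-df chi-square test for Hardy-Weinberg equilibrium as a stand-in; PLINK itself uses an exact test, so p values will differ for rare variants. The function names (`maf`, `hwe_chi2_pvalue`, `passes_qc`) are hypothetical and are not part of PLINK.

```python
import math


def maf(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Minor allele frequency from genotype counts (AA, AB, BB)."""
    n_alleles = 2 * (n_aa + n_ab + n_bb)
    if n_alleles == 0:
        raise ValueError("no genotyped samples")
    freq_a = (2 * n_aa + n_ab) / n_alleles
    return min(freq_a, 1.0 - freq_a)


def hwe_chi2_pvalue(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Chi-square (1 df) p value for departure from Hardy-Weinberg equilibrium.

    Assumption: a chi-square approximation, not the exact test used by PLINK.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotyped samples")
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))


def passes_qc(n_aa: int, n_ab: int, n_bb: int,
              hwe_p: float = 1e-6, min_maf: float = 1e-4) -> bool:
    """Keep a variant only if it passes both thresholds quoted in the text."""
    return (maf(n_aa, n_ab, n_bb) >= min_maf
            and hwe_chi2_pvalue(n_aa, n_ab, n_bb) >= hwe_p)
```

For example, `passes_qc(25, 50, 25)` keeps a variant in perfect equilibrium, whereas a variant with no heterozygotes at all, such as `passes_qc(500, 0, 500)`, is rejected.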

Genome-wide association study

To identify associated variants, PLINK V.1.9 was employed to obtain summary statistics for patients with SLE and the control group. Familial relationships were determined using PLINK V.2.0 KINSHIP, with second-degree relatives excluded. During GWAS quality control, heterozygosity outliers more than 5 SDs from the mean and principal component analysis outliers beyond an IQR of 3 were excluded. For the case–control GWAS, an additive genetic model was employed. Logistic regression was performed to examine associations between variants and traits after adjustment for covariates such as sex, age and principal components. To mitigate collinearity-induced overestimation, the most significant variant among correlated variants was selected. A variant with a p value of <1×10−5 or, more strictly, a p value of <5×10−8 was considered to be significantly associated with SLE. The R package qqman was used to generate Manhattan plots and quantile–quantile (QQ) plots, and region plots of the variants of interest were prepared using LocusZoom.
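The two significance thresholds above can be applied to summary statistics as in the following minimal sketch. The tier labels ("genome-wide", "suggestive") and the function names are illustrative assumptions; the study itself treats both thresholds as significant.

```python
from collections import Counter

SUGGESTIVE_P = 1e-5    # looser threshold quoted in the text
GENOME_WIDE_P = 5e-8   # stricter threshold quoted in the text


def significance_tier(p_value: float) -> str:
    """Assign a p value to a tier; labels are hypothetical, thresholds are from the text."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p value must lie in [0, 1]")
    if p_value < GENOME_WIDE_P:
        return "genome-wide"
    if p_value < SUGGESTIVE_P:
        return "suggestive"
    return "not significant"


def count_tiers(p_values: list[float]) -> Counter:
    """Count how many variants fall into each tier."""
    return Counter(significance_tier(p) for p in p_values)
```

Note that every variant passing the stricter threshold also passes the looser one, so the number of variants below 1×10−5 is the sum of the "genome-wide" and "suggestive" counts.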

PRS analysis

To compute the PRS for SLE, we randomly divided the cohort into three datasets: base, training and testing. The base group was used to explore the association between the studied variables and SLE by using PLINK V.1.9. Subsequently, a list of PRSs was constructed using the training group and PRSice2 after the exclusion of variants with an MAF of <0.01.29 PRSice2 employs the C (clumping)+T (thresholding) method for variant selection. In brief, it first uses clumping to identify representative variants based on the minimum p value. Subsequently, variants with p values below specified thresholds are selected to establish logistic regression models for the disease and PRS in the training set. Finally, the model with the highest correlation (r2), together with the variants with p values below the corresponding threshold, was selected. The 1000 Genomes Project phase 3 East Asian population served as the reference for this process, and the PRS was standardised through z-score normalisation. Furthermore, our PRS models were validated using the testing group. The accuracy of PRS classification was assessed through receiver operating characteristic (ROC) curves and area under the curve (AUC) values, which were calculated in IBM SPSS Statistics (V.22).
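As a rough illustration of what a PRS is once the variants have been selected, the sketch below computes a weighted sum of effect-allele dosages and then z-normalises the scores. This is an assumption-laden simplification: PRSice2 reports an average score by default, and the population used for normalisation here is simply the list of scores passed in. The function names and the toy weights are hypothetical.

```python
from statistics import mean, pstdev


def raw_prs(dosages: list[float], weights: list[float]) -> float:
    """Weighted sum of effect-allele dosages (0, 1 or 2) for one individual.

    `weights` stands for per-variant effect sizes (e.g. log odds ratios)
    taken from the base GWAS.
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


def z_normalise(scores: list[float]) -> list[float]:
    """Centre scores on 0 and scale them to unit SD (population SD)."""
    mu = mean(scores)
    sd = pstdev(scores)
    if sd == 0:
        raise ValueError("scores have zero variance")
    return [(s - mu) / sd for s in scores]


# Toy example with three hypothetical variants and three individuals.
weights = [0.5, -0.2, 0.1]
cohort = [[2, 0, 1], [0, 1, 0], [1, 1, 2]]
scores = [raw_prs(person, weights) for person in cohort]
normalised = z_normalise(scores)
```

After normalisation, a score of 0 corresponds to the cohort mean and each unit corresponds to 1 SD, which is the scale on which the group means are reported in the Results.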

HLA imputation

HLA genotype imputation with attribute bagging (HIBAG)30 was used to impute HLA alleles at a four-digit resolution. Default settings of HIBAG were applied during the analysis, and HLA-A, HLA-B, HLA-C, HLA-DPA1, HLA-DPB1, HLA-DQA1, HLA-DQB1 and HLA-DRB1 were imputed in this study. Imputations with a posterior probability of >0.9 were considered reliable.31

Statistical analysis

The continuous and categorical variables in the genotype groups at baseline were analysed using Student’s t-test, the Χ2 test or Fisher’s exact test, as appropriate. A one-way analysis of variance with Tukey’s post hoc test was conducted for between-group comparisons. A two-sided p value of <0.05 indicated statistical significance. All statistical analyses were performed using IBM SPSS Statistics (V.22) and R software (V.4.1.0).

Results

GWAS in patients with SLE

Patients with SLE were selected from the CMUH genetic biobank on the basis of diagnostic codes and SLE medication usage. Established in 2018, the CMUH genetic biobank integrates genotyping data and electronic medical records of patients in CMUH. Individuals with autoimmune diseases were excluded from the non-SLE control group, after which the control group was matched with the SLE group by sex and age. A total of 48 580 control individuals were included in the analysis. The clinical characteristics of both the patients with SLE and the controls are presented in table 1. The control and SLE groups were randomly divided into a discovery group and a replication group, accounting for 80% and 20% of the sample, respectively. A GWAS was conducted in the discovery group to identify genetic variants associated with SLE. A total of 4091 SNPs reached significance at the threshold of p<1×10−5. Among them, 1719 SNPs reached the more stringent threshold of p<5×10−8, providing robust evidence for their association with SLE. The analysis pipeline is presented in figure 1. The significant SNPs are listed in online supplemental table 1. A total of 13 independent signals (r2<0.2) were identified among the 1719 SNPs associated with SLE, as presented in online supplemental table 2. The results of the SLE GWAS were visualised using a Manhattan plot (figure 2A) and a QQ plot (figure 2B). These plots revealed significant associations between genetic variants and SLE susceptibility. The region plots for chromosomes 2, 6, 7 and 12 are presented in online supplemental figure 1. The GWAS results for all patients with SLE and controls (discovery group plus replication group) were consistent with, but more significant than, those of the discovery group (online supplemental figure 2). To facilitate visualisation and analysis, the results were uploaded to and processed through LocusZoom,32 a publicly available tool for visualising GWAS results.

The top 20 significant genes, identified on the basis of the LocusZoom results, are listed in table 2. Notably, several findings, such as GTF2I and STAT4, are consistent with those of previous studies. We also identified novel genes associated with SLE in the Taiwanese population that have not been reported before, such as RCC1L, EGLN3 and MRPS23. These newly discovered SLE-associated genes are listed in online supplemental table 3. Additional details can be accessed at https://my.locuszoom.org/gwas/853304/?token=4203647efd3e4e6c8f4b431d85b636fc.

Figure 1

Flow chart of the genome-wide association studies (GWASs) pipeline. A total of 2429 patients with SLE and 48 580 sex-matched and age-matched control patients without autoimmune diseases were enrolled. The study population was further divided into the base, training and testing group for polygenic risk score (PRS) analysis. In the base group, a GWAS was conducted, and the results were used to train the PRS model in the training group. Subsequently, the PRS model was validated in the testing group. Additional GWASs were conducted to derive PRSs of ANA, dsDNA and Sm by using the same procedure. The inclusion cohorts for SLE, ANA, dsDNA and Sm were categorised based on predefined criteria or antibody measurements from the CMUH genetic biobank, which comprised 330 000 patients. Multiple PRSs for SLE, ANA, dsDNA and Sm were employed to predict the occurrence of SLE. CMUH, China Medical University Hospital; dsDNA, anti-double-stranded DNA antibody; Sm, anti-Smith antibody; SNPs, single nucleotide polymorphisms.

Figure 2

Manhattan plot and quantile–quantile plot of SLE. (A) The Manhattan plot depicts the peaks of SNPs surpassing genome-wide significance levels between the SLE and control groups. The blue line denotes the threshold of 1×10−5. The red line denotes the more stringent threshold of 5×10−8. The most significant SNPs in the gene regions of each chromosome are labelled with gene symbols. (B) Quantile–quantile plot demonstrating the consistency between observed p values and their expected values. The red line denotes the expected values. SNPs, single nucleotide polymorphisms.

Table 1

Clinical characteristics of SLE study cohort

Table 2

Top 20 significant loci associated with SLE

PRS model: patients with SLE

Building on the insights gained from the GWAS, we constructed a PRS model to determine the cumulative effect of multiple genetic variants associated with SLE. PRS models have been demonstrated to be robust tools for estimating an individual’s genetic risk and to play a crucial role in enhancing disease prediction, refining diagnostic approaches and enabling the development of personalised medication regimens and novel therapeutic interventions. The replication group, constituting 20% of the study cohort, was randomly divided into a training group and a testing group. To establish the optimal PRS model, we set a p value threshold of 0.00095 and an r2 value threshold of 0.02492 for the training group, yielding a set of 168 SNPs. A list of the 168 SNPs is provided in online supplemental table 4. A comparison of the PRSs between the SLE and control groups revealed a significant difference in both the training set (normalised mean PRS±SD: 0.399±1.088 in the SLE group vs −0.020±0.991 in the control group; p<0.001; figure 3A) and the testing set (normalised mean PRS±SD: 0.386±1.046 in the SLE group vs −0.019±0.994 in the control group; p<0.001; figure 3B). To assess the classification accuracy of the PRS model, we employed ROC curves; the model could discriminate between the SLE and control samples with an AUC of 0.609 for the training group (figure 3C) and 0.612 for the testing group (figure 3D). Upon the incorporation of age and sex into the model, the AUC slightly increased to 0.610 in the training group and 0.617 in the testing group. Additionally, we established a novel PRS model encompassing patients with SLE from the CMUH genetic biobank and the Japanese population (meta-analysis from BioBank Japan PheWeb). In this model, a set of 23 SNPs was identified with AUCs of 0.594 and 0.569 in the training group and testing group, respectively (online supplemental figure 3). The set of 23 SNPs in the meta-analysis is provided in online supplemental table 5.
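To make the AUC values above concrete: the AUC equals the probability that a randomly chosen case has a higher score than a randomly chosen control, with ties counted as one half. The sketch below computes it directly from that definition. It is an illustrative stand-in for the SPSS calculation used in the study, and the function name `auc` and the toy scores are assumptions.

```python
def auc(case_scores: list[float], control_scores: list[float]) -> float:
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    Quadratic in the number of samples, which is fine for illustration
    but slow for biobank-sized cohorts.
    """
    if not case_scores or not control_scores:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Toy normalised PRSs (hypothetical values, not study data).
toy_auc = auc([0.4, 0.9], [0.1, 0.5, 0.3])
```

An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect separation, so values near 0.61 indicate modest discrimination by the PRS alone.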

Figure 3

Distribution and statistical outcomes of PRSs in SLE. Distribution and statistical results of PRSs in SLE in the training group (A) and the testing group (B). AUC analyses for evaluating the accuracy of the PRS model and the PRS model combined with age and sex for predicting SLE in the training set (C) and the testing set (D). **** indicates p<0.001. AUC, area under the curve; PRS, polygenic risk score; ROC, receiver operating characteristic.

Application of multiple PRSs increases the discriminatory power of the PRS model in predicting SLE

To enhance the discriminatory power of the PRS model, laboratory examination data were incorporated. According to the Systemic Lupus Erythematosus Classification Criteria proposed by the ACR and EULAR, ANA, anti-double-stranded DNA antibodies (dsDNA) and anti-Smith antibodies (Sm) can be used as testing parameters. For each antibody, the participants were categorised into a case group with positive antibody results and a control group with negative antibody results. GWAS and PRS analyses were conducted for each of these test results. On the basis of the p value and r2 value thresholds, sets of 42 SNPs (p value of 0.0001005 and r2 value of 0.00955), 44 SNPs (p value of 5×10−8 and r2 value of 0.0263) and 83 SNPs (p value of 0.0008 and r2 value of 0.0137) were identified for calculating the PRSs of ANA-positive, dsDNA-positive and Sm-positive cases, respectively, with significant differences observed (figure 4A–C). These SNPs are listed in online supplemental table 4. The intersection results of the identified SNPs for the PRSs of SLE, ANA, dsDNA and Sm are presented in online supplemental figure 3. Although no SNPs were found to be common to all four features, 13, 10 and 7 SNPs in ANA, dsDNA and Sm, respectively, exhibited linkage disequilibrium (r2>0.2) with SNPs in the SNP set of the PRS of SLE. The AUCs of SLE predicted using the PRSs of ANA, dsDNA and Sm were 0.606, 0.600 and 0.554, respectively (figure 4D). Combining the PRSs of SLE, ANA, dsDNA and Sm increased the AUC of SLE to 0.651, indicating that the incorporation of multiple PRSs enhances discriminatory power.

Figure 4

Distribution, statistical outcomes and quantile plots of PRSs for SLE, ANA, dsDNA and Sm. Distribution and results of PRS in the training group for ANA (A), Sm (B) and dsDNA (C). (D) ROC and AUC analyses evaluating the discriminatory ability of the PRSs for SLE, ANA, Sm and dsDNA in distinguishing between SLE and control patients. Patients in the replication group were categorised into quintiles on the basis of the PRSs of SLE (E), ANA (F), dsDNA (G) and Sm (H), and ORs for the likelihood of an association between each quintile of PRSs and SLE were generated using logistic regression analysis. The values given are the ORs with 95% CIs. ** indicates p<0.01, *** indicates p<0.005, **** indicates p<0.001. AUC, area under the curve; dsDNA, anti-double stranded DNA antibody; PRS, polygenic risk score; ROC, receiver operating characteristic; Sm, anti-Smith antibody.

To evaluate the risk of SLE, we examined the risk of SLE in each PRS quintile. The patients in the top quintile for the PRSs of SLE, ANA and dsDNA had more than twice the risk of being classified into the SLE group compared with those in the bottom quintile (p<0.001, OR=2.79, 95% CI=2.10 to 3.70 for PRS of SLE; p<0.001, OR=2.55, 95% CI=1.92 to 3.39 for PRS of ANA; OR=2.34, 95% CI=1.78 to 3.07 for PRS of dsDNA; figure 4E–G). The patients in the top quintile for the PRS of Sm had approximately 1.5 times the risk of being classified into the SLE group compared with those in the bottom quintile (p<0.001, OR=1.52, 95% CI=1.18 to 1.96; figure 4H). The OR for SLE increased with the quintiles of PRSs for SLE, ANA and dsDNA, indicating that an elevated risk was associated with higher PRSs.
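The quintile comparison above can be sketched as an odds ratio with a Wald 95% CI computed from a 2×2 table (top versus bottom quintile, cases versus controls). This unadjusted calculation matches a logistic regression with a single binary predictor; it does not reproduce any covariate adjustment the study may have applied. The function name and all counts in the example are hypothetical.

```python
import math


def odds_ratio_ci(cases_top: int, controls_top: int,
                  cases_bottom: int, controls_bottom: int,
                  z: float = 1.96) -> tuple[float, float, float]:
    """Return (OR, lower, upper) using the Wald interval on the log scale."""
    cells = (cases_top, controls_top, cases_bottom, controls_bottom)
    if min(cells) <= 0:
        raise ValueError("all four cells must be positive")
    odds_ratio = (cases_top * controls_bottom) / (controls_top * cases_bottom)
    # Standard error of log(OR) is the root of the summed reciprocal counts.
    se = math.sqrt(sum(1.0 / n for n in cells))
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Hypothetical counts: 40 cases and 60 controls in the top quintile,
# 20 cases and 80 controls in the bottom quintile.
or_top, ci_low, ci_high = odds_ratio_ci(40, 60, 20, 80)
```

An interval that excludes 1, as in the quintile results reported above, indicates that the odds of SLE differ between the two quintiles at the 5% level.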

To optimise the clinical applicability of PRSs, we identified the optimal cut-off point for each PRS by using the Youden Index. Subsequently, logistic regression was employed to evaluate the discriminatory power of the PRS for SLE on the basis of the optimal cut-off points. An ROC curve analysis was conducted to calculate the AUC. The optimal cut-off points for the PRSs of SLE, ANA, dsDNA and Sm are presented in figure 5A. Individuals with a PRS surpassing the optimal cut-off point exhibited a 1.7-fold to 2.1-fold higher risk of developing SLE compared with those with a PRS below the optimal cut-off point (ORs=2.09, 1.88, 1.83 and 1.65 for PRSs of SLE, ANA, dsDNA and Sm, respectively). Multiple PRSs were then included in the model to evaluate whether combining them enhanced its ability to discriminate between SLE and control patients. The AUCs of the optimal cut-off points for the PRSs of SLE, ANA, dsDNA and Sm were 0.59, 0.58, 0.57 and 0.55, respectively. When all of these scores were combined, the AUC reached 0.64. Additionally, combining the PRS of SLE with any of the other three PRSs resulted in an AUC of ≥0.6 (figure 5B,C). Online supplemental figure 4 provides a more detailed ROC curve. These findings underscore the capacity of combined PRSs to predict SLE.
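The Youden Index selects the cut-off that maximises J = sensitivity + specificity − 1. The following minimal sketch scans every observed score as a candidate cut-off; the convention that a score at or above the cut-off counts as positive, the tie-breaking rule (lowest cut-off wins) and the function name are assumptions for illustration.

```python
def youden_cutoff(case_scores: list[float],
                  control_scores: list[float]) -> tuple[float, float]:
    """Return (cut-off, J) maximising J = sensitivity + specificity - 1.

    A score >= cut-off is called positive. Ties in J keep the lowest cut-off.
    """
    if not case_scores or not control_scores:
        raise ValueError("both groups must be non-empty")
    best_cut, best_j = None, -1.0
    for cut in sorted(set(case_scores) | set(control_scores)):
        sensitivity = sum(s >= cut for s in case_scores) / len(case_scores)
        specificity = sum(s < cut for s in control_scores) / len(control_scores)
        j = sensitivity + specificity - 1.0
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut, best_j


# Hypothetical scores, not study data.
cut, j = youden_cutoff([2, 3, 4, 5], [1, 2, 3])
```

Dichotomising a continuous PRS at this cut-off discards information, which is consistent with the AUCs at the optimal cut-off points being slightly lower than those of the continuous scores.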

Figure 5

Classification accuracy of the PRS model in predicting SLE based on the optimal cut-off points. (A) Optimal cut-off points for PRSs of SLE, ANA, Sm and dsDNA were determined using the Youden Index. ORs indicate the association between the optimal cut-off point of PRSs and SLE. (B,C) Results of the ROC and AUC analyses of the discriminatory ability of the optimal cut-off points for the PRSs of SLE, ANA, Sm and dsDNA in distinguishing between SLE and control patients. AUC, area under the curve; dsDNA, anti-double stranded DNA antibody; PRS, polygenic risk score; ROC, receiver operating characteristic; Sm, anti-Smith antibody.

HLA haplotypes of SLE

The HLA gene complex occupies a pivotal position in the human immune system and plays a vital role in immune regulation and antigen presentation. Given the dense clustering of SLE-associated variants on chromosome 6, with the most significant genetic locus identified as HLA-DQB1 in the GWAS, we further analysed HLA haplotypes in SLE. Our findings revealed significant differences in 43 HLA haplotypes (p<0.05; online supplemental figure 5), with 22 of them exhibiting particularly notable differences (p<0.001; figure 6A,B). The frequencies of HLA-DQA1*01:01 and HLA-DQB1*05:01 were determined to be 1.54-fold and 1.45-fold higher, respectively, in patients with SLE than in the control group (OR=1.55 and 95% CI=1.27 to 1.90 for HLA-DQA1*01:01, and OR=1.47 and 95% CI=1.23 to 1.75 for HLA-DQB1*05:01). The predictive model incorporating both HLA haplotypes (HLA-DQA1*01:01 and HLA-DQB1*05:01) and PRSs achieved an AUC of 0.655, a slight improvement in predictive accuracy over the model based solely on PRSs, which had an AUC of 0.651 (online supplemental figure 7).

Figure 6

Proportions of SLE and control patients with different HLA haplotypes. Comparison of HLA haplotypes between the SLE and control groups with allele frequency (A) and fold change (B). The Χ2 analysis revealed statistically significant differences (p<0.001) in the distributions of HLA haplotypes between the two groups. The white bars denote patients with SLE, and the black bars represent control patients. The frequency is expressed as percentages relative to total allele counts for each haplotype, and the fold change of each haplotype in SLE relative to the control group is indicated. HLA, human leucocyte antigen.

Discussion

We conducted a GWAS involving 2429 patients with SLE and 48 580 controls. Multiple PRS models were established to enhance the discriminatory power of PRSs in predicting SLE. To enable the clinical application of PRSs to SLE diagnosis, we employed the Youden Index to determine the optimal cut-off point for each PRS. Combining the PRSs for SLE, ANA, dsDNA and Sm yielded an AUC of 0.64 for the optimal cut-off points. We further analysed HLA haplotypes in SLE. Notably, individuals harbouring HLA-DQA1*01:01 and HLA-DQB1*05:01 exhibited higher risks of being classified into the SLE group.

According to our findings, the clinical utility of the PRS for SLE lies in offering guidance to test for ANA and other SLE markers when a patient’s PRS surpasses the optimal cut-off point for SLE, in conjunction with elevated PRSs for two of the other three marker categories. This, coupled with the presence of potential SLE symptoms, facilitates early detection, thereby preventing delays in diagnosis and treatment resulting from the lack of distinct early symptoms in SLE. Numerous GWASs and investigations of PRSs in SLE have focused on diverse ancestral populations, comorbidities, and the association between PRS and disease manifestations or severity.24 33–37 However, no study thus far has examined the application of PRSs to multiple phenotypes to predict SLE. Liu et al38 underscored the importance of systematically integrating PRSs with routine clinical biomarkers, marking a key step toward establishing PRSs as valuable clinical screening tools. In the current study, we conducted a GWAS; evaluated the PRSs of ANA, dsDNA and Sm; and applied multiple PRSs to enhance the discriminatory power of PRSs in predicting SLE. This approach not only involved the use of laboratory data of these clinical biomarkers but also explored the prospect of leveraging PRSs to predict the disease before its onset. Furthermore, combining PRS and HLA haplotypes could enhance the accuracy of SLE prediction, offering additional indicators for early SLE diagnosis.

We identified 168, 42, 44 and 83 SNPs specific to SLE, ANA, dsDNA and Sm, respectively, which were used for calculating PRSs. An examination of the overlap among these SNPs revealed an absence of common variants across all four features. This observation leads us to speculate that the use of multiple PRSs may contribute to an enhanced discriminatory power of PRSs for SLE. This underscores the considerable value of employing multiple PRSs, which prevents overfitting.39

We employed genetic data and electronic medical records from the CMUH genetic biobank to conduct GWAS and PRS analyses. This data source included comprehensive clinical diagnostic information and medical examination records, mitigating potential uncertainties associated with relying solely on questionnaire-based data. However, we cannot exclude the possibility of patients seeking medical care at other hospitals, potentially introducing the risk of incomplete clinical data. This scenario could lead to the inadvertent inclusion of patients with SLE in the control group. Conversely, some individuals identified as patients with SLE on the basis of diagnostic codes may not actually have the condition. To address these concerns, we conducted a small validation study involving 200 individuals with SLE and found that they met the ACR/EULAR criteria. In summary, the limitations acknowledged in this study underscore the importance of careful interpretation of our results. In the future, addressing these limitations through measures such as incorporating data from diverse healthcare institutions and implementing more stringent diagnostic criteria will be imperative to enhance the accuracy and reliability of investigations.

Our GWAS results enabled the identification of genetic variants associated with SLE. Many of our findings are consistent with those of previous studies. For example, we identified a significant association between SLE and GTF2I,40 HLA-DQB1,41 STAT442 and other genes,18 43–48 as detailed in table 2. Additionally, we discovered novel associations between SLE and certain genes in the Taiwanese population that have not been previously reported, such as RCC1L, EGLN3 and MRPS23. RCC1L was reported to be deleted in Williams-Beuren syndrome. EGLN3 was reported to be involved in apoptotic processes and responses to hypoxia. Notably, hypoxia and hypoxaemia have been previously documented in patients with SLE.49 MRPS23 was implicated in protein synthesis within the mitochondrion, which is consistent with other studies that reported the involvement of mitochondria in the pathogenesis of SLE.50 These novel associations underscore the need for further investigation into the roles and implications of these SLE risk genes.

In conclusion, PRSs are a precise and effective tool for predicting individuals’ health status and susceptibility to diseases. By evaluating an individual’s genetic composition, PRSs offer valuable insights into potential treatment efficacy and the likelihood of a positive or negative prognosis, even before symptoms manifest. PRSs enable the early detection of conditions before abnormal test results are obtained, thus empowering healthcare practitioners to embrace precision medicine. The prescription of personalised preventive strategies becomes feasible based on each patient’s distinct genetic profile. In this study, we developed multiple PRSs to enhance their discriminatory power in predicting SLE, aiming to contribute to the early diagnosis and prediction of SLE.