Article Text

Original research
Machine learning-based identification of novel hub genes associated with oxidative stress in lupus nephritis: implications for diagnosis and therapeutic targets
  1. Huiqiong Zeng1,
  2. Yu Zhuang2,
  3. Xiaodong Yan3,
  4. Xiaoyan He4,
  5. Qianwen Qiu1,
  6. Wei Liu5,6 and
  7. Ye Zhang1
  1. 1Traditional Chinese Medicine Department of Immunology, Women & Children Health Institute Futian Shenzhen, Shenzhen, China
  2. 2Department of Rheumatology and Immunology, Huizhou Central People’s Hospital, Huizhou, China
  3. 3School of Basic Medical Sciences, Yunnan University of Chinese Medicine, Kunming, China
  4. 4Department of Fu Xin Community Health Service Center, The Eighth Affiliated Hospital of Sun Yat-Sen University, Shenzhen, China
  5. 5Department of Rheumatology, First Teaching Hospital of Tianjin University of Traditional Chinese Medicine, Tianjin, China
  6. 6National Clinical Research Center for Chinese Medicine Acupuncture and Moxibustion, Tianjin, China
  1. Correspondence to Dr Ye Zhang; ftfyzhangye{at}


Background Lupus nephritis (LN) is a complication of SLE characterised by immune dysfunction and oxidative stress (OS). Limited options exist for LN. We aimed to identify LN-related OS, highlighting the need for non-invasive diagnostic and therapeutic approaches.

Methods LN-differentially expressed genes (DEGs) were extracted from Gene Expression Omnibus datasets (GSE32591, GSE112943 and GSE104948) and Molecular Signatures Database for OS-associated DEGs (OSEGs). Functional enrichment analysis was performed for OSEGs related to LN. Weighted gene co-expression network analysis identified hub genes related to OS-LN. These hub OSEGs were refined as biomarker candidates via least absolute shrinkage and selection operator. The predictive value was validated using receiver operating characteristic (ROC) curves and nomogram for LN prognosis. We evaluated LN immune cell infiltration using single-sample gene set enrichment analysis and CIBERSORT. Additionally, gene set enrichment analysis explored the functional enrichment of hub OSEGs in LN.

Results The study identified four hub genes, namely STAT1, PRODH, TXN2 and SETX, associated with OS related to LN. These genes were validated for their diagnostic potential, and their involvement in LN pathogenesis was elucidated through ROC and nomogram. Additionally, alterations in immune cell composition in LN correlated with hub OSEG expression were observed. Immunohistochemical analysis reveals that the hub gene is most correlated with activated B cells and CD8 T cells. Finally, we uncovered that the enriched pathways of OSEGs were mainly involved in the PI3K-Akt pathway and the Janus kinase-signal transducer and activator of transcription pathway.

Conclusion These findings contribute to advancing our understanding of the complex interplay between OS, immune dysregulation and molecular pathways in LN, laying a foundation for the identification of potential diagnostic biomarkers and therapeutic targets.

  • lupus nephritis
  • biological products
  • risk factors
  • pharmacogenetics

Data availability statement

Data are available upon reasonable request. The LN cohort’s transcriptional dataset (GSE32591, GSE112943 and GSE104948) was downloaded from the Gene Expression Omnibus (GEO) database (

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See:

Statistics from

Request Permissions

If you wish to reuse any or all of this article please use the link below which will take you to the Copyright Clearance Center’s RightsLink service. You will be able to get a quick price and instant permission to reuse the content in many different ways.


  • Lupus nephritis (LN) is a complication of SLE associated with immune dysfunction and oxidative stress (OS).

  • Limited treatment options are available for LN.


  • This study identified 41 OS-associated differentially expressed genes (OS-DEGs) in patients with LN.

  • The study revealed a significant association between the magenta module obtained from weighted gene co-expression network analysis and OS-LN.

  • STAT1, SETX, PRODH and TXN2 were identified as hub biomarkers for OS-LN using machine learning techniques, suggesting their potential as diagnostic and therapeutic targets.

  • The study demonstrated that B cells, T cells, dendritic cells and activated natural killer cells play a role in the pathogenesis of OS-LN.

  • Functional enrichment analysis showed that the OS-DEGs are primarily involved in the PI3K-Akt pathway and Janus kinase-signal transducer and activator of transcription pathway.


  • This study underscores the importance of OS and immune dysregulation in the mechanisms of LN.

  • The identified OS-associated genes have potential clinical value for the diagnosis and development of therapeutic interventions for LN.

  • Insights into the involvement of specific immune cell types provide opportunities for targeted therapies.

  • Understanding the enriched pathways could lead to the development of novel treatments for LN, improving clinical outcomes for patients with this condition.


SLE is a complex autoimmune disease (AID) characterised by a variety of clinical phenotypes, including lupus nephritis (LN) as a significant complication.1 LN unfolds through intricate interactions of immune dysregulation and oxidative stress (OS), for which therapeutic interventions targeting OS in LN remain relatively scarce.2 Despite the invasive and inconvenient nature of renal biopsy hindering prompt treatment of LN, timely identification and intervention are crucial for patient well-being.3 Therefore, non-invasive methods are urgently needed to elucidate novel hub genes associated with OS in LN, thereby deepening our understanding of its pathogenesis and paving the way for potential diagnostic and therapeutic innovations.4

In the field of artificial intelligence, machine learning (ML) has emerged as a powerful tool widely applied in bioinformatics analysis.5 ML algorithms enable the analysis of complex datasets, aiding in different AID subtyping, biomarker discovery, pathway analysis and drug exploration.6 7

OS, stemming from an imbalance between reactive oxygen species (ROS) generation and degradation, drives excessive activation of immune cells and promotes LN progression. Antioxidant therapies hold promise in modulating inflammation and OS pathways, potentially ameliorating LN complications.8

The immune landscape in LN involves various subsets of immune cells, contributing multifunctionally to renal tissue damage and fibrosis. Studies have indicated activation of immune cells in renal tissues of patients with LN, sustaining inflammatory responses and promoting LN pathogenesis.9–11

Weighted gene co-expression network analysis (WGCNA) is a valuable tool for identifying highly correlated gene clusters, complemented by the least absolute shrinkage and selection operator (LASSO) regression method to enhance prediction accuracy and interpretability.12 13

OS plays a pivotal role in the pathogenesis of LN, driving immune cell activation and tissue damage, thus exacerbating disease severity. ML and WGCNA provide tools to identify biomarkers associated with LN pathogenesis. Through extensive data analysis, these methods aid in understanding how OS influences immune cell infiltration, thereby facilitating LN development and progression to make out the potential biomarkers in renal biopsy samples of patients with LN and construct OS-based prediction models. Hence, the integration of ML and WGCNA techniques with the study of the relationship between OS and immune cell infiltration holds promise for a more comprehensive understanding of LN pathogenesis.

To the best of our knowledge, this study represents a novel contribution to the field. Our investigation primarily focuses on tissue-infiltrating cells, an aspect not thoroughly explored in previous literature. These discoveries hold promise for advancing our comprehension of LN pathogenesis, thereby laying a groundwork for future translational studies aimed at enhancing diagnostic and therapeutic approaches.


Data collection and processing

To obtain the gene expression profile data of LN, we included renal biopsy sample datasets: microarray datasets GSE112943 (14 cases of LN, 7 healthy controls) and GSE32591 (64 cases of LN, 29 healthy controls), downloaded from the Gene Expression Omnibus database (, and analysed to determine the LN-differentially expressed genes (DEGs) between LN renal tissue and normal renal tissue.14 15 Additionally, GSE104948 (65 cases of LN, 21 healthy controls) was taken as a validation set.16 Gene symbols were identified through the normalisation transformation of all probes based on platform annotation information. Batch effects across all probes were corrected using the ‘sva’ package in R V.

OS-associated differentially expressed genes

Identification and functional enrichment analysis

The ‘GOBP RESPONSE TO OXIDATIVE STRESS’ pathway in the Molecular Signatures Database (MSigDB, was used to obtain OS-related genes.18 A total of 437 OS-associated DEGs (OSEGs) were obtained for subsequent analysis after removing duplicate genes. DEGs analysis was conducted using the ‘limma’ package in R software, a commonly used statistical method for identifying DEGs at the intersection of two training set samples: LN-DEGs and OSEGs.19 We applied strict filtering criteria, namely ‘adjusted P-value<0.05 and log2(FC)>0.585 or log2(FC)<−0.585’, to ensure the selection of genes with significantly different expression levels. To visually represent the patterns of DEGs, we employed the ‘pheatmap’ package in R to generate a heatmap. This visualisation method aids in observing patterns and trends in gene expression across samples, facilitating a better understanding of changes in gene expression within the OS pathway in LN.

To further elucidate the potential function of the OSEGs related to LN, Gene Ontology (GO), and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses of potential OSEGs were carried out by the R package ‘ClusterProfiler program’.20

WGCNA analysis

We used WGCNA to identify module genes and construct a co-expression network for the profile of the two training datasets.21 To obtain a stable model, we filtered outliers and selected an appropriate soft threshold beta, which was validated. Next, we constructed a topological overlap matrix for hierarchical clustering and dynamic tree cutting of genes to screen networks. We then calculated the gene significance (GS) and module membership (MM) to determine the clinical significance of genes and features of LN and to investigate the associations between the modules. Finally, we visualised the GS network in the module.22 The WGCNA parameters were selected as follows: the selected power for soft thresholding was 20. The minModuleSize was applied to specify the minimum module size in the context of WGCNA, with a parameter set to 50. The parameter value of Mergecutheight was set to 20 000, specifying the height at which modules were merged. The deepSplit was employed to determine the depth of module splitting, and our selection was set to 2. Spearman was used to specify the method for calculating gene correlations.

ML of identification and validation of renal tissue

Blood biomarkers in LN

The hub genes associated with OS in LN were identified by selecting candidate biomarkers with the highest intermodule connectivity obtained from WGCNA analysis. Venn 2.1.0 online platform ( was used to obtain the overlapping biomarkers.23

The final six hub genes associated with LN were further screened using the LASSO logistic regression algorithm with the glmnet package in R software.13 Barplots and receiver operating characteristic (ROC) curves were plotted to assess the expression levels and diagnostic value of the obtained hub genes in distinguishing between LN and healthy controls, respectively.24 The parameter settings for the LASSO model were specified as follows: alpha=1; maximum iterations (max_iter): 1000; no normalisation of the data was performed; the coefficient update method was specified as cyclic.

The predictive value of the hub genes was evaluated by calculating the area under the curve (AUC) and 95% CIs using the qROC package in R software. The OSEGs with AUC greater than 0.65 were considered optimal diagnostic biomarkers for predicting LN. Additionally, the expression levels and diagnostic values of the hub genes were validated using a separate external dataset (GSE104948). We retrieved gene expression data from a publicly available database using the data file GSE104948. Low-expressed genes were filtered out and the data were normalised. We also presented the composite ROC curve for the hub OSEGs, indicating that the AUC value reflects the system’s sensitivity and specificity in distinguishing between LN and the healthy control group. We constructed a risk model using a logistic regression model and generated a nomogram to visualise the optimisation model’s results.25 Following this, model calibration was performed, and a calibration curve was generated to assess the predictive performance of the model. Lastly, the clinical utility of the model was evaluated through decision curve analysis (DCA).

Immune cell infiltration analysis

Using the single-sample gene set enrichment analysis (ssGSEA) algorithm, we quantified and visualised the 28 types of immune cell infiltration to evaluate the infiltration levels of each immune cell type in LN.26 The CIBERSORT method infers the relative abundance of immune cell subtypes in samples by establishing a linear regression model using reference gene expression data.27 Gene expression data from LN samples were inputted into the model to calculate the relative abundance of various immune cell subtypes, followed by statistical inference to assess their distribution within the samples, providing crucial information on the composition of immune cells. Violin plots were created to assess the differential expression levels of the 28 immune infiltrating cells and correlate them. The relationship between hub genes and infiltrating immune cells was determined using Spearman correlation. The results were displayed using the ‘ggplot2’ package. Statistical significance with a p value <0.05 was considered.

GSEA of immunological signature

The GSEA software (V.4.0) was conducted with the ‘enrichplot’ packages in R to perform four hub OSEGs related to LN set enrichment analysis using the Hallmarks.gmt datasets.28 GSEA was performed to identify the differentially changed pathways between immunological gene sets and KEGG gene sets. The differentially expressed transcripts were compared with the corresponding databases, and the enrichment of relevant gene sets was analysed statistically. Enrichment scores (normalised enriched score, NES) ≥1.0, nominal p value <0.001 and false discovery rate (FDR) q value <0.05 were considered statistically significant.


Identification of OSEGs in LN

A total of 437 OS-related genes were obtained from the MSigDB. Using R software, 41 primarily screened OSEGs were identified through the intersection of 11 390 DEGs from the merged LN datasets with 437 OS genes. Their expression was depicted in the renal biopsy samples of patients with LN from GSE32591 and GSE112943, as shown in figure 1A. 24 OSEGs related to LN were found to be upregulated, with the top six being signal transducer and activator of transcription 1 (STAT1), platelet-derived growth factor receptor alpha (PDGFRA), frataxin (FXN), glutathione peroxidase 7 (GPX7), ADP-ribosylation factor-like 6 interacting protein 5 (ARL6IP5) and cluster of differentiation 38 (CD38). 17 were downregulated, including NR4A3, NR4A2, JUN, FOSL1, proline dehydrogenase (PRODH) and TXNRD2. GO and pathway analyses were performed to elucidate the biological functions of these genes in terms of molecular functions, biological processes (BP) and cellular components. The majority of the OSEGs were found to be involved in cellular response to OS, antioxidant activity and DNA binding, as shown in figure 1B. Additionally, KEGG pathway enrichment analysis revealed that the PI3K-Akt pathway, platelet activation, Janus kinase-signal transducer and activator of transcription (JAK-STAT) signalling pathway and nucleotide-binding oligomerisation domain (NOD)-like receptor pathway were involved in LN, as shown in figure 1C. In conclusion, our enrichment analyses identified the OS-related BP and aberrant signal pathways involved in the progression of LN.

Figure 1

OS-associated DEGs (OSEGs) in the renal biopsy samples of patients with lupus nephritis (LN) from GSE32591 and GSE112943. (A) Heatmap of OSEGs. (B) Gene Ontology enrichment analysis displaying biological processes (BP), cellular components (CC) and molecular functions (MF) with gene ratios and adjusted p values in LN. (C) Bubble chart of Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis plotting the significance and gene ratio of key signalling pathways with OSEGs in LN. CON, control; DEGs, differentially expressed genes; NOD, nucleotide-binding oligomerisation domain; OS, oxidative stress; TNF, tumour necrosis factor.

Identification of LN-related gene modules

We performed WGCNA analysis on the training datasets to further explore the genes associated with the LN module. The WGCNA package in R software was used to construct a scale-free network and filter it via the pick soft threshold function.

We found that the optimal soft threshold for the beta was 20, where the scale-free topology fitting indices R2=0.80 (figure 2A) and the mean connectivity=11.4 (figure 2B). After merging similar modules in the model, we obtained five different modules (figure 2C) containing genes with high co-expression profiles that were associated with LN.

Figure 2

Oxidative stress-lupus nephritis (OS-LN) module identification by weighted gene co-expression network analysis (WGCNA). (A, B) Determination of network robustness and connectivity by assessing scale independence (R2) and mean connectivity at incremental soft threshold powers. (C) The gene dendrogram displays module colours that reflect a hierarchical clustering analysis. Clustering height is indicated on the y-axis, with gene co-expression similarity decreasing from top to bottom. OS-LN modules identified by dynamic tree cutting are represented by colour bands at the bottom, indicating gene groupings by expression patterns. (D) Heatmap displaying module–trait relationships with correlation coefficients and p values, contrasting healthy control (CON) and LN conditions across various modules (MEblack, MEmagenta, MEgreen, MEpurple, MEgrey). (E) This scatter plot shows OS-LN module membership in the magenta module and gene significance for LN. A weak positive correlation (cor=0.1), significant at p=0.0028, suggests a small but potentially relevant association between gene significance in LN and magenta module membership.

The genes in the upper quartile of MM and GS values in each module were considered hub genes. The magenta module displayed the highest positive correlation with LN (figure 2D), suggesting that the genes in this module may play a key role in the progression of LN. We observed that the magenta module, consisting of 894 genes, had the largest positive correlation coefficient with LN (r=0.1, p=2.8e-3), as shown in figure 2E. Therefore, we deemed the magenta module to be the most clinically significant module for subsequent analysis.

Obtainment of key biomarkers related to OS in LN

To identify variable diagnostic biomarkers in LN, we selected 41 OSEGs related to LN and 894 genes in the magenta module obtained from WGCNA. We used an online Venn platform to identify eight overlapping diagnostic second-step candidate OSEGs (STAT1, GPX7, PRODH, PDK2, HMOX2, NFE2L2, TXN2 and Senataxin (SETX)) (figure 3A). These relevant genes were further used through the LASSO regression analysis algorithm to acquire high-value diagnostic biomarkers. We plotted the coefficients profile using LASSO (figure 3B) and performed cross-validation (figure 3C), which resulted in the selection of six candidate biomarkers, namely STAT1, GPX7, PRODH, NFE2L2, TXN2 and SETX, which were associated with OS in LN.

Figure 3

Results of least absolute shrinkage and selection operator (LASSO) regression model construction with the ‘glmnet’ package in R software. (A) Venn diagram showing the overlap between 41 OSEGs and 894 genes identified from the magenta module of weighted gene co-expression network analysis (WGCNA), with eight genes overlapping (0.9%). (B) Evolution of LASSO regression coefficients, indicating variable selection and shrinkage with escalating penalty values. Each line of the LASSO regression path represents a coefficient from the model and its trajectory as the regularisation parameter increases. (C) The chart illustrates the LASSO model’s regularisation path, with binomial deviance on the y-axis and log (λ) on the x-axis, indicating model fit and complexity with varying λ. Red dots and error bars reflect deviance and non-zero coefficient counts for eight prognostic OSEGs. DEGs, differentially expressed genes; OS, oxidative stress; OSEGs, OS-associated DEGs.

Verification of the OS-related key biomarkers for LN diagnostics

We drew the boxplots to demonstrate the performance in distinguishing LN and control samples in the training and validation sets.

In both the training and validation datasets, there was a significant increase in the expression levels of STAT1, NEF2L2 and SETX, while PRODH and TXN2 exhibited decreased expression levels, all in patients with LN compared with the control group (p<0.05). Additionally, GPX7 demonstrated conflicting trends between the training and validation sets, leading to its exclusion. Consequently, the remaining five hub OSEGs were selected for constructing an applicable diagnostic model for OS-LN diagnosis validation (figure 4A–J).

Figure 4

Boxplots for the validation of differences in the expression levels of the remaining six candidate OSEGs in lupus nephritis (LN). (A–L) Validation of six candidate OSEGs (GPX7, NEF2L2, PRODH, SETX, STAT1 and TXN2) in both GSE32591 and GSE112943 compared with controls (Con) revealed significant differential expression in the LN (A–F), similarly demonstrated in the external validation dataset GSE104948, with GPX7 conflicted, excluded (G–L). DEGs, differentially expressed genes; OS, oxidative stress; OSEGs, OS-associated DEGs.

ROC and AUC scores were generated to evaluate the accuracy of the LASSO model in both datasets (figure 5A–J). A value >0.65 was used to indicate a high diagnostic value based on AUC. Both in the training sets or the validation set, four hub genes had AUC values >0.65 (PRODH, SETX, TXN2 and STAT1), whereas the NEF2L2 with AUC value <0.65 was thus excluded. ROC curve analysis combined these four gene signatures in online supplemental figure S1 demonstrating the predictive accuracy with an AUC of 0.929, indicating high diagnostic power for the specified condition.

Supplemental material

Figure 5

Receiver operating characteristic (ROC) curve and area under the curve (AUC) score to perform the validity of four candidate genes as diagnostic biomarker value for oxidative stress (OS) in lupus nephritis (LN). (A–J) Verifying the remaining five candidate genes in both the training (A–E) and validating (F–J) datasets revealed significant diagnostic value for assessing OS-LN compared with the controls, with NEF2L2, PRODH, SETX, STAT1 and TXN2 showing consistent results. The AUC value of the four-gene-based least absolute shrinkage and selection operator (LASSO) model exceeded 0.65 in these datasets, except for NEF2L2.

We constructed a risk prediction model to demonstrate promising predictive performance on the validation dataset, as indicated by the calibration curve, which suggested consistency between predicted and actual probabilities of disease occurrence. The nomogram depicted the contribution of the four final selected genes (STAT1, PRODH, TXN2 and SETX) to LN risk. DCA illustrated the model’s performance across various threshold probabilities, providing valuable guidance for clinicians in making treatment decisions for patients with LN (figure 6A–C).

Figure 6

Components of the risk prediction model with oxidative stress (OS)-associated differentially expressed genes (DEGs) in lupus nephritis (LN). (A) This calibration curve demonstrates the alignment between predicted OSEGs and observed LN for a predictive model, showing apparent, bias-corrected and ideal relationships. (B) Decision curve analysis showing the net benefit across different threshold probabilities for the four OSEG signatures versus strategies of treating all or none. (C) Nomogram estimates LN risk by locating points corresponding to the expression levels of STAT1, PRODH, TXN2 and SETX genes, summing these points to calculate a total score, and then determining the estimated risk on the bottom axis. OSEGs, OS-associated DEGs.

Immune cell infiltration and its correlation with hub genes

Immune dysregulation is a crucial factor in the progression of LN pathogenesis. In this study, we performed an analysis of hub biomarkers related to immune cells. ssGSEA is an extension of GSEA that calculates individual enrichment scores for each pair of a sample and dataset. Using the ssGSEA, we evaluated the infiltration landscape of 28 immune cell species in the training datasets’ samples to investigate the relationship between these immune cell populations in LN and controls, as depicted by heatmaps.

As shown in figure 7A, there is differential expression of five types of immune cells in LN renal tissue compared with the control group: B cells, T cells, dendritic cells, activated natural killer (NK) cells and memory T cells. Among these, activated B cells, activated CD8 T cells, activated dendritic cells, gamma-delta T cells, immature B cells, derived suppressor cells (MDSCs), macrophages, plasmacytoid dendritic cells, regulatory T cells, T follicular helper cells, central memory CD8 T cells and effector memory CD8 T cells were observed upregulated in LN renal tissue. Among them, the increase in Immature B cells was most significant (p=1.01E-12), suggesting a crucial role of humoral immunity in the pathogenesis of LN. NK cells and eosinophils were downregulated. We assumed that NK cells and eosinophils typically participated in immune surveillance and regulation in LN (figure 7B). Their decrease may be associated with an immunosuppressive state, thereby affecting the immune status and inflammatory response of the kidney. These results indicate that in LN renal tissue, the immune response is activated, and the quantity of certain immune cells, especially immature B cells, is significantly increased, which may be associated with a certain disease state or immune dysregulation.

Figure 7

Assessing the characteristics of immune cell infiltration with the oxidative stress-lupus nephritis (OS-LN) samples. (A) The heatmap depicts expression levels of 28 immune cell subtypes in LN samples versus controls in the training datasets. Columns represent samples, and rows represent cell subtypes. Colours denote upregulation (red) or downregulation (blue) relative to controls. Dendrograms display clustering indicating similarity in expression patterns between samples and subtypes. (B) Violin plots compare immune cell subtype abundance distribution between LN and control samples. Each violin shows abundance distribution, median as a white dot and IQR as a black box. Annotated p values indicate the statistical significance of abundance differences. (C) The correlation matrix heatmap depicts the correlation between immune cell subtypes and specific gene expression (PRODH, STAT1, TXN2 and SETX). Colour intensity indicates correlation magnitude, with red for positive and blue for negative correlation. Significance levels: *p<0.05, **p<0.01, ***p<0.001. MDSCs, derived suppressor cells.

We observed that PRODH, SETX, TXN2 and STAT1 were correlated with multiple immune cells (figure 7C). The correlations between the four hub OSEGs related to LN and immune cell infiltration were investigated. SETX exhibited positive correlations with plasmacytoid dendritic cells, type 1 T helper cells, T follicular helper cells, central memory CD4 T cells and eosinophils (p<0.001), and negative correlations with immature dendritic cells, type 17 T helper cells and gamma-delta T cells (p<0.01). STAT1 showed positive correlations with type 2 T helper cells, type 1 T helper cells, T follicular helper cells, eosinophils, NK cells, NK T cells, regulatory T cells and activated CD4 T cells (p<0.001), and negative correlations with type 17 T helper cells (p<0.001), see figure 6C for details. As for PRODH and TXN2, positive correlations were observed with immature dendritic cells and type 17 T helper cells (p<0.001), while negative correlations were observed with central memory CD4 T cells, eosinophils, activated CD4 T cells, type 1 T helper cells, T follicular helper cells and regulatory T cells (p<0.001), showing an opposite trend compared with the upregulated genes mentioned earlier. This suggests that these four hub genes play a significant role in modulating immune cell infiltration and function, potentially bearing important implications for the pathophysiology and treatment of LN.

GSEA analysis of hub biomarkers

GSEA is a widely used open-source software tool for the analysis of transcriptional profiling data, providing useful visualisation and reporting to interpret results. The MSigDB serves as a reference for annotated gene sets used in GSEA. The results of GSEA, shown in figure 8, identified enriched pathways significantly associated with the four hub biomarker genes for OS-LN based on immunological signature gene sets and KEGG gene sets. The immunological signature gene sets were significantly enriched (|NES|>1; FDR q value <0.05) according to expression levels of PRODH, SETX, TXN2 and STAT1, respectively. Figure 8A–D show the KEGG analyses for GSEA with these hub genes detected multiple enrichment pathways, including the P53 signalling pathway, toll-like receptor signalling pathway, NOD-like receptor signalling pathway, peroxisome proliferator-activated receptor (PPAR) signalling pathway and apoptosis signalling pathway (figure 8D,E), which underscore the critical role of these signalling pathways in the occurrence and development of OS in LN, and their correlation with immune-related genes and the four hub OSEGs.

Figure 8

The visualisation of the enrichment plot for the gene set enrichment analysis (GSEA) immunological signature database with STAT1, PRODH, TXN2 and SETX in oxidative stress-lupus nephritis (OS-LN). The top five signalling pathway GSEA enrichment analysis demonstrates distinct results for the downregulation (A–D) and upregulation (E–H) of PRODH, SETX, STAT1 and TXN2 genes in association with the immune-related gene set. JAK-STAT, Janus kinase-signal transducer and activator of transcription; NF, nuclear factor; NOD, nucleotide-binding oligomerisation domain; PPAR, peroxisome proliferator-activated receptor; TNF, tumour necrosis factor.


SLE is a chronic AID with diverse clinical features, including LN, a severe complication. The link between OS, immune dysfunction and LN development is not fully understood. Investigating LN’s molecular mechanisms is crucial for identifying new treatments and improving outcomes.29 LN involves immune complex-mediated inflammation triggered by autoantibody deposition in glomeruli, exacerbating disease progression.30 31

OS results from an imbalance between ROS and antioxidants, contributing to tissue damage and disease, including LN.32–34 The mechanism of OS in LN is still ambiguous, but OS activates immune cells and modulates lymphocyte differentiation, potentially accelerating AID progression.35 36 High OS levels correlate with immune dysregulation and increased autoantibody production in LN.37 Thus, exploring OS-related biomarkers in patients with LN can improve understanding, prognosis and treatment selection.

Bioinformatics methods such as WGCNA and ML have been widely used for biomarker screening in different diseases.38 39 We used WGCNA to identify OS-LN-related gene modules and potential biomarkers. ML, including LASSO regression, enhances hub biomarker selection accuracy for developing clinical diagnostic models in LN. This study first identified 41 OSEGs in patients with LN through the integration of bioinformatics methods such as ML. GO enrichment analysis indicated that OSEGs were primarily involved in cellular responses to OS and antioxidative activity. KEGG pathway analysis revealed that OSEGs were predominantly enriched in the PI3K-Akt pathway, platelet activation, JAK-STAT signalling pathway and NOD-like receptor pathway. Studies have shown upregulation of the PI3K/Akt/mTOR pathway in a murine model of LN,40 and significantly increased phosphorylation levels of PI3K, Akt and mTOR in renal tissues of patients with LN compared with controls.41 Abnormal activation of platelets may lead to thrombosis and vascular damage, exacerbating renal injury.42 Furthermore, abnormal activation of the JAK-STAT and NOD-like receptor pathways has been implicated in the pathogenesis of LN.43 44 Moreover, the PI3K/Akt signalling pathway was found to mediate the increase in ROS and regulate autophagy.45 These findings suggest that OSEGs may play a role in the progression of LN. A series of hub genes associated with OS-LN progression were determined. Further analysis of immune cell infiltration revealed distinct distribution patterns of these hub OSEGs in renal tissues of patients with LN, particularly significant increases in activated B cells and CD8 T cells. Additionally, four hub OSEG pathway enrichment analyses through GSEA demonstrated the crucial roles of several important signalling pathways, such as PI3K-Akt, apoptosis and JAK-STAT pathways, in the pathogenesis of OS-LN. The enrichment analysis results of OSEGs and hub OSEGs in KEGG pathways were almost consistent, which also aligned well with previously reported findings. These findings delve into the pathophysiological processes of OS-LN, providing important insights for further diagnostic and therapeutic research. Four hub OSEGs related to LN (PRODH, SETX, TXN2 and STAT1) were identified after WGCNA, ML and validation by an external validation set. In the risk assessment system established through a nomogram, four hub OSEGs identified as biologically significant in the progression of LN were determined, elucidating the extent of their contribution to LN risk.

STAT1 promotes OS, disrupting antioxidant defences and contributing to disease.46 47 Excessive activation of STAT1 enhances the effects of interferons (IFN-I and IFN-γ) on immune cells, potentially leading to aberrant activation of immune cells such as T cells and B cells, promoting the release of inflammatory factors, exacerbating renal inflammation and injury, thus promoting the pathological processes of LN.48 As a downstream gene of the JAK-STAT signalling pathway, STAT1 is a key factor in regulating immune responses and inflammation.49 Its upregulation in LN may reflect abnormal activation of immune cells and excessive production of proinflammatory factors. This function of STAT1 reveals its potential bridging role between immune system abnormalities and the pathological status of LN, making it a feasible target for therapeutic intervention. The efficacy of JAK2 inhibitor AG490 in alleviating LN symptoms, as demonstrated in studies, supports this notion.50 Furthermore, ongoing clinical trials exploring blockade of the JAK/STAT pathway underscore the potential for finding new therapeutic approaches in LN management.51 Therefore, interventions targeting STAT1 and its upstream and downstream signalling pathways may offer new therapeutic opportunities for LN.

Furthermore, our research cued the roles of PRODH and TXN2 in LN. Based on their biological properties: PRODH, regulated by p53, functions in proline metabolism and redox reactions,52 53 while TXN2 plays a crucial role in maintaining intracellular redox balance.54 This suggests their involvement in the pathogenesis of LN through participation in metabolic and redox processes. Although their roles in LN remain incompletely elucidated, we observed that decreased expression of PRODH and TXN2 may contribute to the pathogenesis of LN, suggesting potential therapeutic targets for LN by enhancing the PPAR signalling pathway. Similarly, our study detected SETX, an RNA/DNA helicase crucial for transcription termination and genomic integrity,55 whose expression levels were significantly upregulated in patients with LN. We speculated that SETX may lead to abnormalities in DNA damage repair and transcriptional regulation, contributing to the pathogenesis of LN. Like PRODH and TXN2, SETX represents another point of interest for future research aimed at elucidating its role in LN and exploring its therapeutic potential.

The characteristic feature of LN is immune cell infiltration.56 57 Studies have shown that T cells and macrophages in renal interstitium correlate with OS in hypertensive nephropathy.58 OS heightens ROS–immune cell interaction, worsening renal damage and inflammation. This study found that compared with the healthy control group, in LN, there is upregulation of B cells, CD8 T cells, dendritic cells, macrophage and regulatory T cells, which may influence disease progression by contributing to the production of autoantibodies, immune system imbalance, disruption of immune tolerance, exacerbation of local inflammation and damage, reflecting aberrant activity of the immune system. These results are consistent with previous research findings.12 59 60 Additionally, our research identified downregulation of NK cells and eosinophils, which may result in decreased recognition and clearance ability of abnormal cells, potentially leading to immune suppression and affecting the immune status and inflammatory response in the kidneys, thus contributing to LN pathogenesis.61 Our work found that SETX and STAT1 are highly expressed in LN and may be involved in LN pathogenesis through OS. SETX positively correlates with plasmacytoid dendritic cells and T helper cells, suggesting it may contribute to LN pathology by producing excessive immunoglobulins and proinflammatory factors. STAT1 positively correlates with T cells, B cells and NK cells, indicating ongoing immune system activation leading to inflammation and tissue damage. PRODH and TXN2 negatively correlate with central memory CD4 T cells and regulatory T cells, potentially affecting immune tolerance and regulatory functions, leading to immune system attacks on self-tissues. These associations suggest that genes like SETX, STAT1, PRODH and TXN2 play different roles in immune cell infiltration, affecting immune regulation and inflammatory responses, thus impacting disease progression and severity.

Despite the comprehensive methodology employed, several limitations warrant consideration. First, the reliance on retrospective data from public repositories may introduce biases and limit the generalisability of findings. Second, the study predominantly focuses on molecular mechanisms, warranting further validation through experimental and clinical studies. However, we validated the results with an external dataset, employing rigorous statistical methods to ensure accuracy. In the future, single-cell RNA sequencing holds promise in dissecting cellular heterogeneity within renal tissue and identifying novel subtypes and therapeutic targets. It unveils the LN microenvironment, fostering the development of personalised treatment strategies. Moreover, it facilitates causal inference analysis, enhancing predictions of immunotherapeutic responses. This study offers novel perspectives on LN by providing new diagnostic biomarkers, immune infiltration study models and mechanisms of OS involvement. Future research endeavours should prioritise prospective, multicentre studies to validate identified biomarkers and therapeutic targets, ultimately enhancing clinical outcomes in patients with LN.


In summary, this study sheds light on the intricate interplay between OS, immune dysregulation and molecular pathways in the pathogenesis of LN. Employing rigorous methodologies, we identify novel hub genes such as STAT1, PRODH, TXN2 and SETX, which hold potential as therapeutic targets and diagnostic biomarkers. These findings contribute to a deeper mechanistic understanding of LN and lay a solid groundwork for future translational studies aimed at refining management strategies with precision and expertise.

Data availability statement

Data are available upon reasonable request. The LN cohort’s transcriptional dataset (GSE32591, GSE112943 and GSE104948) was downloaded from the Gene Expression Omnibus (GEO) database (

Ethics statements

Patient consent for publication

Ethics approval

This study used data from public databases, where samples had already obtained ethical approval and informed consent from participants in the original research. Thus, this study, based on publicly available data, does not directly involve human or animal experiments, making additional ethical approval and consent unnecessary. Throughout the research, all data processing adhered to relevant ethical and privacy guidelines.


Supplementary materials

  • Supplementary Data

    This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.


  • HZ and YZhu contributed equally.

  • Contributors Conception and design: HZ, WL and YZhu. Data curation: all authors. Formal analysis: YZha, XY, XH and YZhu. Investigation: all authors. Methodology: all authors. Project administration: WL, QQ and YZha. Resources: all authors. Data analysis: HZ, XY, QQ and YZhu. Supervision: YZhu, XH and YZha. Writing—original draft: HZ, XY, WL and YZha. Writing—review and editing: HZ and YZhu. HZ is responsible for the overall content as the guarantor. Approval of final manuscript: all authors.

  • Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.

  • Competing interests None declared.

  • Patient and public involvement Patients and/or the public were not involved in the design, or conduct, or reporting, or dissemination plans of this research.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.