2024, Volume 32, Issue 5
Original Article
Mechanism and Key Genes Involved in Glucose Metabolism in Ischemic Stroke Based on Weighted Gene Co-expression Network Analysis and Machine Learning
Authors: 王曼曼1,2, 宁文华1, 王海明1, 刘伊滢1, 李思颖1,2, 高京1,2, 魏向阳1
- Affiliations:
- 1. Department of Rehabilitation Medicine, the First Affiliated Hospital of Zhengzhou University, Zhengzhou 450052, China 2. Academy of Medical Science, Zhengzhou University, Zhengzhou 450052, China
- Keywords:
- Ischemic stroke; Glucose metabolism; Weighted gene co-expression network analysis; Machine learning; Key genes
- Chinese Library Classification (CLC) number:
- R 743.3
- DOI:
- 10.12114/j.issn.1008-5971.2024.00.077
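- Funding:
- Joint Construction Project of the Henan Province Medical Science and Technology Research Program (LHGJ20220352); Project of the Science and Technology Department of Henan Province (22170112)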
Abstract:
Objective To explore the mechanism by which glucose metabolism participates in ischemic stroke (IS), and the key genes involved, based on weighted gene co-expression network analysis (WGCNA) and machine learning algorithms. Methods The study was conducted from November 2022 to June 2023. The IS microarray dataset GSE16561 (platform GPL6883) was downloaded from the GEO database; it contains total RNA expression profiles of peripheral blood from 39 IS patients (IS group) and 24 healthy subjects (control group). After preprocessing, the genes in the module most strongly correlated with the IS group were screened by WGCNA. These genes were intersected with the genes having a relevance score > 5 in the GeneCards database to obtain the IS glucose metabolism-related genes, on which gene ontology (GO) functional enrichment analysis was performed. Based on protein-protein interaction (PPI) network analysis, machine learning algorithms [random forest (RF) and support vector machine-recursive feature elimination (SVM-RFE)] were used to screen the key genes of glucose metabolism in IS. The expression levels of these key genes were compared between the IS group and the control group in GSE16561, receiver operating characteristic (ROC) curves of the key genes for diagnosing IS were drawn, and the area under the curve (AUC) was used to evaluate their diagnostic efficacy. Results WGCNA yielded 11 gene co-expression modules, among which the brown module had the strongest correlation with the IS group (r=0.56, P=2×10⁻⁶) and contained 461 genes. A total of 2,386 genes had a relevance score > 5 in the GeneCards database; intersecting them with the brown module genes yielded 85 IS glucose metabolism-related genes.
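As a consistency check on the module-trait statistic reported above (r=0.56, P=2×10⁻⁶, with 39 + 24 = 63 samples), the following sketch recomputes a P-value from r and the sample size. It assumes the P-value is the usual two-sided Student t-test on a Pearson correlation; the source does not state which test was used. The function name `student_p_from_r` and the Simpson-rule integration are illustrative choices, not part of the study.

```python
import math


def student_p_from_r(r: float, n: int, steps: int = 20000) -> float:
    """Two-sided p-value for a Pearson correlation r over n samples.

    Uses t = r * sqrt(df / (1 - r^2)) with df = n - 2, and the identity
    P(|T| > t) = I_x(df/2, 1/2) with x = df / (df + t^2), where I is the
    regularized incomplete beta function.
    Assumes n >= 5 and |r| clearly away from 0 and 1, so the integrand
    stays finite on the integration interval. `steps` must be even.
    """
    df = n - 2
    t = r * math.sqrt(df / (1.0 - r * r))
    x = df / (df + t * t)
    a, b = df / 2.0, 0.5

    def integrand(u: float) -> float:
        return u ** (a - 1.0) * (1.0 - u) ** (b - 1.0)

    # Composite Simpson's rule for the incomplete beta integral on [0, x].
    h = x / steps
    total = integrand(0.0) + integrand(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    incomplete = total * h / 3.0

    # Normalize by the complete beta function B(a, b).
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return incomplete / math.exp(log_beta)


# Values taken from the abstract: r = 0.56, n = 39 IS patients + 24 controls.
p_value = student_p_from_r(0.56, 63)
print(f"two-sided p = {p_value:.2e}")
```

Under these assumptions the result should come out on the order of 10⁻⁶, in line with the reported value.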
GO functional enrichment analysis showed that the IS glucose metabolism-related genes were mainly involved in the biological processes (BP) response to peptide, response to peptide hormone, and cellular carbohydrate metabolic process; the cellular components (CC) ficolin-1-rich granule, secretory granule lumen, and cytoplasmic vesicle lumen; and the molecular functions (MF) ubiquitin-like protein ligase binding, phosphoprotein binding, and protease binding. PPI network analysis yielded one core module containing 12 genes. Intersecting the key genes identified by the RF algorithm and the SVM-RFE algorithm gave 4 key genes of glucose metabolism in IS: MMP9, STAT3, ITGAM, and TLR2. The expression levels of MMP9, STAT3, ITGAM, and TLR2 were higher in the IS group than in the control group (P < 0.05). ROC curve analysis showed that the AUCs of MMP9, STAT3, ITGAM, and TLR2 for diagnosing IS were 0.855 [95%CI (0.762-0.948)], 0.872 [95%CI (0.784-0.960)], 0.842 [95%CI (0.747-0.936)], and 0.829 [95%CI (0.727-0.931)], respectively. Conclusion Glucose metabolism-related genes in IS participate in the occurrence and development of IS mainly by influencing the inflammatory response and oxidative stress, and MMP9, STAT3, ITGAM, and TLR2 are the key genes of glucose metabolism in IS, which may provide new ideas for research on the involvement of glucose metabolism in IS.
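To illustrate how an AUC and its 95%CI of the kind reported above can be obtained, the sketch below computes the AUC as a Mann-Whitney statistic and an approximate interval with the Hanley-McNeil standard error. The source does not say which interval method was used, so Hanley-McNeil is a substituted approximation and its output need not match the published intervals exactly. The function names and the toy expression values are hypothetical.

```python
import math


def auc_mann_whitney(pos, neg):
    """AUC as the share of (case, control) pairs where the case scores higher.

    Ties count as half a correctly ranked pair.
    """
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def hanley_mcneil_ci(auc, n_pos, n_neg, z=1.96):
    """Approximate confidence interval for an AUC (Hanley-McNeil formula)."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc * auc / (1.0 + auc)
    var = (
        auc * (1.0 - auc)
        + (n_pos - 1) * (q1 - auc * auc)
        + (n_neg - 1) * (q2 - auc * auc)
    ) / (n_pos * n_neg)
    half_width = z * math.sqrt(var)
    return max(0.0, auc - half_width), min(1.0, auc + half_width)


# Toy expression values (invented for illustration, not from GSE16561).
toy_auc = auc_mann_whitney([3.1, 2.7, 2.9, 3.4], [2.2, 2.8, 2.5])

# Interval for the reported MMP9 AUC with 39 IS patients and 24 controls.
ci_low, ci_high = hanley_mcneil_ci(0.855, 39, 24)
print(f"toy AUC = {toy_auc:.3f}; MMP9 95%CI ~ ({ci_low:.3f}, {ci_high:.3f})")
```

With the reported AUC of 0.855, this approximation gives an interval close to, but slightly narrower than, the published 0.762-0.948, which is consistent with a different interval method having been used in the study.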
References: