2024, Vol. 32, No. 5
Original Article
Screening of Potential Disease Characteristic Genes of Ferroptosis in Hypertrophic Cardiomyopathy Based on Machine Learning Algorithms
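Authors: You Hongjun (尤红俊)1, Zhao Qianqian (赵倩倩)2, Gou Qiling (苟棋玲)1, Wu Fengchao (武锋超)1, Diao Jiayu (刁佳宇)1, Cheng Gong (程功)1, Dong Mengya (董梦雅)1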
- Affiliations:
- 1. Department of Cardiovascular Medicine, Shaanxi Provincial People's Hospital, Xi'an 710068, China; 2. Department of Cardiopulmonary Rehabilitation, Xi'an International Medical Center Hospital, Xi'an 710100, China
- Keywords:
- Cardiomyopathy, hypertrophic; Ferroptosis; Differentially expressed genes; Random forest; Artificial neural network
- Chinese Library Classification (CLC) number:
- R 542.2
- DOI:
- 10.12114/j.issn.1008-5971.2024.00.119
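- Funding:
- Shaanxi Provincial People's Hospital 2022 Science and Technology Talent Support Program (Elite Talents) (2022JY-45); Shaanxi Provincial People's Hospital 2023 Science and Technology Development Incubation Fund (2023YJY-63)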
Abstract:
Objective To screen potential disease characteristic genes of ferroptosis in hypertrophic cardiomyopathy (HCM) using machine learning algorithms.
Methods The GSE36961 and GSE141910 datasets were downloaded from the Gene Expression Omnibus (GEO). The GSE36961 dataset, comprising 106 HCM patients and 39 healthy controls, served as the training set; the GSE141910 dataset, comprising 28 HCM patients and 166 healthy controls, served as the test set. The R package "limma" was used to identify differentially expressed genes (DEGs) between HCM patients and healthy controls in the GSE36961 dataset. These DEGs were then intersected with the 259 ferroptosis-related genes in the ferroptosis database (FerrDb) to obtain ferroptosis-related DEGs in HCM. Disease characteristic genes were selected by random forest, a heat map was drawn to analyze their expression in the test set, and an artificial neural network (ANN) model was constructed from them. ROC curves were drawn to evaluate the predictive value of the ANN model for HCM in the training set and the test set.
Results A total of 2,959 DEGs were identified in the GSE36961 dataset, and intersecting them with the 259 ferroptosis-related genes in FerrDb yielded 72 HCM ferroptosis-related DEGs. Random forest selected nine disease characteristic genes from these 72 DEGs: ALOX5, ZFP36, RGS4, DDIT3, LPCAT3, SOCS1, EGLN2, NNMT and DUSP1. Heat map analysis showed that RGS4 and DDIT3 were up-regulated, whereas ALOX5, ZFP36, LPCAT3, SOCS1, EGLN2, NNMT and DUSP1 were down-regulated. An ANN model was constructed from the nine disease characteristic genes. ROC curve analysis showed that the AUC of the ANN model for predicting HCM was 1.000 [95% CI (0.998-1.000)] in the training set and 0.817 [95% CI (0.745-0.881)] in the test set.
Conclusion ALOX5, ZFP36, RGS4, DDIT3, LPCAT3, SOCS1, EGLN2, NNMT and DUSP1 are potential disease characteristic genes of ferroptosis in HCM.
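Two steps of the workflow summarized in the abstract, the gene-set intersection and the AUC with its confidence interval, can be illustrated with a minimal sketch. This is not the authors' code: the study was carried out in R, whereas the sketch below uses plain Python. The function names, the placeholder gene names (`GENE_X`, `GENE_Y`), the toy labels and scores, and the use of a percentile bootstrap for the interval are all assumptions made for illustration; the abstract does not state how the 95% CI was computed.

```python
import random


def intersect_genes(degs, ferroptosis_genes):
    """Return genes present in both lists, sorted for stable output."""
    return sorted(set(degs) & set(ferroptosis_genes))


def auc(labels, scores):
    """AUC as the fraction of case/control pairs in which the case scores
    higher than the control (ties count as half a win)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC.

    Assumed method for illustration only; not taken from the paper.
    """
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:  # resample lost one class; draw again
            continue
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy inputs (hypothetical, not the study's data).
toy_degs = ["ALOX5", "ZFP36", "RGS4", "GENE_X"]
toy_ferroptosis = ["ALOX5", "RGS4", "GENE_Y"]
print(intersect_genes(toy_degs, toy_ferroptosis))  # ['ALOX5', 'RGS4']

toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]   # 1 = HCM, 0 = control
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
print(auc(toy_labels, toy_scores))      # 0.9375
print(bootstrap_auc_ci(toy_labels, toy_scores))
```

In the toy data one case (score 0.4) ranks below one control (score 0.6), so 15 of the 16 case/control pairs are ordered correctly and the AUC is 0.9375. The gap between the training-set and test-set AUC reported in the abstract is the kind of difference such an interval helps to put in context.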
References: