2024, Vol. 32, Issue 3

Original Article

Core Genes of Acute Myocardial Infarction Screened by Bioinformatics Technology and Machine Learning Algorithms

Authors: 李淑娟1, 柯妍2, 刘旭东3, 贺茜4, 徐遥琴4, 田宇佳4, 卢冠军3, 马娟5, 朱澈1, 汪乐新3,4

Affiliations:
1. Department of Emergency, General Hospital of Ningxia Medical University, Yinchuan 750004, China
2. General Practice Department, the First People's Hospital of Yinchuan, Yinchuan 750003, China
3. Department of Cardiothoracic Surgery, General Hospital of Ningxia Medical University, Yinchuan 750004, China
4. Ningxia Medical University, Yinchuan 750004, China
5. Department of Pediatrics, Maternal and Child Health Hospital of Yinchuan, Yinchuan 750001, China
Keywords:
Myocardial infarction; Core genes; Bioinformatics; Machine learning
CLC:
R 542.22
DOI:
10.12114/j.issn.1008-5971.2024.00.074
Funding:
National Natural Science Foundation of China (82060139); General Program of the Ningxia Natural Science Foundation (2023AAC03679)

Abstract:

Objective To screen the core genes of acute myocardial infarction (AMI) using bioinformatics technology and machine learning algorithms, and to verify the core genes in cell experiments.

Methods The study was conducted from 2021 to 2022. Three AMI-related mRNA microarray data sets (GSE34198, GSE66360 and GSE83500) were downloaded from the Gene Expression Omnibus (GEO) database of the National Center for Biotechnology Information (NCBI). GSE66360 and GSE83500 served as the test set, and GSE34198 served as the validation set. Differentially expressed genes in GSE66360 and GSE83500 were screened using the "limma" package in R 4.2.0. LASSO regression was used to narrow the range of differentially expressed genes, and support vector machine-recursive feature elimination (SVM-RFE) was then used to find characteristic genes among the differentially expressed genes. The genes selected by both machine learning algorithms were taken as the core genes. The expression levels of the core genes were compared between the AMI group and the control group in the test set, and ROC curves were drawn to evaluate the value of core gene expression levels in predicting AMI in subjects in the test set and the validation set. Senescent cardiomyocytes were randomly divided into a normal oxygen group and a hypoxia/reoxygenation group. Cardiomyocytes in the normal oxygen group were cultured routinely. Cardiomyocytes in the hypoxia/reoxygenation group were subjected to hypoxia for 3 h followed by reoxygenation for 2 h to prepare an AMI cell model. The relative expression levels of IL1R2, NR4A2 and TREM1 mRNA in cardiomyocytes were measured by qPCR.

Results A total of 145 differentially expressed genes of AMI were screened from GSE66360 and GSE83500. Among the differentially expressed genes, 10 characteristic genes were selected by LASSO regression analysis and 10 by the SVM-RFE method, and their intersection yielded 9 core genes: NFIL3, IL1R2, NR4A2, IRAK3, VCAN, CCL20, TREM1, LYZ and ITLN1. In the test set, only IL1R2, NR4A2 and TREM1 were expressed at higher levels in the AMI group than in the control group (P < 0.05). ROC curve analysis showed that the AUC of IL1R2, NR4A2 and TREM1 expression levels in predicting AMI in subjects in the test set was 0.648 [95%CI (0.534-0.756)], 0.623 [95%CI (0.511-0.728)] and 0.622 [95%CI (0.502-0.730)], respectively. The AUC of IL1R2, NR4A2 and TREM1 expression levels in predicting AMI in subjects in the validation set was 0.834 [95%CI (0.761-0.898)], 0.866 [95%CI (0.802-0.923)] and 0.808 [95%CI (0.729-0.880)], respectively. The relative expression levels of IL1R2, NR4A2 and TREM1 mRNA in cardiomyocytes were higher in the hypoxia/reoxygenation group than in the normal oxygen group (P < 0.05).

Conclusion IL1R2, NR4A2 and TREM1 are core genes of AMI and are expected to be potential biomarkers of AMI.
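The two computational steps named in the abstract, intersecting the gene lists returned by two feature-selection methods and scoring a single gene by its AUC, can be illustrated with a minimal Python sketch. This is not the authors' pipeline, which used R: the gene names (`GENE_A` to `GENE_D`), the expression values and the function names `intersect_features` and `auc_single_marker` are all hypothetical, and the AUC is computed here with the rank-based (Mann-Whitney) definition rather than any specific ROC package.

```python
from typing import Iterable, List, Sequence


def intersect_features(lasso_genes: Iterable[str],
                       svm_rfe_genes: Iterable[str]) -> List[str]:
    """Return the genes chosen by both methods, sorted for stable output."""
    return sorted(set(lasso_genes) & set(svm_rfe_genes))


def auc_single_marker(case_values: Sequence[float],
                      control_values: Sequence[float]) -> float:
    """AUC of one marker: the probability that a randomly chosen case has a
    higher value than a randomly chosen control, with ties counted as 0.5."""
    if not case_values or not control_values:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for case in case_values:
        for control in control_values:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_values) * len(control_values))


# Hypothetical gene lists (placeholders, not the lists from the study).
lasso_selected = ["GENE_A", "GENE_B", "GENE_C"]
svm_rfe_selected = ["GENE_B", "GENE_C", "GENE_D"]
core_genes = intersect_features(lasso_selected, svm_rfe_selected)

# Made-up expression values for one gene in the two groups.
ami_expr = [2.0, 3.0, 4.0]
control_expr = [1.0, 2.0, 3.0]
auc = auc_single_marker(ami_expr, control_expr)

print(core_genes)
print(round(auc, 3))
```

An AUC of 0.5 corresponds to no discrimination between groups and 1.0 to perfect separation, which is the scale on which the reported values of 0.622 to 0.866 should be read. The pairwise comparison is quadratic in sample size, which is acceptable for microarray cohorts of this scale.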
