School of Electrical Engineering and Automation, Anhui University; Sungrow Power Supply Limited Liability Company
[Objective] DNA N4-methylcytosine (4mC) modification plays a crucial role in various cellular processes, including DNA replication, cell cycle regulation, and gene expression, making it an essential epigenetic marker. Accurately identifying 4mC sites is important for uncovering the mechanisms behind epigenetic regulation in disease and other biological functions. However, traditional 4mC site prediction technologies are often costly and time-consuming, which limits their use in large-scale applications. Although several intelligent computing-based 4mC predictors have been proposed over the past decade, their performance remains unsatisfactory. Therefore, developing effective methods to fully utilize the complex interactions within DNA sequences has become a major challenge for improving prediction capabilities. [Methods] A multilevel feature extraction module is introduced, with convolutional layers, bidirectional long short-term memory networks, and an attention mechanism as its core components. This module captures long-term dependencies within DNA sequences to support accurate 4mC site detection. In addition, a multiscale feature extraction module, centered on an improved SENet network, extracts multiscale expressions of location features, improving the model's ability to represent complex sequence characteristics. To further improve feature capture, a parallel feature fusion-based optimization method is proposed. Finally, to address the strong imbalance in the number of candidates across different species, the class weights in the cross-entropy loss function are designed to balance the training process. [Results] A deep learning-based dual-path multiscale feature fusion approach is proposed in this work for 4mC site prediction. To validate the structural design of the model, ablation experiments were performed with model variants, including the SCGF-4mC, SMFI-4mC, and DCMF-4mC models.
These experiments demonstrated the structural superiority of the proposed framework. In addition, the model was compared with several advanced 4mC site prediction methods currently available. Results indicate that the proposed 4mC site predictor achieved higher accuracy and stronger generalization ability. Model feature analysis experiments were also conducted using feature matrices generated by four encoding methods as inputs. Comparative evaluations using MCC and ACC metrics on an independent test set confirmed the model's stability and reliability. Meanwhile, spatial distribution calculations of 4mC and non-4mC samples across different species provided compelling evidence of the model's ability to effectively learn and recognize 4mC loci. In summary, the proposed deep learning-based method demonstrated greater accuracy and stronger generalization ability in predicting 4mC sites across six species. [Conclusions] The proposed method demonstrates the capability to identify 4mC sites in a multispecies environment, enhancing predictive performance and offering valuable support for identifying 4mC sites in DNA sequences.
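The abstract states that class weights in the cross-entropy loss are designed to counter sample imbalance and that evaluation uses ACC and MCC, but it does not give the weighting formula. The following is a minimal sketch under an assumption: it uses inverse-frequency weights, a common choice that is not confirmed as the authors' scheme. The function names are hypothetical and the code is pure Python, independent of any deep learning framework.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Assumed scheme: weight = n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * m) for c, m in counts.items()}


def weighted_cross_entropy(probs, labels, weights):
    """Weighted mean of -log p(true class), normalized by the total weight.

    probs[i][c] is the predicted probability of class c for sample i.
    """
    total = sum(weights[y] * -math.log(p[y]) for p, y in zip(probs, labels))
    return total / sum(weights[y] for y in labels)


def acc_and_mcc(y_true, y_pred):
    """Accuracy and Matthews correlation coefficient for binary labels (0/1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    acc = (tp + tn) / len(pairs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, mcc


if __name__ == "__main__":
    # Toy data: one 4mC site (label 1) among three non-4mC sites (label 0).
    labels = [1, 0, 0, 0]
    probs = [[0.3, 0.7], [0.8, 0.2], [0.6, 0.4], [0.4, 0.6]]
    weights = inverse_frequency_weights(labels)
    preds = [max((0, 1), key=lambda c: p[c]) for p in probs]
    print("class weights:", weights)
    print("weighted CE:", weighted_cross_entropy(probs, labels, weights))
    print("ACC, MCC:", acc_and_mcc(labels, preds))
```

With these weights the minority class contributes as much to the loss as the majority class, so a model cannot lower the loss simply by predicting the majority label. MCC is reported alongside ACC because it stays informative when the classes are imbalanced.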
Basic Information:
DOI:10.16791/j.cnki.sjg.2025.04.009
China Classification Code:TP18;Q811.4
Citation Information:
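[1] HUANG Zexia, LI Wei, SHAO Chunli, et al. 4mC site prediction method based on dual-path multiscale feature fusion[J]. Experimental Technology and Management, 2025, 42(04): 68-77. DOI: 10.16791/j.cnki.sjg.2025.04.009.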
Fund Information:
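Ministry of Education Industry-University Cooperative Education Program (230700005272541, 231103177230726); Anhui University Online and Offline Blended Course Project (2023xjzlgc124)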