Anticancer Peptides Classification Using Long-Short-Term Memory With Novel Feature Representation
Al Tahifah N. Ibrahim M.S. Rehman E. Ahmed N. Wahab A. Khan S.
2025, Institute of Electrical and Electronics Engineers Inc.
IEEE Access
2025#1367 - 79 pp.
Cancer treatment is a challenging endeavor because of the intricacy, heterogeneity, and diversity of cancer causes. Comprehensive therapeutic approaches are crucial for cancer treatment. Anticancer peptides (ACPs) present a potentially effective therapeutic option. However, the large-scale identification and synthesis of these peptides remains a persistent difficulty that calls for effective prediction techniques. Existing techniques either suffer from low accuracy or employ high-dimensional feature sets, frequently producing sparse features and leading to ineffective model designs. This work presents a novel set of features and a long-short-term-memory (LSTM)-based classification strategy to create an efficient model. The suggested feature set includes three new and two modern feature extraction methods. The binary profile feature and k-mer sparse matrix of the reduced amino acid alphabet are part of the modern feature set. The novel features are the composition of the K-spaced side chain pairs (CKSSCP), the composition of the K-spaced electrically charged side chain pairs (CKSECSCP), and the combination of [pk(CO2H)] + [pk(NH2)] + [pk(R)] + [isoelectric point]. The suggested LSTM model is trained using the combined feature set. The experiments are carried out with k-fold cross-validation on benchmark datasets. The results indicate that the proposed model outperforms alternative ACP classification techniques in terms of Matthews correlation coefficient (MCC) and accuracy. On the ACP740 dataset with 5-fold cross-validation, the proposed method yields an MCC score of 75%, which is 12%, 11%, 3%, and 8% greater than those of the ACP-DL, ACP-DA, ACP-MHCNN, and ACP-KSRC approaches, respectively. On the ACP344 dataset with 10-fold cross-validation, the proposed method achieves an MCC score of 85.14%, which is 23% and 2% higher than the MCC scores of the ACP-DL and SAP methods, respectively. The improved classification performance of the proposed approach could help identify new ACPs and better understand their structural and chemical characteristics. The source code and the datasets are available on the authors' GitHub page (https://github.com/Shujaat123/ACP-LSTM-NFR).
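As a rough illustration of what a K-spaced pair composition over side-chain classes might look like, the sketch below maps each residue to a side-chain class and counts class pairs separated by k intervening residues, in the manner of CKSAAP. It is not taken from the authors' repository: the five-class grouping in `SIDE_CHAIN_CLASS`, the function name, and the `k_max` parameter are assumptions made for this example, and the abstract does not state the grouping or normalization the paper actually uses.

```python
from itertools import product

# ASSUMPTION: a common textbook grouping of the 20 standard amino acids by
# side-chain type. The classes used in the paper are not given in the abstract.
SIDE_CHAIN_CLASS = {
    **dict.fromkeys("GAVLIMP", "N"),  # nonpolar aliphatic
    **dict.fromkeys("FWY", "A"),      # aromatic
    **dict.fromkeys("STCNQ", "P"),    # polar, uncharged
    **dict.fromkeys("KRH", "+"),      # positively charged
    **dict.fromkeys("DE", "-"),       # negatively charged
}


def k_spaced_class_pair_composition(sequence, k_max=2):
    """Return {(k, class_a, class_b): frequency} for k = 0..k_max.

    For each gap k, every pair of positions (i, i + k + 1) contributes one
    count to its class pair; counts are divided by the number of such
    position pairs, so the frequencies for a given k sum to 1.
    Residues outside the 20 standard amino acids are skipped.
    """
    classes = sorted(set(SIDE_CHAIN_CLASS.values()))
    encoded = [SIDE_CHAIN_CLASS[aa] for aa in sequence.upper()
               if aa in SIDE_CHAIN_CLASS]

    features = {}
    for k in range(k_max + 1):
        n_pairs = max(len(encoded) - k - 1, 0)
        counts = {pair: 0 for pair in product(classes, repeat=2)}
        for i in range(n_pairs):
            counts[(encoded[i], encoded[i + k + 1])] += 1
        for (a, b), c in counts.items():
            # Sequences too short for this gap get all-zero features.
            features[(k, a, b)] = c / n_pairs if n_pairs else 0.0
    return features


if __name__ == "__main__":
    feats = k_spaced_class_pair_composition("KKDE", k_max=1)
    nonzero = {key: round(v, 3) for key, v in feats.items() if v > 0}
    print(len(feats), nonzero)
```

With five classes, each gap value contributes 25 features, so the vector stays small compared with the 400 pairs per gap of a plain amino-acid pair composition. Restricting the classes to the charged groups would give a charged-pair variant under the same assumptions.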
Anticancer peptides (ACPs), composition of K-spaced amino-acid pairs (CKSAAP), composition of the K-spaced electrically charged side chain pairs (CKSECSCP), composition of the K-spaced side chain pairs (CKSSCP), isoelectric point (pI), long-short-term-memory (LSTM)
King Fahd University of Petroleum & Minerals, College of Computing and Mathematics, Department of Computer Engineering, Dhahran, 31261, Saudi Arabia
King Fahd University of Petroleum and Minerals, SDAIA-KFUPM Joint Research Center for Artificial Intelligence, Dhahran, 31261, Saudi Arabia
Kumoh National Institute of Technology, Department of Mechanical Systems Engineering, Gumi-si, 39177, South Korea
Nazarbayev University, Department of Mathematics, Astana, 010000, Kazakhstan
Gulf University for Science and Technology (GUST), Center for Applied Mathematics and Bio-Informatics (CAMB), Department of Mathematics and Natural Sciences, Mubarak Al-Abdullah, 32093, Kuwait
Sultan Qaboos University, College of Science, Department of Mathematics, Muscat, 123, Oman