Learning the Pattern-based CRF for Prediction of a Protein Local Structure
Mukanov Z., Takhanov R.
2022, Slovene Society Informatika
Informatica (Slovenia)
2022, Vol. 46, Issue 6, pp. 135-141
Prediction of protein conformation from its amino acid sequence is widely acknowledged as one of the most important problems in computational biology and is a source of interesting problem formulations for machine learning. Here, methods of supervised learning coexist with those of statistical physics and information theory. According to the classical results of Anfinsen, the conformational structure of a protein is fully determined by its primary structure, i.e., its amino acid sequence, and energy landscape theory states that the native state of a protein corresponds to the minimum of its free energy [2]. There are two dominant approaches to protein structure prediction: the first minimizes physics-based free energies with some unknown parameters, and the second is a knowledge-based approach that does not necessarily use the notion of free energy and aims only at high prediction accuracy [14]. Compared with these two, intermediate approaches are scarce: approaches that seek knowledge-based parameterizations of free energy which approximate the real free energy for certain protein families while achieving prediction accuracy comparable to that of purely knowledge-based methods. According to M. Gromov, if energy landscape theory is true, then “probably, free energy can be encoded with a reasonable accuracy by something like 10^4 − 10^6 bits of information”, and the main mathematical problem here is the lack of “general mathematical ‘parameter fitting’ method(s), which, when applied to proteins, could provide (an effective version of) the total inter-residue interaction energies” [10]. In this paper, we introduce a probabilistic model based on a particular parametrization of free energy that we expect to be fruitful both for predicting protein dihedral angles and for investigating the structure of the energy landscape.
This model is based on the idea that free energy is largely determined by pairwise interactions between amino acids located near each other in the protein sequence. Although this approach is far from realistic for general proteins, we expect it to approximate the energy landscape of all-alpha proteins.
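To make the idea of a sequence-local energy concrete, here is a minimal sketch under strong simplifying assumptions. Each residue is assumed to take one of a few discrete conformational labels (for instance, binned dihedral angles), and only adjacent residues are assumed to interact, so the energy is a sum of per-residue terms plus terms for neighbouring pairs. The tables `unary` and `pairwise` and the functions `chain_energy` and `min_energy_labeling` are hypothetical placeholders, not the paper's learned parameters or code. The sketch does not cover the longer label patterns of a pattern-based CRF or any training procedure. It only shows that, for such a chain-structured energy, the minimum-energy labeling can be found exactly by min-sum dynamic programming (the Viterbi algorithm).

```python
from itertools import product

def chain_energy(labels, unary, pairwise):
    """Energy of a labeling: per-residue terms plus terms for adjacent pairs.
    pairwise[i][a][b] is the (hypothetical) interaction energy of residue i
    having label a and residue i+1 having label b."""
    e = sum(unary[i][y] for i, y in enumerate(labels))
    e += sum(pairwise[i][labels[i]][labels[i + 1]] for i in range(len(labels) - 1))
    return e

def min_energy_labeling(unary, pairwise):
    """Exact minimum-energy labeling by min-sum dynamic programming (Viterbi)."""
    n, k = len(unary), len(unary[0])
    best = list(unary[0])  # best[y]: min energy of the prefix ending in label y
    back = []              # back[i-1][y]: best previous label for label y at position i
    for i in range(1, n):
        new_best, ptr = [], []
        for y in range(k):
            cands = [best[yp] + pairwise[i - 1][yp][y] for yp in range(k)]
            yp = min(range(k), key=cands.__getitem__)
            new_best.append(cands[yp] + unary[i][y])
            ptr.append(yp)
        best = new_best
        back.append(ptr)
    # Trace back from the best final label.
    y = min(range(k), key=best.__getitem__)
    labels = [y]
    for ptr in reversed(back):
        y = ptr[y]
        labels.append(y)
    labels.reverse()
    return labels, min(best)

# Toy example: 3 residues, 2 hypothetical labels (0 = "helix", 1 = "other").
unary = [[0.0, 1.0], [0.5, 0.2], [0.0, 0.3]]
same_label_bonus = [[0.0, 1.0], [1.0, 0.0]]  # switching labels costs 1.0
pairwise = [same_label_bonus, same_label_bonus]

labels, energy = min_energy_labeling(unary, pairwise)
# Cross-check against brute force over all k^n labelings.
brute = min(product(range(2), repeat=3), key=lambda ys: chain_energy(ys, unary, pairwise))
print(labels, energy, list(brute))
```

Lower energy corresponds to higher probability if one sets p(labels) ∝ exp(−E). The dynamic program costs O(n·k²) instead of the O(kⁿ) of brute force, which is what makes sequence-local energies attractive for inference.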
Keywords: energy landscape, pattern-based CRFs, protein conformation prediction, sequence labeling, structural SVM
Fundamental Mathematics Department, Eurasian National University, 2 Satpayev Str., Nur-Sultan, Kazakhstan
Mathematics Department, Nazarbayev University, 53 Kabanbay Batyr Ave, Nur-Sultan, Kazakhstan