Gradient descent fails to learn high-frequency functions and modular arithmetic
Takhanov R. Tezekbayev M. Pak A. Bolatov A. Assylbekov Z.
April 2025Springer
Machine Learning
2025#114Issue 4
Classes of target functions containing a large number of approximately orthogonal elements are known to be hard to learn by the Statistical Query algorithms. Recently this classical fact re-emerged in a theory of gradient-based optimization of neural networks. In the novel framework, the hardness of a class is usually quantified by the variance of the gradient with respect to a random choice of a target function. A set of functions of the form x→axmodp, where a is taken from Zp, has attracted some attention from deep learning theorists and cryptographers recently. This class can be understood as a subset of p-periodic functions on Z and is tightly connected with a class of high-frequency periodic functions on the real line. We present a mathematical analysis of limitations and challenges associated with using gradient-based learning techniques to train a high-frequency periodic function or modular multiplication from examples. We highlight that the variance of the gradient is negligibly small in both cases when either a frequency or the prime base p is large. This in turn prevents such a learning algorithm from being successful.
Barren plateau , Gradient-based optimization , Hardness of learning , High-frequency periodic functions , Modular multiplication , Statistical query
Text of the article Перейти на текст статьи
Nazarbayev University, Kabanbay Batyr Ave., Astana, 010000, Kazakhstan
Department of Mathematical Sciences, Purdue University Fort Wayne, Fort Wayne, IN, United States
Nazarbayev University
Department of Mathematical Sciences
10 лет помогаем публиковать статьи Международный издатель
Книга Публикация научной статьи Волощук 2026 Book Publication of a scientific article 2026