Gradient descent fails to learn high-frequency functions and modular arithmetic


Takhanov R. Tezekbayev M. Pak A. Bolatov A. Assylbekov Z.
April 2025Springer

Machine Learning
2025#114Issue 4

Classes of target functions containing a large number of approximately orthogonal elements are known to be hard to learn by the Statistical Query algorithms. Recently this classical fact re-emerged in a theory of gradient-based optimization of neural networks. In the novel framework, the hardness of a class is usually quantified by the variance of the gradient with respect to a random choice of a target function. A set of functions of the form x→axmodp, where a is taken from Zp, has attracted some attention from deep learning theorists and cryptographers recently. This class can be understood as a subset of p-periodic functions on Z and is tightly connected with a class of high-frequency periodic functions on the real line. We present a mathematical analysis of limitations and challenges associated with using gradient-based learning techniques to train a high-frequency periodic function or modular multiplication from examples. We highlight that the variance of the gradient is negligibly small in both cases when either a frequency or the prime base p is large. This in turn prevents such a learning algorithm from being successful.

Barren plateau , Gradient-based optimization , Hardness of learning , High-frequency periodic functions , Modular multiplication , Statistical query

Text of the article Перейти на текст статьи

Nazarbayev University, Kabanbay Batyr Ave., Astana, 010000, Kazakhstan
Department of Mathematical Sciences, Purdue University Fort Wayne, Fort Wayne, IN, United States

Nazarbayev University
Department of Mathematical Sciences

10 лет помогаем публиковать статьи Международный издатель

Книга Публикация научной статьи Волощук 2026 Book Publication of a scientific article 2026