assylbekov-z 1

1. Convergence of the EM algorithm in KL distance for overspecified Gaussian mixtures
2. Convergence of the partition function in the static word embedding model
3. The Rediscovery Hypothesis: Language Models Need to Meet Linguistics
4. Gradient descent fails to learn high-frequency functions and modular arithmetic
5. Approximation error of Fourier neural networks