Publications
Links to papers and preprints. Google Scholar may be more up to date.
Why Adam Outperforms Gradient Descent on Language Models: A Heavy-Tailed Class Imbalance Problem
F. Kunstner, R. Yadav, A. Milligan, M. Schmidt, A. Bietti.
2024 arXiv arXiv code .bib
Noise is not the main factor behind the gap between SGD and Adam on transformers, but sign descent might be
F. Kunstner, J. Chen, J. W. Lavington, M. Schmidt.
2023 ICLR arXiv code OpenReview poster .bib
Variance Reduced Model Based Methods: New rates and adaptive step sizes
R. M. Gower, F. Kunstner, M. Schmidt.
2023 NeurIPS OPTML Workshop OpenReview .bib
Searching for Optimal Per-Coordinate Step-sizes with Multidimensional Backtracking
F. Kunstner, V. S. Portella, M. Schmidt, N. Harvey.
2023 NeurIPS arXiv code OpenReview proceedings poster .bib
Convergence Rates for the MAP of an Exponential Family and Stochastic Mirror Descent - an Open Problem
R. Le Priol, F. Kunstner, D. Scieur, S. Lacoste-Julien.
2022 arXiv arXiv .bib
Homeomorphic-Invariance of EM: Non-Asymptotic Convergence in KL Divergence for Exponential Families via Mirror Descent
F. Kunstner, R. Kumar, M. Schmidt.
2021 AISTATS arXiv proceedings poster .bib
Adaptive Gradient Methods Converge Faster with Over-Parameterization (and you can do a line-search)
S. Vaswani, I. Laradji, F. Kunstner, S.Y. Meng, M. Schmidt, S. Lacoste-Julien.
2020 NeurIPS OPTML Workshop arXiv .bib
BackPACK: Packing more into backprop
F. Dangel, F. Kunstner, P. Hennig.
2020 ICLR arXiv website OpenReview poster .bib
Limitations of the empirical Fisher approximation
F. Kunstner, L. Balles, P. Hennig.
2019 NeurIPS arXiv code proceedings poster .bib
SLANG: fast structured covariance approximations for Bayesian deep learning with natural gradient
A. Mishkin, F. Kunstner, D. Nielsen, M. Schmidt, M.E. Khan.
2018 NeurIPS arXiv code proceedings poster .bib
Collaborators
A. Bietti, A. Milligan, A. Mishkin, D. Nielsen, D. Scieur, F. Bach, F. Dangel, I. Laradji, J. Chen, J. W. Lavington, L. Balles, M.E. Khan, M. Jaggi, M. Schmidt, N. Harvey, P. Hennig, R. M. Gower, R. Kumar, R. Le Priol, R. Yadav, S. Lacoste-Julien, S.Y. Meng, S. Stich, S. Vaswani, V. S. Portella.