Publications

Links to papers and preprints. Google Scholar may be more up to date.

Why Adam Outperforms Gradient Descent on Language Models: A Heavy-Tailed Class Imbalance Problem
F. Kunstner, R. Yadav, A. Milligan, M. Schmidt, A. Bietti.
arXiv 2024 [arXiv, code, workshop version, .bib]
Noise is not the main factor behind the gap between SGD and Adam on transformers, but sign descent might be
F. Kunstner, J. Chen, J. W. Lavington, M. Schmidt.
ICLR 2023 [arXiv, code, OpenReview, workshop version, poster, .bib]
Variance Reduced Model Based Methods: New rates and adaptive step sizes
R. M. Gower, F. Kunstner, M. Schmidt.
NeurIPS OPTML Workshop 2023 [OpenReview, .bib]
Searching for Optimal Per-Coordinate Step-sizes with Multidimensional Backtracking
F. Kunstner, V. S. Portella, M. Schmidt, N. Harvey.
NeurIPS 2023 [arXiv, code, OpenReview, proceedings, poster, .bib]
Convergence Rates for the MAP of an Exponential Family and Stochastic Mirror Descent - an Open Problem
R. Le Priol, F. Kunstner, D. Scieur, S. Lacoste-Julien.
arXiv 2022 [arXiv, .bib]
Homeomorphic-Invariance of EM: Non-Asymptotic Convergence in KL Divergence for Exponential Families via Mirror Descent
F. Kunstner, R. Kumar, M. Schmidt.
AISTATS 2021 [arXiv, poster, .bib]
Adaptive Gradient Methods Converge Faster with Over-Parameterization (and you can do a line-search)
S. Vaswani, I. Laradji, F. Kunstner, S.Y. Meng, M. Schmidt, S. Lacoste-Julien.
NeurIPS OPTML Workshop 2020 [arXiv, .bib]
BackPACK: Packing more into backprop
F. Dangel, F. Kunstner, P. Hennig.
ICLR 2020 [arXiv, code, website, OpenReview, poster, .bib]
Limitations of the empirical Fisher approximation
F. Kunstner, L. Balles, P. Hennig.
NeurIPS 2019 [arXiv, code, proceedings, poster, .bib]
SLANG: fast structured covariance approximations for Bayesian deep learning with natural gradient
A. Mishkin, F. Kunstner, D. Nielsen, M. Schmidt, M.E. Khan.
NeurIPS 2018 [arXiv, code, proceedings, poster, .bib]

Collaborators

A. Bietti, A. Milligan, A. Mishkin, D. Nielsen, D. Scieur, F. Dangel, I. Laradji, J. Chen, J. W. Lavington, L. Balles, M.E. Khan, M. Jaggi, M. Schmidt, N. Harvey, P. Hennig, R. M. Gower, R. Kumar, R. Le Priol, R. Yadav, S. Lacoste-Julien, S.Y. Meng, S. Stich, S. Vaswani, V. S. Portella.