I am a PhD student at UBC working with Mark Schmidt at the intersection of optimization and ML.
Prior to UBC, I did my BS and MS at EPFL, working with Martin Jaggi.
I had the chance to intern at the MPI with Philipp Hennig
and at RIKEN with Emtiyaz Khan.
My overall goal is to improve our understanding of optimization methods for ML models:
how they work, how to debug them when they don't, and how to improve them.
Many of the tricks of the trade used to train ML models are motivated by intuition and trial-and-error.
While those methods are validated by whether they work in practice,
the empirical validation is often limited to establishing that they work, rather than explaining why.
My work bridges this gap and identifies key difficulties in training ML models
through a combination of experiments that test our current assumptions
and theory to develop algorithms with provable benefits.
Selected works
Why Adam Outperforms Gradient Descent on Language Models: A Heavy-Tailed Class Imbalance Problem
R. Yadav, F. Kunstner, M. Schmidt, A. Bietti. Optimization for ML Workshop, NeurIPS 2023 [ .bib]
Searching for Optimal Per-Coordinate Step-sizes with Multidimensional Backtracking
F. Kunstner, V. S. Portella, M. Schmidt, N. Harvey. NeurIPS 2023 [code, .bib]
Noise is not the main factor behind the gap between SGD and Adam on transformers, but sign descent might be
F. Kunstner, J. Chen, J. W. Lavington, M. Schmidt. ICLR 2023 [arXiv, OpenReview, code, .bib]
Homeomorphic-Invariance of EM: Non-Asymptotic Convergence in KL Divergence for Exponential Families via Mirror Descent
F. Kunstner, R. Kumar, M. Schmidt. AISTATS 2021 [arXiv, .bib]
BackPACK: Packing more into backprop
F. Dangel, F. Kunstner, P. Hennig. ICLR 2020 [arXiv, code, website, .bib]
All publications: Google Scholar
Software utilities
Tex2UTF8:
For places that do not support LaTeX but happily render UTF8
DSDL:
An automated dataset downloader for LIBSVM datasets