Speaker: Varun Jog, Wisconsin Institute for Discovery
Title: Information Theoretic Perspectives on Learning Algorithms
Abstract: In statistical learning theory, generalization error is used to quantify the degree to which a supervised machine learning algorithm may overfit to training data. We review recent work [Xu and Raginsky (2017)] that bounds the generalization error of empirical risk minimization in terms of the mutual information I(S;W) between the algorithm input S and the algorithm output W. We leverage these results to derive generalization error bounds for a broad class of iterative algorithms characterized by bounded, noisy updates with Markovian structure, such as stochastic gradient Langevin dynamics (SGLD). We describe certain shortcomings of mutual information-based bounds and propose alternate bounds that employ the Wasserstein metric from optimal transport theory. Comparing the two, we show that for a class of data-generating distributions, the Wasserstein metric-based bounds are stronger.
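For reference, the mutual-information bound of Xu and Raginsky (2017) mentioned in the abstract can be stated as follows (a standard formulation, assuming the loss is sub-Gaussian; the notation here is supplied for illustration and does not appear in the abstract):

```latex
% S = (Z_1, \dots, Z_n): training sample drawn i.i.d. from \mu;
% W: algorithm output; \ell(w, z): loss function, assumed
% \sigma-sub-Gaussian under \mu for every w.
% L_\mu(W) is the population risk and L_S(W) the empirical risk.
\left| \mathbb{E}\left[ L_\mu(W) - L_S(W) \right] \right|
  \;\le\; \sqrt{\frac{2\sigma^2}{n}\, I(S;W)}
```

The bound vanishes as the sample size n grows whenever the algorithm's output reveals only limited information about its training data, which is the property the talk exploits for noisy iterative methods such as SGLD.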
Bio: Varun Jog received his B.Tech. degree in Electrical Engineering from IIT Bombay in 2010, and his Ph.D. in Electrical Engineering and Computer Sciences (EECS) from UC Berkeley in 2015. Since 2016, he has been an Assistant Professor in the Electrical and Computer Engineering Department and a fellow at the Grainger Institute for Engineering at the University of Wisconsin-Madison. His research interests include information theory, machine learning, and network science. He is a recipient of the Eli Jury Award from the EECS Department at UC Berkeley (2015) and the Jack Keil Wolf student paper award at ISIT 2015.