We derive an entire family of related methods, of which the Gumbel trick is one member, and show that the new methods have superior properties in several settings with minimal additional computational cost.
In particular, for the Gumbel trick to yield computational benefits for discrete graphical models, Gumbel perturbations on all configurations are typically replaced with so-called low-rank perturbations.
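As background, the full-rank version of the trick estimates the log partition function by perturbing every configuration's log-weight with independent standard Gumbel noise and averaging the maxima. The following is our own minimal sketch for a small explicit distribution (the function name and sample counts are illustrative, not from the paper):

```python
import math
import random

def log_partition_gumbel(log_weights, num_samples, rng):
    """Estimate log Z = log(sum_i exp(log_weights[i])) via the Gumbel trick:
    if g_i are i.i.d. standard Gumbel variables, max_i (log_weights[i] + g_i)
    is Gumbel(log Z, 1)-distributed, whose mean is log Z plus the
    Euler-Mascheroni constant."""
    euler_gamma = 0.5772156649015329
    total = 0.0
    for _ in range(num_samples):
        # Standard Gumbel sample: g = -log(-log(U)) with U ~ Uniform(0, 1).
        perturbed = [lw - math.log(-math.log(rng.random())) for lw in log_weights]
        total += max(perturbed)
    return total / num_samples - euler_gamma

# Unnormalised weights (1, 2, 3), so Z = 6 and log Z = log 6.
log_w = [math.log(w) for w in (1.0, 2.0, 3.0)]
estimate = log_partition_gumbel(log_w, num_samples=5000, rng=random.Random(0))
```

The point of the low-rank replacement discussed above is precisely to avoid drawing one Gumbel variable per configuration, which is infeasible when the configuration space is exponentially large.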
To avoid poor performance due to local minima, we propose to exploit Lipschitz properties of the optimisation objective to ensure that global optimisation succeeds.
The resulting approach is a new flexible method for nonparametric black-box learning.
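As a toy illustration of how a known Lipschitz constant can certify global (rather than merely local) optimisation, consider a grid search whose spacing is chosen from the constant and the desired tolerance. This sketch is our own illustration of the general principle, not the paper's algorithm:

```python
def certified_grid_minimum(f, a, b, lipschitz_const, tol):
    """Global minimisation of an L-Lipschitz f on [a, b] by grid search.
    With grid spacing at most 2*tol/L, every point of [a, b] lies within
    tol/L of a grid point, so the best grid value is within tol of the
    true global minimum -- a certificate a local optimiser cannot give."""
    num_points = int(lipschitz_const * (b - a) / (2.0 * tol)) + 2
    xs = [a + i * (b - a) / (num_points - 1) for i in range(num_points)]
    best_x = min(xs, key=f)
    return best_x, f(best_x)

# f(x) = |x - 0.3| is 1-Lipschitz with global minimum 0 at x = 0.3.
x_star, f_star = certified_grid_minimum(lambda x: abs(x - 0.3), 0.0, 1.0,
                                        lipschitz_const=1.0, tol=1e-3)
```

Practical methods replace the uniform grid with adaptive sampling, but the guarantee has the same shape: the Lipschitz bound converts local function evaluations into a global statement.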
Empirically, we show that our approach leads to an order-of-magnitude speedup over the strong non-augmented baselines and a recurrent neural network approach, and that we are able to solve problems of difficulty comparable to the simplest problems on programming competition websites.

Abstract: The Gumbel trick is a method to sample from a discrete probability distribution, or to estimate its normalizing partition function.
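The sampling side of the trick works by adding independent standard Gumbel noise to each unnormalised log-weight and returning the argmax, which is distributed according to the normalised distribution. A minimal sketch with illustrative names:

```python
import math
import random

def gumbel_max_sample(log_weights, rng):
    """Sample index i with probability proportional to exp(log_weights[i]):
    perturb each log-weight with independent standard Gumbel noise and
    return the argmax (the Gumbel-max trick); no normalisation is needed."""
    perturbed = [lw - math.log(-math.log(rng.random())) for lw in log_weights]
    return max(range(len(perturbed)), key=perturbed.__getitem__)

# Unnormalised weights (1, 2, 3) correspond to probabilities (1/6, 2/6, 3/6).
rng = random.Random(0)
log_w = [math.log(w) for w in (1.0, 2.0, 3.0)]
counts = [0, 0, 0]
for _ in range(6000):
    counts[gumbel_max_sample(log_w, rng)] += 1
```

Over 6000 draws the empirical counts concentrate near (1000, 2000, 3000), matching the target distribution without ever computing the partition function.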
Abstract: We develop a first line of attack for solving programming competition-style problems from input-output examples using deep learning.

Crucially, we demonstrate that the new framework includes new pseudo-point approximation methods that outperform current approaches on regression and classification tasks.

Lipschitz optimisation for Lipschitz interpolation. These methods utilise presupposed Lipschitz properties in order to compute inferences over unobserved function values. Unfortunately, most of these approaches rely on exact knowledge of the input space metric as well as of the Lipschitz constant. Furthermore, existing techniques to estimate the Lipschitz constant from the data are not robust to noise, or seem ad hoc and are typically decoupled from the ultimate learning and prediction task.
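A minimal sketch of such a Lipschitz interpolation predictor in one dimension, assuming the metric is |x − y| and the constant L is known exactly (function and variable names are our own):

```python
def lipschitz_interpolate(x, data, lipschitz_const):
    """Minimax point prediction for an L-Lipschitz function observed at
    the (x_i, y_i) pairs in `data`: every consistent function lies between
    the lower envelope max_i (y_i - L|x - x_i|) and the upper envelope
    min_i (y_i + L|x - x_i|), so return their midpoint."""
    upper = min(y + lipschitz_const * abs(x - xi) for xi, y in data)
    lower = max(y - lipschitz_const * abs(x - xi) for xi, y in data)
    return 0.5 * (upper + lower)

# Two samples of f(x) = x, which is 1-Lipschitz.
data = [(0.0, 0.0), (1.0, 1.0)]
pred_mid = lipschitz_interpolate(0.5, data, 1.0)  # envelopes meet at 0.5
pred_obs = lipschitz_interpolate(0.0, data, 1.0)  # reproduces the sample
```

Note that both the metric |x − x_i| and the constant L enter the envelopes directly, which is exactly the exact-knowledge requirement criticised above.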
To overcome these limitations, we propose an approach for optimising parameters of the presupposed metrics by minimising validation set prediction errors.
We illustrate its competitiveness on a set of benchmark problems.
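The validation-based tuning can be sketched as follows, with the metric fixed to the absolute distance so that only the constant L is selected; the predictor, the candidate grid, and all names are our own illustrative choices, restated inline so the sketch is self-contained:

```python
def lipschitz_interpolate(x, data, L):
    """Midpoint of the lower/upper Lipschitz envelopes through `data`."""
    upper = min(y + L * abs(x - xi) for xi, y in data)
    lower = max(y - L * abs(x - xi) for xi, y in data)
    return 0.5 * (upper + lower)

def select_constant_by_validation(train, valid, candidates):
    """Choose the Lipschitz constant that minimises mean squared prediction
    error on a held-out validation set, instead of estimating it from the
    training data directly."""
    def validation_mse(L):
        return sum((lipschitz_interpolate(x, train, L) - y) ** 2
                   for x, y in valid) / len(valid)
    return min(candidates, key=validation_mse)

# Noise-free samples of f(x) = 2x, whose true Lipschitz constant is 2.
train = [(i / 5.0, 2.0 * i / 5.0) for i in range(6)]
valid = [(0.15, 0.3), (0.55, 1.1), (0.95, 1.9)]
best_L = select_constant_by_validation(train, valid, [0.5, 1.0, 2.0, 4.0])
```

On this noise-free example the selection recovers the true constant, since too small an L makes the envelopes inconsistent with the data and too large an L widens them away from the target.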