Gaussian Process Bandit Optimization of the Thermodynamic Variational Objective.
Nguyen, V.; Masrani, V.; Brekelmans, R.; Osborne, M.; and Wood, F.
In Advances in Neural Information Processing Systems (NeurIPS), 2020.

@inproceedings{nguyen2020gaussian,
  title = {Gaussian Process Bandit Optimization of the Thermodynamic Variational Objective},
  author = {Nguyen, Vu and Masrani, Vaden and Brekelmans, Rob and Osborne, Michael and Wood, Frank},
  booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
  year = {2020},
  url_Link = {https://proceedings.neurips.cc/paper/2020/hash/3f2dff7862a70f97a59a1fa02c3ec110-Abstract.html},
  url_Paper = {https://proceedings.neurips.cc/paper/2020/file/3f2dff7862a70f97a59a1fa02c3ec110-Paper.pdf},
  url_ArXiv = {https://arxiv.org/abs/2010.15750},
  support = {D3M},
  abstract = {Achieving the full promise of the Thermodynamic Variational Objective (TVO), a recently proposed variational lower bound on the log evidence involving a one-dimensional Riemann integral approximation, requires choosing a "schedule" of sorted discretization points. This paper introduces a bespoke Gaussian process bandit optimization method for automatically choosing these points. Our approach not only automates their one-time selection, but also dynamically adapts their positions over the course of optimization, leading to improved model learning and inference. We provide theoretical guarantees that our bandit optimization converges to the regret-minimizing choice of integration points. Empirical validation of our algorithm is provided in terms of improved learning and inference in Variational Autoencoders and Sigmoid Belief Networks.}
}

Achieving the full promise of the Thermodynamic Variational Objective (TVO), a recently proposed variational lower bound on the log evidence involving a one-dimensional Riemann integral approximation, requires choosing a "schedule" of sorted discretization points. This paper introduces a bespoke Gaussian process bandit optimization method for automatically choosing these points. Our approach not only automates their one-time selection, but also dynamically adapts their positions over the course of optimization, leading to improved model learning and inference. We provide theoretical guarantees that our bandit optimization converges to the regret-minimizing choice of integration points. Empirical validation of our algorithm is provided in terms of improved learning and inference in Variational Autoencoders and Sigmoid Belief Networks.

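For context, the "schedule" in question is the set of discretization points of the one-dimensional integral underlying the TVO. As a sketch of the standard construction (notation follows common presentations of the TVO and is not specific to this paper), define the geometric path

\pi_\beta(z) \;\propto\; q(z \mid x)^{1-\beta} \, p(x, z)^{\beta}, \qquad \beta \in [0, 1],

so that the log evidence is recovered by thermodynamic integration,

\log p(x) \;=\; \int_0^1 \mathbb{E}_{\pi_\beta}\!\left[ \log \frac{p(x, z)}{q(z \mid x)} \right] \mathrm{d}\beta,

and the TVO lower bound is the left Riemann sum over a schedule 0 = \beta_0 < \beta_1 < \dots < \beta_K = 1,

\mathrm{TVO} \;=\; \sum_{k=0}^{K-1} (\beta_{k+1} - \beta_k) \, \mathbb{E}_{\pi_{\beta_k}}\!\left[ \log \frac{p(x, z)}{q(z \mid x)} \right] \;\le\; \log p(x).

The integrand is nondecreasing in \beta, so the left sum is a lower bound, and with the single point \beta_0 = 0 it reduces to the ELBO; the bandit method above searches over the placement of the interior \beta_k.
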
Semi-supervised Sequential Generative Models.
Teng, M.; Le, T. A.; Scibior, A.; and Wood, F.
In Conference on Uncertainty in Artificial Intelligence (UAI), 2020.

@inproceedings{TEN-20,
  title = {Semi-supervised Sequential Generative Models},
  author = {Teng, Michael and Le, Tuan Anh and Scibior, Adam and Wood, Frank},
  booktitle = {Conference on Uncertainty in Artificial Intelligence (UAI)},
  eid = {arXiv:2007.00155},
  archivePrefix = {arXiv},
  eprint = {2007.00155},
  url_Link = {http://www.auai.org/~w-auai/uai2020/accepted.php},
  url_Paper = {http://www.auai.org/uai2020/proceedings/272_main_paper.pdf},
  url_ArXiv = {https://arxiv.org/abs/2007.00155},
  support = {D3M},
  year = {2020}
}

Structured Conditional Continuous Normalizing Flows for Efficient Amortized Inference in Graphical Models.
Weilbach, C.; Beronov, B.; Wood, F.; and Harvey, W.
In Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics (AISTATS), pages 4441–4451, 2020. PMLR 108:4441-4451.

@inproceedings{WEI-20,
  title = {Structured Conditional Continuous Normalizing Flows for Efficient Amortized Inference in Graphical Models},
  author = {Weilbach, Christian and Beronov, Boyan and Wood, Frank and Harvey, William},
  booktitle = {Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics (AISTATS)},
  pages = {4441--4451},
  year = {2020},
  url_Link = {http://proceedings.mlr.press/v108/weilbach20a.html},
  url_Paper = {http://proceedings.mlr.press/v108/weilbach20a/weilbach20a.pdf},
  url_Poster = {https://github.com/plai-group/bibliography/blob/master/presentations_posters/PROBPROG2020_WEI.pdf},
  support = {D3M},
  bibbase_note = {PMLR 108:4441-4451},
  abstract = {We exploit minimally faithful inversion of graphical model structures to specify sparse continuous normalizing flows (CNFs) for amortized inference. We find that the sparsity of this factorization can be exploited to reduce the numbers of parameters in the neural network, adaptive integration steps of the flow, and consequently FLOPs at both training and inference time without decreasing performance in comparison to unconstrained flows. By expressing the structure inversion as a compilation pass in a probabilistic programming language, we are able to apply it in a novel way to models as complex as convolutional neural networks. Furthermore, we extend the training objective for CNFs in the context of inference amortization to the symmetric Kullback-Leibler divergence, and demonstrate its theoretical and practical advantages.}
}

We exploit minimally faithful inversion of graphical model structures to specify sparse continuous normalizing flows (CNFs) for amortized inference. We find that the sparsity of this factorization can be exploited to reduce the numbers of parameters in the neural network, adaptive integration steps of the flow, and consequently FLOPs at both training and inference time without decreasing performance in comparison to unconstrained flows. By expressing the structure inversion as a compilation pass in a probabilistic programming language, we are able to apply it in a novel way to models as complex as convolutional neural networks. Furthermore, we extend the training objective for CNFs in the context of inference amortization to the symmetric Kullback-Leibler divergence, and demonstrate its theoretical and practical advantages.

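To illustrate the kind of sparsity being exploited (a minimal sketch under an assumed inverse structure, not the paper's implementation), a binary mask derived from the inverted graph can be applied to the weight matrix of the flow's dynamics network, so that each dimension of dz/dt depends only on its permitted parents:

import numpy as np

# Hypothetical inverted-graph adjacency: allowed[i, j] = 1 if output z_i
# may depend on input z_j in the (minimally faithful) inverse structure.
allowed = np.array([
    [1, 0, 0],   # z_0 depends only on itself
    [1, 1, 0],   # z_1 depends on z_0 and itself
    [0, 1, 1],   # z_2 depends on z_1 and itself
])

rng = np.random.default_rng(0)
W = rng.standard_normal(allowed.shape)
b = np.zeros(allowed.shape[0])

def masked_dynamics(z, t):
    """One masked linear layer standing in for the CNF drift f(z, t).

    Zeroing entries of W outside `allowed` guarantees dz_i/dt depends
    only on the permitted parents of z_i; this is the sparsity being
    exploited: fewer effective parameters and cheaper Jacobians.
    """
    return np.tanh((W * allowed) @ z + b + t)

# Fixed-step Euler integration of the flow from t=0 to t=1.
z = rng.standard_normal(3)
for step in range(10):
    t = step / 10.0
    z = z + 0.1 * masked_dynamics(z, t)
print(z)

In the paper the mask is derived automatically, as a compilation pass over a probabilistic program; here the adjacency is written by hand purely for illustration.
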
All in the Exponential Family: Bregman Duality in Thermodynamic Variational Inference.
Brekelmans, R.; Masrani, V.; Wood, F.; Ver Steeg, G.; and Galstyan, A.
In Thirty-seventh International Conference on Machine Learning (ICML 2020), July 2020.

@inproceedings{BRE-20,
  author = {{Brekelmans}, Rob and {Masrani}, Vaden and {Wood}, Frank and {Ver Steeg}, Greg and {Galstyan}, Aram},
  title = {All in the Exponential Family: Bregman Duality in Thermodynamic Variational Inference},
  booktitle = {Thirty-seventh International Conference on Machine Learning (ICML 2020)},
  keywords = {Computer Science - Machine Learning, Statistics - Machine Learning},
  year = {2020},
  month = jul,
  eid = {arXiv:2007.00642},
  archivePrefix = {arXiv},
  eprint = {2007.00642},
  url_Link = {https://proceedings.icml.cc/book/2020/hash/12311d05c9aa67765703984239511212},
  url_Paper = {https://proceedings.icml.cc/static/paper_files/icml/2020/2826-Paper.pdf},
  url_ArXiv = {https://arxiv.org/abs/2007.00642},
  support = {D3M},
  abstract = {The recently proposed Thermodynamic Variational Objective (TVO) leverages thermodynamic integration to provide a family of variational inference objectives, which both tighten and generalize the ubiquitous Evidence Lower Bound (ELBO). However, the tightness of TVO bounds was not previously known, an expensive grid search was used to choose a "schedule" of intermediate distributions, and model learning suffered with ostensibly tighter bounds. In this work, we propose an exponential family interpretation of the geometric mixture curve underlying the TVO and various path sampling methods, which allows us to characterize the gap in TVO likelihood bounds as a sum of KL divergences. We propose to choose intermediate distributions using equal spacing in the moment parameters of our exponential family, which matches grid search performance and allows the schedule to adaptively update over the course of training. Finally, we derive a doubly reparameterized gradient estimator which improves model learning and allows the TVO to benefit from more refined bounds. To further contextualize our contributions, we provide a unified framework for understanding thermodynamic integration and the TVO using Taylor series remainders.}
}

The recently proposed Thermodynamic Variational Objective (TVO) leverages thermodynamic integration to provide a family of variational inference objectives, which both tighten and generalize the ubiquitous Evidence Lower Bound (ELBO). However, the tightness of TVO bounds was not previously known, an expensive grid search was used to choose a "schedule" of intermediate distributions, and model learning suffered with ostensibly tighter bounds. In this work, we propose an exponential family interpretation of the geometric mixture curve underlying the TVO and various path sampling methods, which allows us to characterize the gap in TVO likelihood bounds as a sum of KL divergences. We propose to choose intermediate distributions using equal spacing in the moment parameters of our exponential family, which matches grid search performance and allows the schedule to adaptively update over the course of training. Finally, we derive a doubly reparameterized gradient estimator which improves model learning and allows the TVO to benefit from more refined bounds. To further contextualize our contributions, we provide a unified framework for understanding thermodynamic integration and the TVO using Taylor series remainders.

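The exponential family interpretation admits a compact sketch (same geometric path as above, writing w = p(x, z) / q(z \mid x); everything beyond this identity is the paper's contribution):

\pi_\beta(z) \;=\; \frac{1}{Z(\beta)} \, q(z \mid x) \exp\{ \beta \log w \}, \qquad \eta(\beta) \;:=\; \mathbb{E}_{\pi_\beta}[\log w],

so \beta acts as a natural parameter with sufficient statistic \log w, and \eta(\beta) is the corresponding moment parameter. The proposed schedule places the \beta_k so that the moments \eta(\beta_k), rather than the \beta_k themselves, are equally spaced between \eta(\beta_0) and \eta(\beta_K).
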
Coping With Simulators That Don't Always Return.
Warrington, A.; Naderiparizi, S.; and Wood, F.
In The 23rd International Conference on Artificial Intelligence and Statistics (AISTATS), 2020. PMLR 108:1748-1758.

@inproceedings{WAR-20,
  title = {Coping With Simulators That Don't Always Return},
  author = {Warrington, A and Naderiparizi, S and Wood, F},
  booktitle = {The 23rd International Conference on Artificial Intelligence and Statistics (AISTATS)},
  archiveprefix = {arXiv},
  eprint = {1906.05462},
  year = {2020},
  url_Link = {http://proceedings.mlr.press/v108/warrington20a.html},
  url_Paper = {http://proceedings.mlr.press/v108/warrington20a/warrington20a.pdf},
  url_Poster = {https://github.com/plai-group/bibliography/blob/master/presentations_posters/WAR-20.pdf},
  url_ArXiv = {https://arxiv.org/abs/2003.12908},
  keywords = {simulators, smc, autoregressive flow},
  support = {D3M,ETALUMIS},
  bibbase_note = {PMLR 108:1748-1758},
  abstract = {Deterministic models are approximations of reality that are easy to interpret and often easier to build than stochastic alternatives. Unfortunately, as nature is capricious, observational data can never be fully explained by deterministic models in practice. Observation and process noise need to be added to adapt deterministic models to behave stochastically, such that they are capable of explaining and extrapolating from noisy data. We investigate and address computational inefficiencies that arise from adding process noise to deterministic simulators that fail to return for certain inputs; a property we describe as "brittle." We show how to train a conditional normalizing flow to propose perturbations such that the simulator succeeds with high probability, increasing computational efficiency.}
}

Deterministic models are approximations of reality that are easy to interpret and often easier to build than stochastic alternatives. Unfortunately, as nature is capricious, observational data can never be fully explained by deterministic models in practice. Observation and process noise need to be added to adapt deterministic models to behave stochastically, such that they are capable of explaining and extrapolating from noisy data. We investigate and address computational inefficiencies that arise from adding process noise to deterministic simulators that fail to return for certain inputs; a property we describe as "brittle." We show how to train a conditional normalizing flow to propose perturbations such that the simulator succeeds with high probability, increasing computational efficiency.

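A toy version of the failure-aware proposal idea, with a simple Gaussian fit standing in for the paper's conditional normalizing flow (everything below, including the simulator, is illustrative only):

import numpy as np

rng = np.random.default_rng(0)

def brittle_simulator(x):
    """Toy deterministic simulator that fails to return for some inputs."""
    if abs(x) > 1.5:          # "brittle" region: no result at all
        return None
    return np.sin(3.0 * x)    # otherwise a well-defined output

# Naive process noise: wide Gaussian perturbations, many of which fail.
base_sigma = 2.0
samples = base_sigma * rng.standard_normal(5000)
accepted = np.array([x for x in samples if brittle_simulator(x) is not None])
print(f"naive success rate: {len(accepted) / len(samples):.2f}")

# "Train" the proposal on perturbations that succeeded; maximum likelihood
# for a Gaussian is just the empirical mean/std of the accepted set, which
# approximates the base distribution conditioned on simulator success.
mu, sigma = accepted.mean(), accepted.std()

# The fitted proposal concentrates on the region where the simulator
# returns, so far fewer calls are wasted.
proposal = mu + sigma * rng.standard_normal(5000)
ok = sum(brittle_simulator(x) is not None for x in proposal)
print(f"fitted proposal success rate: {ok / len(proposal):.2f}")

The paper's flow-based proposal plays the same role as the Gaussian here, but with far more expressive, input-conditional structure.
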
Attention for Inference Compilation.
Harvey, W.; Munk, A.; Baydin, A. G.; Bergholm, A.; and Wood, F.
In The Second International Conference on Probabilistic Programming (PROBPROG), 2020.

@inproceedings{HAR-20,
  title = {Attention for Inference Compilation},
  author = {Harvey, W and Munk, A and Baydin, AG and Bergholm, A and Wood, F},
  booktitle = {The second International Conference on Probabilistic Programming (PROBPROG)},
  year = {2020},
  archiveprefix = {arXiv},
  eprint = {1910.11961},
  support = {D3M,LwLL},
  url_Paper = {https://arxiv.org/pdf/1910.11961.pdf},
  url_ArXiv = {https://arxiv.org/abs/1910.11961},
  url_Poster = {https://github.com/plai-group/bibliography/blob/master/presentations_posters/PROBPROG2020_HAR.pdf},
  abstract = {We present a new approach to automatic amortized inference in universal probabilistic programs which improves performance compared to current methods. Our approach is a variation of inference compilation (IC) which leverages deep neural networks to approximate a posterior distribution over latent variables in a probabilistic program. A challenge with existing IC network architectures is that they can fail to model long-range dependencies between latent variables. To address this, we introduce an attention mechanism that attends to the most salient variables previously sampled in the execution of a probabilistic program. We demonstrate that the addition of attention allows the proposal distributions to better match the true posterior, enhancing inference about latent variables in simulators.}
}

We present a new approach to automatic amortized inference in universal probabilistic programs which improves performance compared to current methods. Our approach is a variation of inference compilation (IC) which leverages deep neural networks to approximate a posterior distribution over latent variables in a probabilistic program. A challenge with existing IC network architectures is that they can fail to model long-range dependencies between latent variables. To address this, we introduce an attention mechanism that attends to the most salient variables previously sampled in the execution of a probabilistic program. We demonstrate that the addition of attention allows the proposal distributions to better match the true posterior, enhancing inference about latent variables in simulators.

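The core mechanism is ordinary scaled dot-product attention over embeddings of previously sampled latent variables; a numpy-only sketch (embedding sizes and inputs are made up for illustration, and the real proposal networks are learned):

import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(0)
d = 16                               # embedding dimension (illustrative)
prev = rng.standard_normal((5, d))   # embeddings of 5 previously sampled latents
query = rng.standard_normal(d)       # embedding of the address being proposed

# Scaled dot-product attention: weight each earlier latent by its
# relevance to the current proposal, then summarize as a context vector.
scores = prev @ query / np.sqrt(d)
weights = softmax(scores)
context = weights @ prev

# `context` would be concatenated with the usual IC inputs before the
# proposal layers, letting the network pick up long-range dependencies.
print(weights.round(3), context.shape)
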
Deep probabilistic surrogate networks for universal simulator approximation.
Munk, A.; Ścibior, A.; Baydin, A. G.; Stewart, A.; Fernlund, A.; Poursartip, A.; and Wood, F.
In The Second International Conference on Probabilistic Programming (PROBPROG), 2020.

@inproceedings{MUN-20,
  title = {Deep probabilistic surrogate networks for universal simulator approximation},
  author = {Munk, Andreas and Ścibior, Adam and Baydin, AG and Stewart, A and Fernlund, A and Poursartip, A and Wood, Frank},
  booktitle = {The second International Conference on Probabilistic Programming (PROBPROG)},
  year = {2020},
  archiveprefix = {arXiv},
  eprint = {1910.11950},
  support = {D3M,ETALUMIS},
  url_Paper = {https://arxiv.org/pdf/1910.11950.pdf},
  url_ArXiv = {https://arxiv.org/abs/1910.11950},
  url_Poster = {https://github.com/plai-group/bibliography/blob/master/presentations_posters/PROBPROG2020_MUN.pdf},
  abstract = {We present a framework for automatically structuring and training fast, approximate, deep neural surrogates of existing stochastic simulators. Unlike traditional approaches to surrogate modeling, our surrogates retain the interpretable structure of the reference simulators. The particular way we achieve this allows us to replace the reference simulator with the surrogate when undertaking amortized inference in the probabilistic programming sense. The fidelity and speed of our surrogates allow for not only faster "forward" stochastic simulation but also for accurate and substantially faster inference. We support these claims via experiments that involve a commercial composite-materials curing simulator. Employing our surrogate modeling technique makes inference an order of magnitude faster, opening up the possibility of doing simulator-based, non-invasive, just-in-time parts quality testing; in this case inferring safety-critical latent internal temperature profiles of composite materials undergoing curing from surface temperature profile measurements.}
}

We present a framework for automatically structuring and training fast, approximate, deep neural surrogates of existing stochastic simulators. Unlike traditional approaches to surrogate modeling, our surrogates retain the interpretable structure of the reference simulators. The particular way we achieve this allows us to replace the reference simulator with the surrogate when undertaking amortized inference in the probabilistic programming sense. The fidelity and speed of our surrogates allow for not only faster "forward" stochastic simulation but also for accurate and substantially faster inference. We support these claims via experiments that involve a commercial composite-materials curing simulator. Employing our surrogate modeling technique makes inference an order of magnitude faster, opening up the possibility of doing simulator-based, non-invasive, just-in-time parts quality testing; in this case inferring safety-critical latent internal temperature profiles of composite materials undergoing curing from surface temperature profile measurements.

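As a deliberately tiny stand-in for the surrogate idea (not the paper's deep, structure-preserving architecture), the following fits a polynomial mean model plus a Gaussian noise model to draws from a "reference" stochastic simulator, then samples from the fit in its place:

import numpy as np

rng = np.random.default_rng(0)

def expensive_simulator(x):
    """Stand-in stochastic simulator: slow in reality, cheap here."""
    return np.sin(x) + 0.1 * rng.standard_normal(np.shape(x))

# Collect training data from the reference simulator once, offline.
x_train = rng.uniform(-3, 3, size=2000)
y_train = expensive_simulator(x_train)

# Fit a polynomial surrogate by least squares, and estimate the residual
# scale so the surrogate stays stochastic like the reference.
coeffs = np.polyfit(x_train, y_train, deg=5)
resid_sigma = (y_train - np.polyval(coeffs, x_train)).std()

def surrogate(x):
    """Fast approximate replacement: mean model + learned noise."""
    return np.polyval(coeffs, x) + resid_sigma * rng.standard_normal(np.shape(x))

x_test = np.linspace(-3, 3, 5)
print(surrogate(x_test))

The paper's surrogates additionally mirror the simulator's internal random choices address by address, which is what makes them usable inside amortized inference rather than only for forward simulation.
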
Improved Few-Shot Visual Classification.
Bateni, P.; Goyal, R.; Masrani, V.; Wood, F.; and Sigal, L.
In Conference on Computer Vision and Pattern Recognition (CVPR), 2020.

@inproceedings{BAT-20,
  author = {{Bateni}, Peyman and {Goyal}, Raghav and {Masrani}, Vaden and {Wood}, Frank and {Sigal}, Leonid},
  title = {Improved Few-Shot Visual Classification},
  booktitle = {Conference on Computer Vision and Pattern Recognition (CVPR)},
  keywords = {LwLL, Computer Science - Computer Vision and Pattern Recognition},
  year = {2020},
  eid = {arXiv:1912.03432},
  archivePrefix = {arXiv},
  eprint = {1912.03432},
  support = {D3M,LwLL},
  url_Link = {https://openaccess.thecvf.com/content_CVPR_2020/html/Bateni_Improved_Few-Shot_Visual_Classification_CVPR_2020_paper.html},
  url_Paper = {http://openaccess.thecvf.com/content_CVPR_2020/papers/Bateni_Improved_Few-Shot_Visual_Classification_CVPR_2020_paper.pdf},
  url_ArXiv = {https://arxiv.org/abs/1912.03432},
  abstract = {Few-shot learning is a fundamental task in computer vision that carries the promise of alleviating the need for exhaustively labeled data. Most few-shot learning approaches to date have focused on progressively more complex neural feature extractors and classifier adaptation strategies, as well as the refinement of the task definition itself. In this paper, we explore the hypothesis that a simple class-covariance-based distance metric, namely the Mahalanobis distance, adopted into a state of the art few-shot learning approach (CNAPS) can, in and of itself, lead to a significant performance improvement. We also discover that it is possible to learn adaptive feature extractors that allow useful estimation of the high dimensional feature covariances required by this metric from surprisingly few samples. The result of our work is a new "Simple CNAPS" architecture which has up to 9.2% fewer trainable parameters than CNAPS and performs up to 6.1% better than state of the art on the standard few-shot image classification benchmark dataset.}
}

Few-shot learning is a fundamental task in computer vision that carries the promise of alleviating the need for exhaustively labeled data. Most few-shot learning approaches to date have focused on progressively more complex neural feature extractors and classifier adaptation strategies, as well as the refinement of the task definition itself. In this paper, we explore the hypothesis that a simple class-covariance-based distance metric, namely the Mahalanobis distance, adopted into a state of the art few-shot learning approach (CNAPS) can, in and of itself, lead to a significant performance improvement. We also discover that it is possible to learn adaptive feature extractors that allow useful estimation of the high dimensional feature covariances required by this metric from surprisingly few samples. The result of our work is a new "Simple CNAPS" architecture which has up to 9.2% fewer trainable parameters than CNAPS and performs up to 6.1% better than state of the art on the standard few-shot image classification benchmark dataset.

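A stripped-down sketch of the class-covariance idea (omitting the adapted feature extractor and the exact covariance regularization used in Simple CNAPS): classify a query feature by Mahalanobis distance to class means, shrinking each class covariance toward the identity so it stays well-conditioned at low shot counts.

import numpy as np

rng = np.random.default_rng(0)

def mahalanobis_classify(support, labels, query, shrink=0.5):
    """Nearest-class-mean under a regularized Mahalanobis metric.

    support: (n, d) support-set features; labels: (n,) in {0..K-1};
    query: (d,). Shrinking the class covariance toward the identity
    keeps it invertible when shots are scarce.
    """
    classes = np.unique(labels)
    dists = []
    for k in classes:
        feats = support[labels == k]
        mu = feats.mean(axis=0)
        cov = np.cov(feats, rowvar=False)
        cov = shrink * cov + (1 - shrink) * np.eye(len(mu))
        diff = query - mu
        dists.append(diff @ np.linalg.solve(cov, diff))
    return classes[int(np.argmin(dists))]

# Toy 2-way 5-shot episode in a 4-d feature space.
d = 4
support = np.vstack([rng.normal(0, 1, (5, d)), rng.normal(3, 1, (5, d))])
labels = np.array([0] * 5 + [1] * 5)
query = rng.normal(3, 1, d)
print(mahalanobis_classify(support, labels, query))  # expected: 1
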