PLAI group publication list (plai-group/bibliography, group_publications.bib), generated by bibbase.org.
This publication list can be embedded in an existing web page by copying and pasting any of the following snippets.

JavaScript (easiest):

  <script src="https://bibbase.org/show?bib=https://raw.githubusercontent.com/plai-group/bibliography/master/group_publications.bib&jsonp=1&theme=dividers&group0=year&group1=type&folding=0"></script>

PHP:

  <?php
  $contents = file_get_contents("https://bibbase.org/show?bib=https://raw.githubusercontent.com/plai-group/bibliography/master/group_publications.bib&jsonp=1&theme=dividers&group0=year&group1=type&folding=0");
  print_r($contents);
  ?>

iFrame (not recommended):

  <iframe src="https://bibbase.org/show?bib=https://raw.githubusercontent.com/plai-group/bibliography/master/group_publications.bib&jsonp=1&theme=dividers&group0=year&group1=type&folding=0"></iframe>

For more details see the documentation.
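The same rendered list can also be fetched from other server-side languages. Below is a minimal Python sketch of the PHP approach above (it assumes the third-party requests package is installed; the jsonp parameter is left out on the assumption that a plain HTML response is wanted):

import requests

# Sketch: fetch the BibBase-rendered publication list server-side.
BIBBASE_URL = (
    "https://bibbase.org/show"
    "?bib=https://raw.githubusercontent.com/plai-group/bibliography/master/group_publications.bib"
    "&theme=dividers&group0=year&group1=type&folding=0"
)

def fetch_publication_list() -> str:
    response = requests.get(BIBBASE_URL, timeout=30)
    response.raise_for_status()  # surface HTTP errors instead of embedding them in the page
    return response.text

if __name__ == "__main__":
    print(fetch_publication_list()[:500])  # preview the first 500 characters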

2024 (2)

inproceedings (1)
A Diffusion-Model of Joint Interactive Navigation. Niedoba, M.; Lavington, J.; Liu, Y.; Lioutas, V.; Sefas, J.; Liang, X.; Green, D.; Dabiri, S.; Zwartsenberg, B.; Scibior, A.; and Wood, F. In Advances in Neural Information Processing Systems, 2024.

BibTeX:

@InProceedings{,
  title = {A Diffusion-Model of Joint Interactive Navigation},
  author = {Niedoba, Matthew and Lavington, Jonathan and Liu, Yunpeng and Lioutas, Vasileios and Sefas, Justice and Liang, Xiaoxuan and Green, Dylan and Dabiri, Setareh and Zwartsenberg, Berend and Scibior, Adam and Wood, Frank},
  booktitle = {Advances in Neural Information Processing Systems},
  year = {2024},
  url_Paper = {https://proceedings.neurips.cc/paper_files/paper/2023/hash/aeeddfbab4e99763ebac9221732c80dd-Abstract-Conference.html},
  url_pdf = {https://proceedings.neurips.cc/paper_files/paper/2023/file/aeeddfbab4e99763ebac9221732c80dd-Paper-Conference.pdf},
}

Abstract:
\n Simulation of autonomous vehicle systems requires that simulated traffic participants exhibit diverse and realistic behaviors. The use of prerecorded real-world traffic scenarios in simulation ensures realism but the rarity of safety critical events makes large scale collection of driving scenarios expensive. In this paper, we present DJINN – a diffusion based method of generating traffic scenarios. Our approach jointly diffuses the trajectories of all agents, conditioned on a flexible set of state observations from the past, present, or future. On popular trajectory forecasting datasets, we report state of the art performance on joint trajectory metrics. In addition, we demonstrate how DJINN flexibly enables direct test-time sampling from a variety of valuable conditional distributions including goal-based sampling, behavior-class sampling, and scenario editing.\n
unpublished (3)
On the Challenges and Opportunities in Generative AI. Manduchi, L.; Pandey, K.; Bamler, R.; Cotterell, R.; Däubener, S.; Fellenz, S.; Fischer, A.; Gärtner, T.; Kirchler, M.; Kloft, M.; Li, Y.; Lippert, C.; Melo, G. d.; Nalisnick, E.; Ommer, B.; Ranganath, R.; Rudolph, M.; Ullrich, K.; Broeck, G. V. d.; Vogt, J. E.; Wang, Y.; Wenzel, F.; Wood, F.; Mandt, S.; and Fortuin, V. 2024.

BibTeX:

@unpublished{,
  doi = {10.48550/arXiv.2403.00025},
  url_ArXiv = {https://arxiv.org/abs/2403.00025},
  url_pdf = {https://arxiv.org/pdf/2403.00025.pdf},
  author = {Manduchi, Laura and Pandey, Kushagra and Bamler, Robert and Cotterell, Ryan and Däubener, Sina and Fellenz, Sophie and Fischer, Asja and Gärtner, Thomas and Kirchler, Matthias and Kloft, Marius and Li, Yingzhen and Lippert, Christoph and Melo, Gerard de and Nalisnick, Eric and Ommer, Björn and Ranganath, Rajesh and Rudolph, Maja and Ullrich, Karen and Broeck, Guy Van den and Vogt, Julia E and Wang, Yixin and Wenzel, Florian and Wood, Frank and Mandt, Stephan and Fortuin, Vincent},
  title = {On the Challenges and Opportunities in Generative AI},
  publisher = {arXiv},
  year = {2024},
  copyright = {arXiv.org perpetual, non-exclusive licence},
}

Abstract:
\n The field of deep generative modeling has grown rapidly and consistently over the years. With the availability of massive amounts of training data coupled with advances in scalable unsupervised learning paradigms, recent large-scale generative models show tremendous promise in synthesizing high-resolution images and text, as well as structured data such as videos and molecules. However, we argue that current large-scale generative AI models do not sufficiently address several fundamental issues that hinder their widespread adoption across domains. In this work, we aim to identify key unresolved challenges in modern generative AI paradigms that should be tackled to further enhance their capabilities, versatility, and reliability. By identifying these challenges, we aim to provide researchers with valuable insights for exploring fruitful research directions, thereby fostering the development of more robust and accessible generative AI solutions.\n
Layerwise Proximal Replay: A Proximal Point Method for Online Continual Learning. Yoo, J.; Liu, Y.; Wood, F.; and Pleiss, G. 2024.

BibTeX:

@unpublished{,
  doi = {10.48550/arXiv.2402.09542},
  url_ArXiv = {https://arxiv.org/abs/2402.09542},
  url_pdf = {https://arxiv.org/pdf/2402.09542.pdf},
  author = {Yoo, Jason and Liu, Yunpeng and Wood, Frank and Pleiss, Geoff},
  title = {Layerwise Proximal Replay: A Proximal Point Method for Online Continual Learning},
  publisher = {arXiv},
  year = {2024},
  copyright = {arXiv.org perpetual, non-exclusive licence},
}

Abstract:
\n In online continual learning, a neural network incrementally learns from a non-i.i.d. data stream. Nearly all online continual learning methods employ experience replay to simultaneously prevent catastrophic forgetting and underfitting on past data. Our work demonstrates a limitation of this approach: networks trained with experience replay tend to have unstable optimization trajectories, impeding their overall accuracy. Surprisingly, these instabilities persist even when the replay buffer stores all previous training examples, suggesting that this issue is orthogonal to catastrophic forgetting. We minimize these instabilities through a simple modification of the optimization geometry. Our solution, Layerwise Proximal Replay (LPR), balances learning from new and replay data while only allowing for gradual changes in the hidden activation of past data. We demonstrate that LPR consistently improves replay-based online continual learning methods across multiple problem settings, regardless of the amount of available replay memory.\n
Nearest Neighbour Score Estimators for Diffusion Generative Models. Niedoba, M.; Green, D.; Naderiparizi, S.; Lioutas, V.; Lavington, J. W.; Liang, X.; Liu, Y.; Zhang, K.; Dabiri, S.; Ścibior, A.; Zwartsenberg, B.; and Wood, F. 2024.

BibTeX:

@unpublished{,
  doi = {10.48550/arXiv.2402.08018},
  url_ArXiv = {https://arxiv.org/abs/2402.08018},
  url_pdf = {https://arxiv.org/pdf/2402.08018.pdf},
  author = {Niedoba, Matthew and Green, Dylan and Naderiparizi, Saeid and Lioutas, Vasileios and Lavington, Jonathan Wilder and Liang, Xiaoxuan and Liu, Yunpeng and Zhang, Ke and Dabiri, Setareh and Ścibior, Adam and Zwartsenberg, Berend and Wood, Frank},
  title = {Nearest Neighbour Score Estimators for Diffusion Generative Models},
  publisher = {arXiv},
  year = {2024},
  copyright = {arXiv.org perpetual, non-exclusive licence},
}

Abstract:
\n Score function estimation is the cornerstone of both training and sampling from diffusion generative models. Despite this fact, the most commonly used estimators are either biased neural network approximations or high variance Monte Carlo estimators based on the conditional score. We introduce a novel nearest neighbour score function estimator which utilizes multiple samples from the training set to dramatically decrease estimator variance. We leverage our low variance estimator in two compelling applications. Training consistency models with our estimator, we report a significant increase in both convergence speed and sample quality. In diffusion models, we show that our estimator can replace a learned network for probability-flow ODE integration, opening promising new avenues of future research.\n
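To make the idea of training-set-based score estimation concrete: when the data distribution is the empirical distribution over a training set, the marginal score at a given noise level is a posterior-weighted average of per-example Gaussian scores, and restricting that average to the nearest neighbours of the noisy point is one cheap approximation. The sketch below illustrates only that general idea; it is not the paper's estimator, and the forward-process parameters alpha_t and sigma_t are placeholders.

import numpy as np

def knn_score_estimate(x_t, train_data, alpha_t, sigma_t, k=16):
    """Estimate grad_x log q_t(x_t), where q_t(x_t) = (1/N) sum_i N(x_t; alpha_t * x_i, sigma_t^2 I),
    using only the k training points whose scaled positions are closest to x_t."""
    diffs = x_t[None, :] - alpha_t * train_data          # (N, D)
    dists = np.sum(diffs ** 2, axis=1)                   # squared distances
    idx = np.argsort(dists)[:k]                          # k nearest neighbours
    log_w = -dists[idx] / (2.0 * sigma_t ** 2)           # unnormalized log posterior weights
    w = np.exp(log_w - log_w.max())
    w /= w.sum()                                         # self-normalized weights
    # The score is the weighted average of the per-component Gaussian scores.
    return (w[:, None] * (-diffs[idx] / sigma_t ** 2)).sum(axis=0)

rng = np.random.default_rng(0)
data = rng.normal(size=(1000, 2))                        # stand-in "training set"
score = knn_score_estimate(np.array([0.5, -0.2]), data, alpha_t=0.8, sigma_t=0.6)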
2023 (2)

inproceedings (5)
Video Killed the HD-Map: Predicting Multi-Agent Behavior Directly From Aerial Images. Liu, Y.; Lioutas, V.; Lavington, J. W.; Niedoba, M.; Sefas, J.; Dabiri, S.; Green, D.; Liang, X.; Zwartsenberg, B.; Ścibior, A.; and Wood, F. In 2023 IEEE 26th International Conference on Intelligent Transportation Systems (ITSC), pages 3261-3267, 2023.

BibTeX:

@InProceedings{10422048,
  title = {Video Killed the HD-Map: Predicting Multi-Agent Behavior Directly From Aerial Images},
  author = {Liu, Yunpeng and Lioutas, Vasileios and Lavington, Jonathan Wilder and Niedoba, Matthew and Sefas, Justice and Dabiri, Setareh and Green, Dylan and Liang, Xiaoxuan and Zwartsenberg, Berend and Ścibior, Adam and Wood, Frank},
  booktitle = {2023 IEEE 26th International Conference on Intelligent Transportation Systems (ITSC)},
  pages = {3261-3267},
  year = {2023},
  url_Paper = {https://ieeexplore.ieee.org/abstract/document/10422048},
  url_pdf = {https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10422048},
}

Abstract:
\n The development of algorithms that learn multi-agent behavioral models using human demonstrations has led to increasingly realistic simulations in the field of autonomous driving. In general, such models learn to jointly predict trajectories for all controlled agents by exploiting road context information such as drivable lanes obtained from manually annotated high-definition (HD) maps. Recent studies show that these models can greatly benefit from increasing the amount of human data available for training. However, the manual annotation of HD maps which is necessary for every new location puts a bottleneck on efficiently scaling up human traffic datasets. We propose an aerial image-based map (AIM) representation that requires minimal annotation and provides rich road context information for traffic agents like pedestrians and vehicles. We evaluate multi-agent trajectory prediction using the AIM by incorporating it into a differentiable driving simulator as an image-texture-based differentiable rendering module. Our results demonstrate competitive multi-agent trajectory prediction performance especially for pedestrians in the scene when using our AIM representation as compared to models trained with rasterized HD maps.\n
Scaling Graphically Structured Diffusion Models. Weilbach, C. D.; Harvey, W.; Shirzad, H.; and Wood, F. In ICML 2023 Workshop on Structured Probabilistic Inference & Generative Modeling, 2023.

BibTeX:

@InProceedings{weilbach2023scaling,
  title = {Scaling Graphically Structured Diffusion Models},
  author = {Weilbach, Christian Dietrich and Harvey, William and Shirzad, Hamed and Wood, Frank},
  booktitle = {ICML 2023 Workshop on Structured Probabilistic Inference {\&} Generative Modeling},
  year = {2023},
  url_Paper = {https://openreview.net/forum?id=pzH65nCyCN},
  url_pdf = {https://openreview.net/pdf?id=pzH65nCyCN},
}

Abstract:
Applications of the recently introduced graphically structured diffusion model (GSDM) family show that sparsifying the transformer attention mechanism within a diffusion model and meta-training on a variety of conditioning tasks can yield an efficiently learnable diffusion model artifact that is capable of flexible, in the sense of observing different subsets of variables at test-time, amortized conditioning in probabilistic graphical models. While extremely promising in terms of applicability and utility, implementations of GSDMs prior to this work were not scalable beyond toy graphical model sizes. We overcome this limitation by describing and solving two scaling issues related to GSDMs; one engineering and one methodological. We additionally propose a new benchmark problem of weight inference for a convolutional neural network applied to MNIST.
Uncertain Evidence in Probabilistic Models and Stochastic Simulators. Munk, A.; Mead, A.; and Wood, F. In Proceedings of the 40th International Conference on Machine Learning, PMLR 202:25486-25500, 2023.

BibTeX:

@InProceedings{,
  title = {Uncertain Evidence in Probabilistic Models and Stochastic Simulators},
  author = {Munk, Andreas and Mead, Alexander and Wood, Frank},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning, PMLR 202:25486-25500, 2023},
  year = {2023},
  url_Paper = {https://proceedings.mlr.press/v202/munk23a.html},
  url_pdf = {https://proceedings.mlr.press/v202/munk23a/munk23a.pdf},
  url_ArXiv = {https://arxiv.org/abs/2210.12236},
}

Abstract:
\n We consider the problem of performing Bayesian inference in probabilistic models where observations are accompanied by uncertainty, referred to as \"uncertain evidence.\" We explore how to interpret uncertain evidence, and by extension the importance of proper interpretation as it pertains to inference about latent variables. We consider a recently-proposed method \"distributional evidence\" as well as revisit two older methods: Jeffrey's rule and virtual evidence. We devise guidelines on how to account for uncertain evidence and we provide new insights, particularly regarding consistency. To showcase the impact of different interpretations of the same uncertain evidence, we carry out experiments in which one interpretation is defined as \"correct.\" We then compare inference results from each different interpretation illustrating the importance of careful consideration of uncertain evidence.\n
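The difference between the two older interpretations mentioned in the abstract can be seen in a small worked example. Below, a binary latent X and an evidence variable Y have a known joint distribution, and the uncertain evidence "Y = 1 with probability 0.8" is incorporated once via Jeffrey's rule and once as virtual evidence; the numbers are invented for illustration and are not from the paper.

import numpy as np

# Joint p(x, y) over binary X (rows) and Y (columns); values are illustrative only.
joint = np.array([[0.50, 0.10],
                  [0.20, 0.20]])          # joint[x, y]
p_y = joint.sum(axis=0)                   # marginal over Y
p_x_given_y = joint / p_y                 # columns are p(x | y)

q_y = np.array([0.2, 0.8])                # uncertain evidence: believe Y = 1 with probability 0.8

# Jeffrey's rule: replace the marginal over Y with q(y), keep p(x | y) fixed.
q_x_jeffrey = p_x_given_y @ q_y

# Virtual evidence: multiply the joint by a likelihood l(y) proportional to (0.2, 0.8) and renormalize.
weighted = joint * q_y                    # p(x, y) * l(y)
q_x_virtual = weighted.sum(axis=1) / weighted.sum()

print("Jeffrey's rule  :", q_x_jeffrey)   # approximately [0.41, 0.59]
print("Virtual evidence:", q_x_virtual)   # approximately [0.47, 0.53]; differs unless p(y) is uniform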
Graphically Structured Diffusion Models. Weilbach, C.; Harvey, W.; and Wood, F. In Proceedings of the 40th International Conference on Machine Learning, PMLR 202:36887-36909, 2023.

BibTeX:

@InProceedings{,
  title = {Graphically Structured Diffusion Models},
  author = {Weilbach, Christian and Harvey, William and Wood, Frank},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning, PMLR 202:36887-36909},
  year = {2023},
  url_Paper = {https://proceedings.mlr.press/v202/weilbach23a.html},
  url_pdf = {https://proceedings.mlr.press/v202/weilbach23a/weilbach23a.pdf},
  url_ArXiv = {https://arxiv.org/abs/2210.11633},
}

Abstract:
\n We introduce a framework for automatically defining and learning deep generative models with problem-specific structure. We tackle problem domains that are more traditionally solved by algorithms such as sorting, constraint satisfaction for Sudoku, and matrix factorization. Concretely, we train diffusion models with an architecture tailored to the problem specification. This problem specification should contain a graphical model describing relationships between variables, and often benefits from explicit representation of subcomputations. Permutation invariances can also be exploited. Across a diverse set of experiments we improve the scaling relationship between problem dimension and our model's performance, in terms of both training time and final accuracy.\n
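One ingredient of the idea described above, restricting attention according to a problem's graphical-model structure, can be sketched in a few lines: build a mask from the variable-dependency edges and add it (as minus infinity) to the attention logits. This is a generic single-head illustration under assumed inputs, not the paper's architecture.

import numpy as np

def structured_attention(x, edges, d_k=None):
    """Single-head attention over n variable tokens x (n, d); token i may attend
    to token j only if (i, j) is an edge in the supplied dependency structure or i == j."""
    n, d = x.shape
    d_k = d_k or d
    mask = np.full((n, n), -np.inf)
    np.fill_diagonal(mask, 0.0)                    # every token attends to itself
    for i, j in edges:                             # dependencies from the graphical model
        mask[i, j] = 0.0
    scores = x @ x.T / np.sqrt(d_k) + mask         # masked attention logits
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)  # row-wise softmax
    return weights @ x                             # attention output

rng = np.random.default_rng(0)
tokens = rng.normal(size=(4, 8))                   # four variables of a tiny toy model
out = structured_attention(tokens, edges=[(0, 1), (1, 0), (2, 3), (3, 2)])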
Critic Sequential Monte Carlo. Lioutas, V.; Lavington, J. W.; Sefas, J.; Niedoba, M.; Liu, Y.; Zwartsenberg, B.; Dabiri, S.; Wood, F.; and Scibior, A. In The Eleventh International Conference on Learning Representations (ICLR), 2023.

BibTeX:

@InProceedings{lioutas2023critic,
  title = {{Critic Sequential Monte Carlo}},
  author = {Lioutas, Vasileios and Lavington, Jonathan Wilder and Sefas, Justice and Niedoba, Matthew and Liu, Yunpeng and Zwartsenberg, Berend and Dabiri, Setareh and Wood, Frank and Scibior, Adam},
  booktitle = {The Eleventh International Conference on Learning Representations (ICLR)},
  year = {2023},
  url_Link = {https://openreview.net/forum?id=ObtGcyKmwna},
  url_ArXiv = {https://arxiv.org/abs/2205.15460},
  url_pdf = {https://arxiv.org/pdf/2205.15460.pdf},
}

Abstract:
\n We introduce CriticSMC, a new algorithm for planning as inference built from a composition of sequential Monte Carlo with learned Soft-Q function heuristic factors. These heuristic factors, obtained from parametric approximations of the marginal likelihood ahead, more effectively guide SMC towards the desired target distribution, which is particularly helpful for planning in environments with hard constraints placed sparsely in time. Compared with previous work, we modify the placement of such heuristic factors, which allows us to cheaply propose and evaluate large numbers of putative action particles, greatly increasing inference and planning efficiency. CriticSMC is compatible with informative priors, whose density function need not be known, and can be used as a model-free control algorithm. Our experiments on collision avoidance in a high-dimensional simulated driving task show that CriticSMC significantly reduces collision rates at a low computational cost while maintaining realism and diversity of driving behaviors across vehicles and environment scenarios.\n
unpublished (4)
Don't be so negative! Score-based Generative Modeling with Oracle-assisted Guidance. Naderiparizi, S.; Liang, X.; Zwartsenberg, B.; and Wood, F. 2023.

BibTeX:

@unpublished{,
  doi = {10.48550/arXiv.2307.16463},
  url_ArXiv = {https://arxiv.org/abs/2307.16463},
  url_pdf = {https://arxiv.org/pdf/2307.16463.pdf},
  author = {Naderiparizi, Saeid and Liang, Xiaoxuan and Zwartsenberg, Berend and Wood, Frank},
  keywords = {Computer Vision and Pattern Recognition},
  title = {Don't be so negative! Score-based Generative Modeling with Oracle-assisted Guidance},
  publisher = {arXiv},
  year = {2023},
  copyright = {arXiv.org perpetual, non-exclusive licence},
}

Abstract:
\n The maximum likelihood principle advocates parameter estimation via optimization of the data likelihood function. Models estimated in this way can exhibit a variety of generalization characteristics dictated by, e.g. architecture, parameterization, and optimization bias. This work addresses model learning in a setting where there further exists side-information in the form of an oracle that can label samples as being outside the support of the true data generating distribution. Specifically we develop a new denoising diffusion probabilistic modeling (DDPM) methodology, Gen-neG, that leverages this additional side-information. Our approach builds on generative adversarial networks (GANs) and discriminator guidance in diffusion models to guide the generation process towards the positive support region indicated by the oracle. We empirically establish the utility of Gen-neG in applications including collision avoidance in self-driving simulators and safety-guarded human motion generation.\n
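Discriminator guidance, which the abstract builds on, amounts to adding the gradient of a discriminator's log-odds of being "in support" to the model's score during sampling. The sketch below shows only that generic ingredient with an invented half-plane oracle and a scikit-learn logistic-regression discriminator (for which the log-odds gradient is just the weight vector); it is not the Gen-neG training procedure, and score_fn is a hypothetical stand-in for a learned score network.

import numpy as np
from sklearn.linear_model import LogisticRegression

def oracle_in_support(samples):
    """Toy oracle for illustration: points in the right half-plane are in support."""
    return (samples[:, 0] > 0.0).astype(int)

rng = np.random.default_rng(0)
model_samples = rng.normal(scale=1.2, size=(2000, 2))   # stand-in generator output
labels = oracle_in_support(model_samples)

# Discriminator for in-support vs out-of-support samples. For logistic regression the
# gradient of the log-odds with respect to x is the weight vector, so guidance is explicit.
disc = LogisticRegression().fit(model_samples, labels)
log_odds_grad = disc.coef_[0]

def guided_score(x, t, score_fn, scale=1.0):
    """Generic discriminator guidance: base score plus scale * grad_x log[d / (1 - d)](x)."""
    return score_fn(x, t) + scale * log_odds_grad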
Realistically distributing object placements in synthetic training data improves the performance of vision-based object detection models. Dabiri, S.; Lioutas, V.; Zwartsenberg, B.; Liu, Y.; Niedoba, M.; Liang, X.; Green, D.; Sefas, J.; Lavington, J. W.; Wood, F.; and Ścibior, A. 2023.

BibTeX:

@unpublished{,
  doi = {10.48550/arXiv.2305.14621},
  url_ArXiv = {https://arxiv.org/abs/2305.14621},
  url_pdf = {https://arxiv.org/pdf/2305.14621.pdf},
  author = {Dabiri, Setareh and Lioutas, Vasileios and Zwartsenberg, Berend and Liu, Yunpeng and Niedoba, Matthew and Liang, Xiaoxuan and Green, Dylan and Sefas, Justice and Lavington, Jonathan Wilder and Wood, Frank and Ścibior, Adam},
  title = {Realistically distributing object placements in synthetic training data improves the performance of vision-based object detection models},
  publisher = {arXiv},
  year = {2023},
  copyright = {arXiv.org perpetual, non-exclusive licence},
}

Abstract:
\n When training object detection models on synthetic data, it is important to make the distribution of synthetic data as close as possible to the distribution of real data. We investigate specifically the impact of object placement distribution, keeping all other aspects of synthetic data fixed. Our experiment, training a 3D vehicle detection model in CARLA and testing on KITTI, demonstrates a substantial improvement resulting from improving the object placement distribution.\n
Video Killed the HD-Map: Predicting Multi-Agent Behavior Directly From Aerial Images. Liu, Y.; Lioutas, V.; Lavington, J. W.; Niedoba, M.; Sefas, J.; Dabiri, S.; Green, D.; Liang, X.; Zwartsenberg, B.; Ścibior, A.; and Wood, F. 2023.

BibTeX:

@unpublished{,
  doi = {10.48550/arXiv.2305.11856},
  url_ArXiv = {https://arxiv.org/abs/2305.11856},
  url_pdf = {https://arxiv.org/pdf/2305.11856.pdf},
  author = {Liu, Yunpeng and Lioutas, Vasileios and Lavington, Jonathan Wilder and Niedoba, Matthew and Sefas, Justice and Dabiri, Setareh and Green, Dylan and Liang, Xiaoxuan and Zwartsenberg, Berend and Ścibior, Adam and Wood, Frank},
  title = {Video Killed the HD-Map: Predicting Multi-Agent Behavior Directly From Aerial Images},
  publisher = {arXiv},
  year = {2023},
  copyright = {arXiv.org perpetual, non-exclusive licence},
}

Abstract:
\n The development of algorithms that learn multi-agent behavioral models using human demonstrations has led to increasingly realistic simulations in the field of autonomous driving. In general, such models learn to jointly predict trajectories for all controlled agents by exploiting road context information such as drivable lanes obtained from manually annotated high-definition (HD) maps. Recent studies show that these models can greatly benefit from increasing the amount of human data available for training. However, the manual annotation of HD maps which is necessary for every new location puts a bottleneck on efficiently scaling up human traffic datasets. We propose an aerial image-based map (AIM) representation that requires minimal annotation and provides rich road context information for traffic agents like pedestrians and vehicles. We evaluate multi-agent trajectory prediction using the AIM by incorporating it into a differentiable driving simulator as an image-texture-based differentiable rendering module. Our results demonstrate competitive multi-agent trajectory prediction performance especially for pedestrians in the scene when using our AIM representation as compared to models trained with rasterized HD maps.\n
Visual Chain-of-Thought Diffusion Models. Harvey, W.; and Wood, F. 2023.

BibTeX:

@unpublished{,
  doi = {10.48550/arXiv.2303.16187},
  url_ArXiv = {https://arxiv.org/abs/2303.16187},
  url_pdf = {https://arxiv.org/pdf/2303.16187v2.pdf},
  author = {Harvey, William and Wood, Frank},
  title = {Visual Chain-of-Thought Diffusion Models},
  publisher = {arXiv},
  year = {2023},
  copyright = {arXiv.org perpetual, non-exclusive licence},
}

Abstract:
\n Recent progress with conditional image diffusion models has been stunning, and this holds true whether we are speaking about models conditioned on a text description, a scene layout, or a sketch. Unconditional image diffusion models are also improving but lag behind, as do diffusion models which are conditioned on lower-dimensional features like class labels. We propose to close the gap between conditional and unconditional models using a two-stage sampling procedure. In the first stage we sample an embedding describing the semantic content of the image. In the second stage we sample the image conditioned on this embedding and then discard the embedding. Doing so lets us leverage the power of conditional diffusion models on the unconditional generation task, which we show improves FID by 25 − 50% compared to standard unconditional generation. \n
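The two-stage procedure described above is simple to state in code: draw a semantic embedding first, then draw an image conditioned on it, and keep only the image. The objects below (embedding_prior and image_model) are hypothetical stand-ins for the two learned generative models.

def two_stage_sample(embedding_prior, image_model, n_samples=4):
    """Sketch of unconditional generation via an intermediate semantic embedding."""
    images = []
    for _ in range(n_samples):
        z = embedding_prior.sample()             # stage 1: sample an image embedding
        x = image_model.sample(condition=z)      # stage 2: sample an image given the embedding
        images.append(x)                         # the embedding z itself is discarded
    return images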
2022 (3)

article (3)
Probabilistic Programming Languages: Independent Choices and Deterministic Systems. Poole, D.; and Wood, F. In Probabilistic and Causal Inference: The Works of Judea Pearl, pages 691–712. Association for Computing Machinery, 2022.

BibTeX:

@article{10.1145/3501714.3501753,
  title = {Probabilistic Programming Languages: Independent Choices and Deterministic Systems},
  author = {Poole, David and Wood, Frank},
  year = {2022},
  isbn = {9781450395861},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  edition = {1},
  url = {https://doi.org/10.1145/3501714.3501753},
  booktitle = {Probabilistic and Causal Inference: The Works of Judea Pearl},
  pages = {691–712},
  numpages = {22}
}
TITRATED: Learned Human Driving Behavior without Infractions via Amortized Inference. Lioutas, V.; Scibior, A.; and Wood, F. Transactions on Machine Learning Research (TMLR), 2022.

BibTeX:

@article{LIO-22,
  title = {{TITRATED}: Learned Human Driving Behavior without Infractions via Amortized Inference},
  author = {Lioutas, Vasileios and Scibior, Adam and Wood, Frank},
  journal = {Transactions on Machine Learning Research (TMLR)},
  year = {2022},
  url_Paper = {https://openreview.net/forum?id=M8D5iZsnrO},
  url_pdf = {https://openreview.net/pdf?id=M8D5iZsnrO},
  url_Presentation = {https://www.youtube.com/watch?v=AMeZtzQxhX4},
  support = {MITACS},
}

Abstract:
\n Models of human driving behavior have long been used for prediction in autonomous vehicles, but recently have also started being used to create non-playable characters for driving simulations. While such models are in many respects realistic, they tend to suffer from unacceptably high rates of driving infractions, such as collisions or off-road driving, particularly when deployed in map locations with road geometries dissimilar to the training dataset. In this paper we present a novel method for fine-tuning a foundation model of human driving behavior to novel locations where human demonstrations are not available which reduces the incidence of such infractions. The method relies on inference in the foundation model to generate infraction-free trajectories as well as additional penalties applied when fine-tuning the amortized inference behavioral model. We demonstrate this \"titration\" technique using the ITRA foundation behavior model trained on the INTERACTION dataset when transferring to CARLA map locations. We demonstrate a 76-86% reduction in infraction rate and provide evidence that further gains are possible with more computation or better inference algorithms.\n
Planning as Inference in Epidemiological Dynamics Models. Wood, F.; Warrington, A.; Naderiparizi, S.; Weilbach, C.; Masrani, V.; Harvey, W.; Ścibior, A.; Beronov, B.; Grefenstette, J.; Campbell, D.; and Nasseri, S. A. Frontiers in Artificial Intelligence, 4, 2022.

BibTeX:

@article{WOO-22,
  author = {Wood, Frank and Warrington, Andrew and Naderiparizi, Saeid and Weilbach, Christian and Masrani, Vaden and Harvey, William and Ścibior, Adam and Beronov, Boyan and Grefenstette, John and Campbell, Duncan and Nasseri, S. Ali},
  title = {Planning as Inference in Epidemiological Dynamics Models},
  journal = {Frontiers in Artificial Intelligence},
  volume = {4},
  year = {2022},
  url_Paper = {https://www.frontiersin.org/article/10.3389/frai.2021.550603},
  url_ArXiv = {https://arxiv.org/abs/2003.13221},
  doi = {10.3389/frai.2021.550603},
  issn = {2624-8212},
  support = {D3M,COVID,ETALUMIS},
}

Abstract:
\n In this work we demonstrate how to automate parts of the infectious disease-control policy-making process via performing inference in existing epidemiological models. The kind of inference tasks undertaken include computing the posterior distribution over controllable, via direct policy-making choices, simulation model parameters that give rise to acceptable disease progression outcomes. Among other things, we illustrate the use of a probabilistic programming language that automates inference in existing simulators. Neither the full capabilities of this tool for automating inference nor its utility for planning is widely disseminated at the current time. Timely gains in understanding about how such simulation-based models and inference automation tools applied in support of policy-making could lead to less economically damaging policy prescriptions, particularly during the current COVID-19 pandemic.\n
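The core idea above, computing a posterior over controllable simulation parameters that lead to acceptable disease-progression outcomes, can be illustrated with a toy discrete-time SIR simulator and rejection sampling. Everything below (the simulator, the single "contact reduction" policy parameter, and the acceptability threshold) is invented for illustration and is not the paper's model or its probabilistic-programming tooling.

import numpy as np

def sir_peak_infected(contact_reduction, beta=0.3, gamma=0.1, days=200, i0=0.001):
    """Toy discrete-time SIR model; returns the peak infected fraction under a policy
    that scales the transmission rate by (1 - contact_reduction)."""
    s, i, peak = 1.0 - i0, i0, i0
    for _ in range(days):
        new_infections = (1.0 - contact_reduction) * beta * s * i
        recoveries = gamma * i
        s, i = s - new_infections, i + new_infections - recoveries
        peak = max(peak, i)
    return peak

# Planning as inference, rejection-sampling flavour: condition a uniform prior over the
# controllable parameter on the outcome "peak infected fraction stays below 10%".
rng = np.random.default_rng(0)
prior_samples = rng.uniform(0.0, 1.0, size=5000)
accepted = [c for c in prior_samples if sir_peak_infected(c) < 0.10]
print(f"posterior mean contact reduction: {np.mean(accepted):.2f} "
      f"({len(accepted)} of {len(prior_samples)} prior samples accepted)")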
inproceedings (6)
Near-Optimal Glimpse Sequences for Improved Hard Attention Neural Network Training. Harvey, W.; Teng, M.; and Wood, F. In 2022 International Joint Conference on Neural Networks (IJCNN), pages 1-8, 2022.

BibTeX:

@InProceedings{9892112,
  title = {Near-Optimal Glimpse Sequences for Improved Hard Attention Neural Network Training},
  author = {Harvey, William and Teng, Michael and Wood, Frank},
  booktitle = {2022 International Joint Conference on Neural Networks (IJCNN)},
  year = {2022},
  pages = {1-8},
  url_Paper = {https://ieeexplore.ieee.org/abstract/document/9892112},
  url_pdf = {https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9892112},
  doi = {10.1109/IJCNN55064.2022.9892112},
}

Abstract:
\n Hard visual attention is a promising approach to reduce the computational burden of modern computer vision methodologies. However, hard attention mechanisms can be difficult and slow to train, which is especially costly for applications like neural architecture search where multiple networks must be trained. We introduce a method to amortise the cost of training by generating an extra supervision signal for a subset of the training data. This supervision is in the form of sequences of ‘good’ locations to attend to for each image. We find that the best method to generate supervision sequences comes from framing hard attention for image classification as a Bayesian optimal experimental design (BOED) problem. From this perspective, the optimal locations to attend to are those which provide the greatest expected reduction in the entropy of the classification distribution. We introduce methodology from the BOED literature to approximate this optimal behaviour and generate ‘near-optimal’ supervision sequences. We then present a hard attention network training objective that makes use of these sequences and show that it allows faster training than prior work. We finally demonstrate the utility of faster hard attention training by incorporating supervision sequences in a neural architecture search, resulting in hard attention architectures which can outperform networks with access to the entire image.\n
Vehicle Type Specific Waypoint Generation. Liu, Y.; Lavington, J. W.; Scibior, A.; and Wood, F. In 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 12225-12230, 2022.

BibTeX:

@InProceedings{,
  title = {Vehicle Type Specific Waypoint Generation},
  author = {Liu, Yunpeng and Lavington, Jonathan Wilder and Scibior, Adam and Wood, Frank},
  booktitle = {2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
  year = {2022},
  pages = {12225-12230},
  url_paper = {https://ieeexplore.ieee.org/abstract/document/9981421},
  url_ArXiv = {https://arxiv.org/abs/2208.04987},
  url_pdf = {https://arxiv.org/pdf/2208.04987.pdf},
  keywords = {Artificial Intelligence (cs.AI), FOS: Computer and information sciences},
  copyright = {Creative Commons Attribution Share Alike 4.0 International},
}

Abstract:
We develop a generic mechanism for generating vehicle-type specific sequences of waypoints from a probabilistic foundation model of driving behavior. Many foundation behavior models are trained on data that does not include vehicle information, which limits their utility in downstream applications such as planning. Our novel methodology conditionally specializes such a behavior predictive model to a vehicle-type by utilizing byproducts of the reinforcement learning algorithms used to produce vehicle specific controllers. We show how to compose a vehicle specific value function estimate with a generic probabilistic behavior model to generate vehicle-type specific waypoint sequences that are more likely to be physically plausible than their vehicle-agnostic counterparts.
Probabilistic Surrogate Networks for Simulators with Unbounded Randomness. Munk, A.; Zwartsenberg, B.; Scibior, A.; Baydin, A. G.; Stewart, A. L.; Fernlund, G.; Poursartip, A.; and Wood, F. In The 38th Conference on Uncertainty in Artificial Intelligence, 2022.

BibTeX:

@InProceedings{munk2022probabilistic,
  title = {Probabilistic Surrogate Networks for Simulators with Unbounded Randomness},
  author = {Munk, Andreas and Zwartsenberg, Berend and Scibior, Adam and Baydin, Atilim Gunes and Stewart, Andrew Lawrence and Fernlund, Goran and Poursartip, Anoush and Wood, Frank},
  booktitle = {The 38th Conference on Uncertainty in Artificial Intelligence},
  year = {2022},
  url_Paper = {https://openreview.net/forum?id=r2zEpHIiqxc},
  url_pdf = {https://openreview.net/pdf?id=r2zEpHIiqxc},
  url_ArXiv = {https://arxiv.org/abs/1910.11950},
}

Abstract:
\n We present a framework for automatically structuring and training fast, approximate, deep neural surrogates of stochastic simulators. Unlike traditional approaches to surrogate modeling, our surrogates retain the interpretable structure and control flow of the reference simulator. Our surrogates target stochastic simulators where the number of random variables itself can be stochastic and potentially unbounded. Our framework further enables an automatic replacement of the reference simulator with the surrogate when undertaking amortized inference. The fidelity and speed of our surrogates allow for both faster stochastic simulation and accurate and substantially faster posterior inference. Using an illustrative yet non-trivial example we show our surrogates' ability to accurately model a probabilistic program with an unbounded number of random variables. We then proceed with an example that shows our surrogates are able to accurately model a complex structure like an unbounded stack in a program synthesis example. We further demonstrate how our surrogate modeling technique makes amortized inference in complex black-box simulators an order of magnitude faster. Specifically, we do simulator-based materials quality testing, inferring safety-critical latent internal temperature profiles of composite materials undergoing curing.\n
\n\n\n
\n\n\n
\n \n\n \n \n \n \n \n \n Enhancing Few-Shot Image Classification With Unlabelled Examples.\n \n \n \n \n\n\n \n Bateni, P.; Barber, J.; van de Meent, J.; and Wood, F.\n\n\n \n\n\n\n In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pages 2796-2805, January 2022. \n \n\n\n\n
\n\n\n\n \n \n \"Enhancing arxiv\n  \n \n \n \"Enhancing paper\n  \n \n\n \n\n \n link\n  \n \n\n bibtex\n \n\n \n  \n \n abstract \n \n\n \n  \n \n 5 downloads\n \n \n\n \n \n \n \n \n \n \n\n  \n \n \n\n\n\n
\n
@InProceedings{Bateni_2022_WACV,\n    author    = {Bateni, Peyman and Barber, Jarred and van de Meent, Jan-Willem and Wood, Frank},\n    title     = {Enhancing Few-Shot Image Classification With Unlabelled Examples},\n    booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},\n    month     = {January},\n    year      = {2022},\n    pages     = {2796-2805},\n    url_ArXiv = {https://arxiv.org/abs/2006.12245},\n    url_Paper = {https://ieeexplore.ieee.org/document/9706775},\n    support = {D3M,LwLL},\n    abstract={We develop a transductive meta-learning method that uses unlabelled instances to improve few-shot image classification performance. Our approach combines a regularized Mahalanobis-distance-based soft k-means clustering procedure with a modified state of the art neural adaptive feature extractor to achieve improved test-time classification accuracy using unlabelled data. We evaluate our method on transductive few-shot learning tasks, in which the goal is to jointly predict labels for query (test) examples given a set of support (training) examples. We achieve state of the art performance on the Meta-Dataset, mini-ImageNet and tiered-ImageNet benchmarks.}\n}\n\n
\n
\n\n\n
\n We develop a transductive meta-learning method that uses unlabelled instances to improve few-shot image classification performance. Our approach combines a regularized Mahalanobis-distance-based soft k-means clustering procedure with a modified state of the art neural adaptive feature extractor to achieve improved test-time classification accuracy using unlabelled data. We evaluate our method on transductive few-shot learning tasks, in which the goal is to jointly predict labels for query (test) examples given a set of support (training) examples. We achieve state of the art performance on the Meta-Dataset, mini-ImageNet and tiered-ImageNet benchmarks.\n
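To make the clustering step above concrete, here is a minimal sketch of a regularized-Mahalanobis soft k-means refinement over extracted features. It is an illustration of the idea only: the feature extractor is assumed to be given, and the regularizer reg, the iteration count, and the hard assignment of support points are simplifying assumptions rather than the paper's choices.

import numpy as np

def soft_kmeans_mahalanobis(support_feats, support_labels, query_feats,
                            n_classes, n_iters=5, reg=1.0):
    """Illustrative soft k-means refinement with class-wise Mahalanobis
    distances (a stand-in for the regularized estimator used in the paper)."""
    d = support_feats.shape[1]
    # support points keep their labels; query responsibilities start uniform
    resp_q = np.full((len(query_feats), n_classes), 1.0 / n_classes)
    onehot_s = np.eye(n_classes)[support_labels]
    feats = np.vstack([support_feats, query_feats])
    for _ in range(n_iters):
        resp = np.vstack([onehot_s, resp_q])
        means, precs = [], []
        for c in range(n_classes):
            w = resp[:, c:c + 1]
            mu = (w * feats).sum(0) / w.sum()
            diff = feats - mu
            cov = (w * diff).T @ diff / w.sum() + reg * np.eye(d)
            means.append(mu)
            precs.append(np.linalg.inv(cov))
        # squared Mahalanobis distance of each query to each class mean
        logits = np.stack([
            -0.5 * np.einsum('nd,de,ne->n', query_feats - means[c],
                             precs[c], query_feats - means[c])
            for c in range(n_classes)], axis=1)
        logits -= logits.max(1, keepdims=True)
        resp_q = np.exp(logits) / np.exp(logits).sum(1, keepdims=True)
    return resp_q  # soft labels for the query (unlabelled) examples

# toy usage with random features standing in for extractor outputs
feats_s = np.random.randn(10, 8); labels_s = np.repeat(np.arange(2), 5)
feats_q = np.random.randn(6, 8)
probs = soft_kmeans_mahalanobis(feats_s, labels_s, feats_q, n_classes=2)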
\n\n\n
\n\n\n
\n \n\n \n \n \n \n \n \n Conditional Image Generation by Conditioning Variational Auto-Encoders.\n \n \n \n \n\n\n \n Harvey, W.; Naderiparizi, S.; and Wood, F.\n\n\n \n\n\n\n In International Conference on Learning Representations, 2022. \n \n\n\n\n
\n\n\n\n \n \n \"Conditional paper\n  \n \n\n \n\n \n link\n  \n \n\n bibtex\n \n\n \n\n \n  \n \n 5 downloads\n \n \n\n \n \n \n \n \n \n \n\n  \n \n \n\n\n\n
\n
@inproceedings{\nharvey2022conditional,\ntitle={Conditional Image Generation by Conditioning Variational Auto-Encoders},\nauthor={William Harvey and Saeid Naderiparizi and Frank Wood},\nbooktitle={International Conference on Learning Representations},\nyear={2022},\nurl_Paper={https://openreview.net/forum?id=7MV6uLzOChW}\n}\n\n\n
\n
\n\n\n\n
\n\n\n
\n \n\n \n \n \n \n \n \n Amortized Rejection Sampling in Universal Probabilistic Programming .\n \n \n \n \n\n\n \n Naderiparizi, S.; Scibior, A.; Munk, A.; Ghadiri, M.; Gunes Baydin, A.; Gram-Hansen, B. J.; Schroeder De Witt, C. A.; Zinkov, R.; Torr, P.; Rainforth, T.; Whye Teh, Y.; and Wood, F.\n\n\n \n\n\n\n In Camps-Valls, G.; Ruiz, F. J. R.; and Valera, I., editor(s), Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, volume 151, of Proceedings of Machine Learning Research, pages 8392–8412, 28–30 Mar 2022. PMLR\n \n\n\n\n
\n\n\n\n \n \n \"AmortizedPaper\n  \n \n\n \n\n \n link\n  \n \n\n bibtex\n \n\n \n  \n \n abstract \n \n\n \n  \n \n 2 downloads\n \n \n\n \n \n \n \n \n \n \n\n  \n \n \n\n\n\n
\n
@inproceedings{NAD-22,\n  title = \t { Amortized Rejection Sampling in Universal Probabilistic Programming },\n  author =       {Naderiparizi, Saeid and Scibior, Adam and Munk, Andreas and Ghadiri, Mehrdad and Gunes Baydin, Atilim and Gram-Hansen, Bradley J. and Schroeder De Witt, Christian A. and Zinkov, Robert and Torr, Philip and Rainforth, Tom and Whye Teh, Yee and Wood, Frank},\n  booktitle = \t {Proceedings of The 25th International Conference on Artificial Intelligence and Statistics},\n  pages = \t {8392--8412},\n  year = \t {2022},\n  editor = \t {Camps-Valls, Gustau and Ruiz, Francisco J. R. and Valera, Isabel},\n  volume = \t {151},\n  series = \t {Proceedings of Machine Learning Research},\n  month = \t {28--30 Mar},\n  publisher =    {PMLR},\n  pdf = \t {https://proceedings.mlr.press/v151/naderiparizi22a/naderiparizi22a.pdf},\n  url = \t {https://proceedings.mlr.press/v151/naderiparizi22a.html},\n  abstract = \t { Naive approaches to amortized inference in probabilistic programs with unbounded loops can produce estimators with infinite variance. This is particularly true of importance sampling inference in programs that explicitly include rejection sampling as part of the user-programmed generative procedure. In this paper we develop a new and efficient amortized importance sampling estimator. We prove finite variance of our estimator and empirically demonstrate our method’s correctness and efficiency compared to existing alternatives on generative programs containing rejection sampling loops and discuss how to implement our method in a generic probabilistic programming framework. }\n}\n\n
\n
\n\n\n
\n Naive approaches to amortized inference in probabilistic programs with unbounded loops can produce estimators with infinite variance. This is particularly true of importance sampling inference in programs that explicitly include rejection sampling as part of the user-programmed generative procedure. In this paper we develop a new and efficient amortized importance sampling estimator. We prove finite variance of our estimator and empirically demonstrate our method’s correctness and efficiency compared to existing alternatives on generative programs containing rejection sampling loops and discuss how to implement our method in a generic probabilistic programming framework. \n
\n\n\n
\n\n\n\n\n\n
\n
\n\n
\n
\n  \n unpublished\n \n \n (6)\n \n \n
\n
\n \n \n
\n \n\n \n \n \n \n \n \n BayesPCN: A Continually Learnable Predictive Coding Associative Memory.\n \n \n \n \n\n\n \n Yoo, J.; and Wood, F.\n\n\n \n\n\n\n 2022.\n \n\n\n\n
\n\n\n\n \n \n \"BayesPCN: arxiv\n  \n \n \n \"BayesPCN: pdf\n  \n \n\n \n \n doi\n  \n \n\n \n link\n  \n \n\n bibtex\n \n\n \n  \n \n abstract \n \n\n \n  \n \n 1 download\n \n \n\n \n \n \n \n \n \n \n\n  \n \n \n \n \n \n \n \n \n \n \n\n\n\n
\n
@unpublished{yoo2022bayespcn,\n\tdoi = {10.48550/ARXIV.2205.09930},\n\turl_ArXiv = {https://arxiv.org/abs/2205.09930},\n\turl_pdf = {https://arxiv.org/pdf/2205.09930.pdf},\n\tauthor = {Yoo, Jason and Wood, Frank},\n\tkeywords = {Machine Learning (cs.LG), Artificial Intelligence (cs.AI), FOS: Computer and information sciences, FOS: Computer and information sciences},\n\ttitle = {BayesPCN: A Continually Learnable Predictive Coding Associative Memory},\n\tpublisher = {arXiv},\n\tyear = {2022},\n\tcopyright = {arXiv.org perpetual, non-exclusive license},\n\tabstract = {Associative memory plays an important role in human intelligence and its mechanisms have been linked to attention in machine learning. While the machine learning community’s interest in associative memories has recently been rekindled, most work has focused on memory recall (read) over memory learning (write). In this paper, we present BayesPCN, a hierarchical associative memory capable of performing continual one-shot memory writes without meta-learning. Moreover, BayesPCN is able to gradually forget past observations (forget) to free its memory. Experiments show that BayesPCN can recall corrupted i.i.d. high-dimensional data observed hundreds of “timesteps” ago without a significant drop in recall ability compared to the state-of-the-art offline-learned associative memory models.},\n}\n\n\n
\n
\n\n\n
\n Associative memory plays an important role in human intelligence and its mechanisms have been linked to attention in machine learning. While the machine learning community’s interest in associative memories has recently been rekindled, most work has focused on memory recall (read) over memory learning (write). In this paper, we present BayesPCN, a hierarchical associative memory capable of performing continual one-shot memory writes without meta-learning. Moreover, BayesPCN is able to gradually forget past observations (forget) to free its memory. Experiments show that BayesPCN can recall corrupted i.i.d. high-dimensional data observed hundreds of “timesteps” ago without a significant drop in recall ability compared to the state-of-the-art offline-learned associative memory models.\n
\n\n\n
\n\n\n
\n \n\n \n \n \n \n \n \n Flexible Diffusion Modeling of Long Videos.\n \n \n \n \n\n\n \n Harvey, W.; Naderiparizi, S.; Masrani, V.; Weilbach, C.; and Wood, F.\n\n\n \n\n\n\n 2022.\n \n\n\n\n
\n\n\n\n \n \n \"Flexible arxiv\n  \n \n \n \"Flexible pdf\n  \n \n\n \n \n doi\n  \n \n\n \n link\n  \n \n\n bibtex\n \n\n \n  \n \n abstract \n \n\n \n\n \n \n \n \n \n \n \n\n  \n \n \n \n \n \n \n \n \n \n \n\n\n\n
\n
@unpublished{harvey2022flexible,\n\tdoi = {10.48550/ARXIV.2205.11495},\n\turl_ArXiv = {https://arxiv.org/abs/2205.11495},\n\turl_pdf = {https://arxiv.org/pdf/2205.11495.pdf},\n\tauthor = {Harvey, William and Naderiparizi, Saeid and Masrani, Vaden and Weilbach, Christian and Wood, Frank},\n\tkeywords = {Computer Vision and Pattern Recognition (cs.CV), Machine Learning (cs.LG), FOS: Computer and information sciences, FOS: Computer and information sciences},\n\ttitle = {Flexible Diffusion Modeling of Long Videos},\n\tpublisher = {arXiv},\n\tyear = {2022},\n\tcopyright = {arXiv.org perpetual, non-exclusive license},\n\tabstract = {We present a framework for video modeling based on denoising diffusion probabilistic models that produces long-duration video completions in a variety of realistic environments. We introduce a generative model that can at test-time sample any arbitrary subset of video frames conditioned on any other subset and present an architecture adapted for this purpose. Doing so allows us to efficiently compare and optimize a variety of schedules for the order in which frames in a long video are sampled and use selective sparse and long-range conditioning on previously sampled frames. We demonstrate improved video modeling over prior work on a number of datasets and sample temporally coherent videos over 25 minutes in length. We additionally release a new video modeling dataset and semantically meaningful metrics based on videos generated in the CARLA self-driving car simulator.}, \n}\n\n\n
\n
\n\n\n
\n We present a framework for video modeling based on denoising diffusion probabilistic models that produces long-duration video completions in a variety of realistic environments. We introduce a generative model that can at test-time sample any arbitrary subset of video frames conditioned on any other subset and present an architecture adapted for this purpose. Doing so allows us to efficiently compare and optimize a variety of schedules for the order in which frames in a long video are sampled and use selective sparse and long-range conditioning on previously sampled frames. We demonstrate improved video modeling over prior work on a number of datasets and sample temporally coherent videos over 25 minutes in length. We additionally release a new video modeling dataset and semantically meaningful metrics based on videos generated in the CARLA self-driving car simulator.\n
\n\n\n
\n\n\n
\n \n\n \n \n \n \n \n \n Conditional Permutation Invariant Flows.\n \n \n \n \n\n\n \n Zwartsenberg, B.; Ścibior, A.; Niedoba, M.; Lioutas, V.; Liu, Y.; Sefas, J.; Dabiri, S.; Lavington, J. W.; Campbell, T.; and Wood, F.\n\n\n \n\n\n\n 2022.\n \n\n\n\n
\n\n\n\n \n \n \"Conditional arxiv\n  \n \n\n \n \n doi\n  \n \n\n \n link\n  \n \n\n bibtex\n \n\n \n  \n \n abstract \n \n\n \n  \n \n 1 download\n \n \n\n \n \n \n \n \n \n \n\n  \n \n \n \n \n \n \n \n \n \n \n \n \n\n\n\n
\n
@unpublished{zwartsenberg2022conditional,\n\tdoi = {10.48550/ARXIV.2206.09021},\n\turl_ArXiv = {https://arxiv.org/abs/2206.09021},\n\tauthor = {Zwartsenberg, Berend and Ścibior, Adam and Niedoba, Matthew and Lioutas, Vasileios and Liu, Yunpeng and Sefas, Justice and Dabiri, Setareh and Lavington, Jonathan Wilder and Campbell, Trevor and Wood, Frank},\n\tkeywords = {Machine Learning (stat.ML), Machine Learning (cs.LG), FOS: Computer and information sciences, FOS: Computer and information sciences, I.2.0},\n\ttitle = {Conditional Permutation Invariant Flows},\n\tpublisher = {arXiv},\n\tyear = {2022},\n\tcopyright = {Creative Commons Attribution Non Commercial No Derivatives 4.0 International}, \n\tabstract = {We present a novel, conditional generative probabilistic model of set-valued data with a tractable log density. This model is a continuous normalizing flow governed by permutation equivariant dynamics. These dynamics are driven by a learnable per-set-element term and pairwise interactions, both parametrized by deep neural networks. We illustrate the utility of this model via applications including (1) complex traffic scene generation conditioned on visually specified map information, and (2) object bounding box generation conditioned directly on images. We train our model by maximizing the expected likelihood of labeled conditional data under our flow, with the aid of a penalty that ensures the dynamics are smooth and hence efficiently solvable. Our method significantly outperforms non-permutation invariant baselines in terms of log likelihood and domain-specific metrics (offroad, collision, and combined infractions), yielding realistic samples that are difficult to distinguish from real data.},\n}\n\n\n
\n
\n\n\n
\n We present a novel, conditional generative probabilistic model of set-valued data with a tractable log density. This model is a continuous normalizing flow governed by permutation equivariant dynamics. These dynamics are driven by a learnable per-set-element term and pairwise interactions, both parametrized by deep neural networks. We illustrate the utility of this model via applications including (1) complex traffic scene generation conditioned on visually specified map information, and (2) object bounding box generation conditioned directly on images. We train our model by maximizing the expected likelihood of labeled conditional data under our flow, with the aid of a penalty that ensures the dynamics are smooth and hence efficiently solvable. Our method significantly outperforms non-permutation invariant baselines in terms of log likelihood and domain-specific metrics (offroad, collision, and combined infractions), yielding realistic samples that are difficult to distinguish from real data.\n
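A minimal sketch of permutation-equivariant dynamics of the kind the abstract describes (a per-element term plus aggregated pairwise interactions), with toy stand-in functions in place of the learnable networks and without the conditioning inputs; it only illustrates why permuting the set elements permutes the velocities identically.

import numpy as np

def equivariant_dynamics(z, t, f_single, f_pair):
    """dz_i/dt = f_single(z_i, t) + mean_j f_pair(z_i, z_j, t).
    Permuting the rows of z permutes the output rows identically,
    so the induced flow is permutation equivariant."""
    n = z.shape[0]
    single = np.stack([f_single(z[i], t) for i in range(n)])
    pair = np.stack([
        np.mean([f_pair(z[i], z[j], t) for j in range(n)], axis=0)
        for i in range(n)])
    return single + pair

# toy stand-ins for the learnable networks (assumptions, not the paper's models)
f_single = lambda zi, t: -zi * t
f_pair = lambda zi, zj, t: 0.1 * (zj - zi)

# one explicit Euler step of the continuous normalizing flow
z = np.random.randn(5, 2)   # a set of five two-dimensional elements
dt = 0.01
z = z + dt * equivariant_dynamics(z, 0.0, f_single, f_pair)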
\n\n\n
\n\n\n
\n \n\n \n \n \n \n \n \n Gradients without Backpropagation.\n \n \n \n \n\n\n \n Baydin, A. G.; Pearlmutter, B. A.; Syme, D.; Wood, F.; and Torr, P.\n\n\n \n\n\n\n 2022.\n \n\n\n\n
\n\n\n\n \n \n \"Gradients arxiv\n  \n \n\n \n \n doi\n  \n \n\n \n link\n  \n \n\n bibtex\n \n\n \n  \n \n abstract \n \n\n \n  \n \n 5 downloads\n \n \n\n \n \n \n \n \n \n \n\n  \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n\n\n\n
\n
@unpublished{BAY-22,\n  doi = {10.48550/ARXIV.2202.08587},\n  url_ArXiv = {https://arxiv.org/abs/2202.08587},\n  author = {Baydin, Atılım Güneş and Pearlmutter, Barak A. and Syme, Don and Wood, Frank and Torr, Philip},\n  keywords = {Machine Learning (cs.LG), Machine Learning (stat.ML), FOS: Computer and information sciences, FOS: Computer and information sciences, I.2.6; I.2.5, 68T07},\n  title = {Gradients without Backpropagation},\n  publisher = {arXiv},\n  year = {2022}, \n  copyright = {arXiv.org perpetual, non-exclusive license},\n  abstract={Using backpropagation to compute gradients of objective functions for optimization has remained a mainstay of machine learning. Backpropagation, or reverse-mode differentiation, is a special case within the general family of automatic differentiation algorithms that also includes the forward mode. We present a method to compute gradients based solely on the directional derivative that one can compute exactly and efficiently via the forward mode. We call this formulation the forward gradient, an unbiased estimate of the gradient that can be evaluated in a single forward run of the function, entirely eliminating the need for backpropagation in gradient descent. We demonstrate forward gradient descent in a range of problems, showing substantial savings in computation and enabling training up to twice as fast in some cases.}\n}\n\n
\n
\n\n\n
\n Using backpropagation to compute gradients of objective functions for optimization has remained a mainstay of machine learning. Backpropagation, or reverse-mode differentiation, is a special case within the general family of automatic differentiation algorithms that also includes the forward mode. We present a method to compute gradients based solely on the directional derivative that one can compute exactly and efficiently via the forward mode. We call this formulation the forward gradient, an unbiased estimate of the gradient that can be evaluated in a single forward run of the function, entirely eliminating the need for backpropagation in gradient descent. We demonstrate forward gradient descent in a range of problems, showing substantial savings in computation and enabling training up to twice as fast in some cases.\n
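A minimal sketch of the forward-gradient estimator described above. The paper obtains the directional derivative exactly from a single forward-mode AD pass; the finite difference below is a dependency-free stand-in for that step, and the step size and learning rate are arbitrary.

import numpy as np

def forward_gradient(f, x, eps=1e-6, rng=np.random.default_rng(0)):
    """Unbiased gradient estimate g = (grad f(x) . v) v with v ~ N(0, I).
    The directional derivative is approximated by a finite difference here;
    forward-mode AD gives it exactly in a single forward run."""
    v = rng.standard_normal(x.shape)
    dd = (f(x + eps * v) - f(x)) / eps      # approx. grad f(x) . v
    return dd * v

# forward gradient descent on a toy quadratic
f = lambda x: 0.5 * np.sum(x ** 2)
x = np.ones(10)
for _ in range(200):
    x -= 0.05 * forward_gradient(f, x)
print(f(x))  # decreases toward 0

The estimate is unbiased because E[v v^T] = I, so E[(grad f . v) v] = grad f; no backward pass is needed.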
\n\n\n
\n\n\n
\n \n\n \n \n \n \n \n \n Exploration with Multi-Sample Target Values for Distributional Reinforcement Learning.\n \n \n \n \n\n\n \n Teng, M.; van de Panne, M.; and Wood, F.\n\n\n \n\n\n\n 2022.\n \n\n\n\n
\n\n\n\n \n \n \"Exploration arxiv\n  \n \n\n \n \n doi\n  \n \n\n \n link\n  \n \n\n bibtex\n \n\n \n  \n \n abstract \n \n\n \n\n \n \n \n \n \n \n \n\n  \n \n \n \n \n \n \n \n \n \n \n\n\n\n
\n
@unpublished{TEN-22,\n  doi = {10.48550/ARXIV.2202.02693},\n  url_ArXiv = {https://arxiv.org/abs/2202.02693},\n  author = {Teng, Michael and van de Panne, Michiel and Wood, Frank},\n  keywords = {Machine Learning (cs.LG), Artificial Intelligence (cs.AI), FOS: Computer and information sciences, FOS: Computer and information sciences},\n  title = {Exploration with Multi-Sample Target Values for Distributional Reinforcement Learning},\n  publisher = {arXiv},\n  year = {2022},  \n  copyright = {arXiv.org perpetual, non-exclusive license},\n  abstract = {Distributional reinforcement learning (RL) aims to learn a value-network that predicts the full distribution of the returns for a given state, often modeled via a quantile-based critic. This approach has been successfully integrated into common RL methods for continuous control, giving rise to algorithms such as Distributional Soft Actor-Critic (DSAC). In this paper, we introduce multi-sample target values (MTV) for distributional RL, as a principled replacement for single-sample target value estimation, as commonly employed in current practice. The improved distributional estimates further lend themselves to UCB-based exploration. These two ideas are combined to yield our distributional RL algorithm, E2DC (Extra Exploration with Distributional Critics). We evaluate our approach on a range of continuous control tasks and demonstrate state-of-the-art model-free performance on difficult tasks such as Humanoid control. We provide further insight into the method via visualization and analysis of the learned distributions and their evolution during training.}\n}\n\n
\n
\n\n\n
\n Distributional reinforcement learning (RL) aims to learn a value-network that predicts the full distribution of the returns for a given state, often modeled via a quantile-based critic. This approach has been successfully integrated into common RL methods for continuous control, giving rise to algorithms such as Distributional Soft Actor-Critic (DSAC). In this paper, we introduce multi-sample target values (MTV) for distributional RL, as a principled replacement for single-sample target value estimation, as commonly employed in current practice. The improved distributional estimates further lend themselves to UCB-based exploration. These two ideas are combined to yield our distributional RL algorithm, E2DC (Extra Exploration with Distributional Critics). We evaluate our approach on a range of continuous control tasks and demonstrate state-of-the-art model-free performance on difficult tasks such as Humanoid control. We provide further insight into the method via visualization and analysis of the learned distributions and their evolution during training.\n
\n\n\n
\n\n\n
\n \n\n \n \n \n \n \n \n Beyond Simple Meta-Learning: Multi-Purpose Models for Multi-Domain, Active and Continual Few-Shot Learning.\n \n \n \n \n\n\n \n Bateni, P.; Barber, J.; Goyal, R.; Masrani, V.; van de Meent, J.; Sigal, L.; and Wood, F.\n\n\n \n\n\n\n 2022.\n \n\n\n\n
\n\n\n\n \n \n \"Beyond arxiv\n  \n \n\n \n \n doi\n  \n \n\n \n link\n  \n \n\n bibtex\n \n\n \n  \n \n abstract \n \n\n \n  \n \n 1 download\n \n \n\n \n \n \n \n \n \n \n\n  \n \n \n \n \n \n \n \n \n\n\n\n
\n
@unpublished{BAT-22,\n  doi = {10.48550/ARXIV.2201.05151},\n  url_ArXiv = {https://arxiv.org/abs/2201.05151},\n  author = {Bateni, Peyman and Barber, Jarred and Goyal, Raghav and Masrani, Vaden and van de Meent, Jan-Willem and Sigal, Leonid and Wood, Frank},\n  keywords = {Computer Vision and Pattern Recognition (cs.CV), FOS: Computer and information sciences, FOS: Computer and information sciences},\n  title = {Beyond Simple Meta-Learning: Multi-Purpose Models for Multi-Domain, Active and Continual Few-Shot Learning},\n  publisher = {arXiv},\n  year = {2022},  \n  copyright = {Creative Commons Attribution 4.0 International},\n  abstract={Modern deep learning requires large-scale extensively labelled datasets for training. Few-shot learning aims to alleviate this issue by learning effectively from few labelled examples. In previously proposed few-shot visual classifiers, it is assumed that the feature manifold, where classifier decisions are made, has uncorrelated feature dimensions and uniform feature variance. In this work, we focus on addressing the limitations arising from this assumption by proposing a variance-sensitive class of models that operates in a low-label regime. The first method, Simple CNAPS, employs a hierarchically regularized Mahalanobis-distance based classifier combined with a state of the art neural adaptive feature extractor to achieve strong performance on Meta-Dataset, mini-ImageNet and tiered-ImageNet benchmarks. We further extend this approach to a transductive learning setting, proposing Transductive CNAPS. This transductive method combines a soft k-means parameter refinement procedure with a two-step task encoder to achieve improved test-time classification accuracy using unlabelled data. Transductive CNAPS achieves state of the art performance on Meta-Dataset. Finally, we explore the use of our methods (Simple and Transductive) for "out of the box" continual and active learning. Extensive experiments on large scale benchmarks illustrate robustness and versatility of this, relatively speaking, simple class of models. All trained model checkpoints and corresponding source codes have been made publicly available.}\n}\n\n
\n
\n\n\n
\n Modern deep learning requires large-scale extensively labelled datasets for training. Few-shot learning aims to alleviate this issue by learning effectively from few labelled examples. In previously proposed few-shot visual classifiers, it is assumed that the feature manifold, where classifier decisions are made, has uncorrelated feature dimensions and uniform feature variance. In this work, we focus on addressing the limitations arising from this assumption by proposing a variance-sensitive class of models that operates in a low-label regime. The first method, Simple CNAPS, employs a hierarchically regularized Mahalanobis-distance based classifier combined with a state of the art neural adaptive feature extractor to achieve strong performance on Meta-Dataset, mini-ImageNet and tiered-ImageNet benchmarks. We further extend this approach to a transductive learning setting, proposing Transductive CNAPS. This transductive method combines a soft k-means parameter refinement procedure with a two-step task encoder to achieve improved test-time classification accuracy using unlabelled data. Transductive CNAPS achieves state of the art performance on Meta-Dataset. Finally, we explore the use of our methods (Simple and Transductive) for \"out of the box\" continual and active learning. Extensive experiments on large scale benchmarks illustrate robustness and versatility of this, relatively speaking, simple class of models. All trained model checkpoints and corresponding source codes have been made publicly available.\n
\n\n\n
\n\n\n\n\n\n
\n
\n\n\n\n\n
\n
\n\n
\n
\n  \n 2021\n \n \n (3)\n \n \n
\n
\n \n \n
\n
\n  \n inproceedings\n \n \n (7)\n \n \n
\n
\n \n \n
\n \n\n \n \n \n \n \n \n Assisting the Adversary to Improve GAN Training.\n \n \n \n \n\n\n \n Munk, A.; Harvey, W.; and Wood, F.\n\n\n \n\n\n\n In 2021 International Joint Conference on Neural Networks (IJCNN), pages 1-8, July 2021. \n \n\n\n\n
\n\n\n\n \n \n \"Assisting arxiv\n  \n \n \n \"Assisting paper\n  \n \n\n \n \n doi\n  \n \n\n \n link\n  \n \n\n bibtex\n \n\n \n  \n \n abstract \n \n\n \n  \n \n 3 downloads\n \n \n\n \n \n \n \n \n \n \n\n  \n \n \n\n\n\n
\n
@InProceedings{9533449,  \n\tauthor={Munk, Andreas and Harvey, William and Wood, Frank},  \n\tbooktitle={2021 International Joint Conference on Neural Networks (IJCNN)},   \n\ttitle={Assisting the Adversary to Improve GAN Training},   \n\tyear={2021},\n\tpages={1-8},  \n\tabstract={Some of the most popular methods for improving the stability and performance of GANs involve constraining or regularizing the discriminator. In this paper we consider a largely overlooked regularization technique which we refer to as the Adversary's Assistant (AdvAs). We motivate this using a different perspective to that of prior work. Specifically, we consider a common mismatch between theoretical analysis and practice: analysis often assumes that the discriminator reaches its optimum on each iteration. In practice, this is essentially never true, often leading to poor gradient estimates for the generator. To address this, AdvAs is a penalty imposed on the generator based on the norm of the gradients used to train the discriminator. This encourages the generator to move towards points where the discriminator is optimal. We demonstrate the effect of applying AdvAs to several GAN objectives, datasets and network architectures. The results indicate a reduction in the mismatch between theory and practice and that AdvAs can lead to improvement of GAN training, as measured by FID scores.},  \n\tdoi={10.1109/IJCNN52387.2021.9533449},  \n\tISSN={2161-4407},  \n\tmonth={July},\n\turl_ArXiv = {https://arxiv.org/abs/2010.01274},\n\turl_Paper = {https://ieeexplore.ieee.org/document/9533449},\n\tsupport = {D3M,ETALUMIS}\n}\n\n
\n
\n\n\n
\n Some of the most popular methods for improving the stability and performance of GANs involve constraining or regularizing the discriminator. In this paper we consider a largely overlooked regularization technique which we refer to as the Adversary's Assistant (AdvAs). We motivate this using a different perspective to that of prior work. Specifically, we consider a common mismatch between theoretical analysis and practice: analysis often assumes that the discriminator reaches its optimum on each iteration. In practice, this is essentially never true, often leading to poor gradient estimates for the generator. To address this, AdvAs is a penalty imposed on the generator based on the norm of the gradients used to train the discriminator. This encourages the generator to move towards points where the discriminator is optimal. We demonstrate the effect of applying AdvAs to several GAN objectives, datasets and network architectures. The results indicate a reduction in the mismatch between theory and practice and that AdvAs can lead to improvement of GAN training, as measured by FID scores.\n
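Going only by the description above, an AdvAs-style penalty can be sketched roughly as follows in PyTorch; the networks, loss functions, and penalty weight lam are placeholders, and the squared norm is used here for differentiability (the exact norm is the paper's choice).

import torch

def generator_loss_with_advas(G, D, z, disc_loss_fn, gen_loss_fn, lam=1.0):
    """Generator objective plus a penalty on the squared norm of the
    discriminator-parameter gradients of the discriminator's own loss,
    which vanishes exactly when the discriminator is at a critical point."""
    fake = G(z)
    d_loss = disc_loss_fn(D, fake)          # the same loss used to train D
    grads = torch.autograd.grad(d_loss, list(D.parameters()), create_graph=True)
    penalty = sum(g.pow(2).sum() for g in grads)
    return gen_loss_fn(D, fake) + lam * penalty

Because create_graph=True keeps the gradient computation differentiable, the penalty's dependence on the generator (through the fake samples) is preserved, which is what lets it steer the generator toward points where the discriminator is optimal.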
\n\n\n
\n\n\n
\n \n\n \n \n \n \n \n \n q-Paths: Generalizing the geometric annealing path using power means.\n \n \n \n \n\n\n \n Masrani, V.; Brekelmans, R.; Bui, T.; Nielsen, F.; Galstyan, A.; Ver Steeg, G.; and Wood, F.\n\n\n \n\n\n\n In de Campos, C.; and Maathuis, M. H., editor(s), Proceedings of the Thirty-Seventh Conference on Uncertainty in Artificial Intelligence, volume 161, of Proceedings of Machine Learning Research, pages 1938–1947, 27–30 Jul 2021. PMLR\n \n\n\n\n
\n\n\n\n \n \n \"q-Paths: pdf\n  \n \n \n \"q-Paths: paper\n  \n \n \n \"q-Paths: arxiv\n  \n \n\n \n\n \n link\n  \n \n\n bibtex\n \n\n \n  \n \n abstract \n \n\n \n\n \n \n \n \n \n \n \n\n  \n \n \n\n\n\n
\n
@InProceedings{pmlr-v161-masrani21a,\n  title = \t {q-Paths: Generalizing the geometric annealing path using power means},\n  author =       {Masrani, Vaden and Brekelmans, Rob and Bui, Thang and Nielsen, Frank and Galstyan, Aram and Ver Steeg, Greg and Wood, Frank},\n  booktitle = \t {Proceedings of the Thirty-Seventh Conference on Uncertainty in Artificial Intelligence},\n  pages = \t {1938--1947},\n  year = \t {2021},\n  editor = \t {de Campos, Cassio and Maathuis, Marloes H.},\n  volume = \t {161},\n  series = \t {Proceedings of Machine Learning Research},\n  month = \t {27--30 Jul},\n  publisher =    {PMLR},\n  url_pdf = \t {https://proceedings.mlr.press/v161/masrani21a/masrani21a.pdf},\n  url_Paper = \t {https://proceedings.mlr.press/v161/masrani21a.html},\n  url_ArXiv=     {https://arxiv.org/abs/2107.00745},\n  abstract = \t {Many common machine learning methods involve the geometric annealing path, a sequence of intermediate densities between two distributions of interest constructed using the geometric average. While alternatives such as the moment-averaging path have demonstrated performance gains in some settings, their practical applicability remains limited by exponential family endpoint assumptions and a lack of closed form energy function. In this work, we introduce $q$-paths, a family of paths which is derived from a generalized notion of the mean, includes the geometric and arithmetic mixtures as special cases, and admits a simple closed form involving the deformed logarithm function from nonextensive thermodynamics. Following previous analysis of the geometric path, we interpret our $q$-paths as corresponding to a $q$-exponential family of distributions, and provide a variational representation of intermediate densities as minimizing a mixture of $\\alpha$-divergences to the endpoints. We show that small deviations away from the geometric path yield empirical gains for Bayesian inference using Sequential Monte Carlo and generative model evaluation using Annealed Importance Sampling.}\n}\n\n
\n
\n\n\n
\n Many common machine learning methods involve the geometric annealing path, a sequence of intermediate densities between two distributions of interest constructed using the geometric average. While alternatives such as the moment-averaging path have demonstrated performance gains in some settings, their practical applicability remains limited by exponential family endpoint assumptions and a lack of closed form energy function. In this work, we introduce $q$-paths, a family of paths which is derived from a generalized notion of the mean, includes the geometric and arithmetic mixtures as special cases, and admits a simple closed form involving the deformed logarithm function from nonextensive thermodynamics. Following previous analysis of the geometric path, we interpret our $q$-paths as corresponding to a $q$-exponential family of distributions, and provide a variational representation of intermediate densities as minimizing a mixture of $α$-divergences to the endpoints. We show that small deviations away from the geometric path yield empirical gains for Bayesian inference using Sequential Monte Carlo and generative model evaluation using Annealed Importance Sampling.\n
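As a reading aid for the abstract above, the power-mean form of the intermediate densities can be written explicitly (normalizers omitted; conventions follow the abstract's description and may differ in detail from the paper):

\[ \tilde\pi^{(q)}_{\beta}(x) \;=\; \Big[(1-\beta)\,\pi_0(x)^{1-q} \;+\; \beta\,\tilde\pi_1(x)^{1-q}\Big]^{\tfrac{1}{1-q}}, \]

which gives the arithmetic mixture at \(q=0\) and recovers the geometric path \(\pi_0(x)^{1-\beta}\,\tilde\pi_1(x)^{\beta}\) in the limit \(q \to 1\).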
\n\n\n
\n\n\n
\n \n\n \n \n \n \n \n \n A Closer Look at Gradient Estimators with Reinforcement Learning as Inference.\n \n \n \n \n\n\n \n Lavington, J. W.; Teng, M.; Schmidt, M.; and Wood, F.\n\n\n \n\n\n\n In Deep RL Workshop NeurIPS 2021, 2021. \n \n\n\n\n
\n\n\n\n \n \n \"A paper\n  \n \n\n \n\n \n link\n  \n \n\n bibtex\n \n\n \n\n \n  \n \n 1 download\n \n \n\n \n \n \n \n \n \n \n\n  \n \n \n\n\n\n
\n
@inproceedings{\nlavington2021a,\ntitle={A Closer Look at Gradient Estimators with Reinforcement Learning as Inference},\nauthor={Jonathan Wilder Lavington and Michael Teng and Mark Schmidt and Frank Wood},\nbooktitle={Deep RL Workshop NeurIPS 2021},\nyear={2021},\nurl_Paper={https://openreview.net/forum?id=bR0K-nz1-6p}\n}\n\n
\n
\n\n\n\n
\n\n\n
\n \n\n \n \n \n \n \n \n Robust Asymmetric Learning in POMDPs.\n \n \n \n \n\n\n \n Warrington, A.; Lavington, J. W; Scibior, A.; Schmidt, M.; and Wood, F.\n\n\n \n\n\n\n In Meila, M.; and Zhang, T., editor(s), Proceedings of the 38th International Conference on Machine Learning, volume 139, of Proceedings of Machine Learning Research, pages 11013–11023, 18–24 Jul 2021. PMLR\n \n\n\n\n
\n\n\n\n \n \n \"Robust pdf\n  \n \n \n \"Robust paper\n  \n \n \n \"Robust arxiv\n  \n \n\n \n\n \n link\n  \n \n\n bibtex\n \n\n \n  \n \n abstract \n \n\n \n  \n \n 2 downloads\n \n \n\n \n \n \n \n \n \n \n\n  \n \n \n\n\n\n
\n
@InProceedings{pmlr-v139-warrington21a,\n  title = \t {Robust Asymmetric Learning in POMDPs},\n  author =       {Warrington, Andrew and Lavington, Jonathan W and Scibior, Adam and Schmidt, Mark and Wood, Frank},\n  booktitle = \t {Proceedings of the 38th International Conference on Machine Learning},\n  pages = \t {11013--11023},\n  year = \t {2021},\n  editor = \t {Meila, Marina and Zhang, Tong},\n  volume = \t {139},\n  series = \t {Proceedings of Machine Learning Research},\n  month = \t {18--24 Jul},\n  publisher =    {PMLR},\n  url_pdf = \t {http://proceedings.mlr.press/v139/warrington21a/warrington21a.pdf},\n  url_Paper = \t {https://proceedings.mlr.press/v139/warrington21a.html},\n  url_ArXiv={https://arxiv.org/abs/2012.15566},\n  abstract = \t {Policies for partially observed Markov decision processes can be efficiently learned by imitating expert policies generated using asymmetric information. Unfortunately, existing approaches for this kind of imitation learning have a serious flaw: the expert does not know what the trainee cannot see, and as a result may encourage actions that are sub-optimal or unsafe under partial information. To address this issue, we derive an update which, when applied iteratively to an expert, maximizes the expected reward of the trainee’s policy. Using this update, we construct a computationally efficient algorithm, adaptive asymmetric DAgger (A2D), that jointly trains the expert and trainee policies. We then show that A2D allows the trainee to safely imitate the modified expert, and outperforms policies learned either by imitating a fixed expert or through direct reinforcement learning.}\n}\n\n
\n
\n\n\n
\n Policies for partially observed Markov decision processes can be efficiently learned by imitating expert policies generated using asymmetric information. Unfortunately, existing approaches for this kind of imitation learning have a serious flaw: the expert does not know what the trainee cannot see, and as a result may encourage actions that are sub-optimal or unsafe under partial information. To address this issue, we derive an update which, when applied iteratively to an expert, maximizes the expected reward of the trainee’s policy. Using this update, we construct a computationally efficient algorithm, adaptive asymmetric DAgger (A2D), that jointly trains the expert and trainee policies. We then show that A2D allows the trainee to safely imitate the modified expert, and outperforms policies learned either by imitating a fixed expert or through direct reinforcement learning.\n
\n\n\n
\n\n\n
\n \n\n \n \n \n \n \n \n Imagining The Road Ahead: Multi-Agent Trajectory Prediction via Differentiable Simulation.\n \n \n \n \n\n\n \n \n\n\n \n\n\n\n In IEEE Intelligent Transportation Systems Conference (ITSC), 2021. \n \n\n\n\n
\n\n\n\n \n \n \"Imagining arxiv\n  \n \n \n \"Imagining paper\n  \n \n\n \n\n \n link\n  \n \n\n bibtex\n \n\n \n\n \n\n \n \n \n \n \n \n \n\n  \n \n \n\n\n\n
\n
\n
\n\n\n\n
\n\n\n
\n \n\n \n \n \n \n \n \n Imagining The Road Ahead: Multi-Agent Trajectory Prediction via Differentiable Simulation.\n \n \n \n \n\n\n \n Scibior, A.; Lioutas, V.; Reda, D.; Bateni, P.; and Wood, F.\n\n\n \n\n\n\n In CVPR Workshop on Autonomous Driving: Perception, Prediction and Planning, 2021. \n \n\n\n\n
\n\n\n\n \n \n \"Imagining arxiv\n  \n \n\n \n\n \n link\n  \n \n\n bibtex\n \n\n \n  \n \n abstract \n \n\n \n\n \n \n \n \n \n \n \n\n  \n \n \n\n\n\n
\n
@inproceedings{scibior2021imaginingcvprworkshop,\n  title={{I}magining {T}he {R}oad {A}head: Multi-Agent Trajectory Prediction via Differentiable Simulation},\n  author={Scibior, Adam and Lioutas, Vasileios and Reda, Daniele and Bateni, Peyman and Wood, Frank},\n  booktitle={CVPR Workshop on Autonomous Driving: Perception, Prediction and Planning},\n  year={2021}, \n  eprint={2104.11212},\n  archivePrefix={arXiv},\n  url_ArXiv = {https://arxiv.org/abs/2104.11212},\n  support = {MITACS},\n  abstract={We develop a deep generative model built on a fully differentiable simulator for multi-agent trajectory prediction. Agents are modeled with conditional recurrent variational neural networks (CVRNNs), which take as input an ego-centric birdview image representing the current state of the world and output an action, consisting of steering and acceleration, which is used to derive the subsequent agent state using a kinematic bicycle model. The full simulation state is then differentiably rendered for each agent, initiating the next time step. We achieve state-of-the-art results on the INTERACTION dataset, using standard neural architectures and a standard variational training objective, producing realistic multi-modal predictions without any ad-hoc diversity-inducing losses. We conduct ablation studies to examine individual components of the simulator, finding that both the kinematic bicycle model and the continuous feedback from the birdview image are crucial for achieving this level of performance. We name our model ITRA, for "Imagining the Road Ahead".}\n  }\n  \n
\n
\n\n\n
\n We develop a deep generative model built on a fully differentiable simulator for multi-agent trajectory prediction. Agents are modeled with conditional recurrent variational neural networks (CVRNNs), which take as input an ego-centric birdview image representing the current state of the world and output an action, consisting of steering and acceleration, which is used to derive the subsequent agent state using a kinematic bicycle model. The full simulation state is then differentiably rendered for each agent, initiating the next time step. We achieve state-of-the-art results on the INTERACTION dataset, using standard neural architectures and a standard variational training objective, producing realistic multi-modal predictions without any ad-hoc diversity-inducing losses. We conduct ablation studies to examine individual components of the simulator, finding that both the kinematic bicycle model and the continuous feedback from the birdview image are crucial for achieving this level of performance. We name our model ITRA, for \"Imagining the Road Ahead\".\n
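The kinematic bicycle model mentioned above is a standard vehicle model; a minimal discrete-time version (rear-axle reference point, wheelbase L) is sketched here so the mapping from the predicted action (steering, acceleration) to the next agent state is concrete. ITRA's exact parametrization may differ.

import numpy as np

def bicycle_step(state, action, L=2.8, dt=0.1):
    """state = (x, y, heading, speed); action = (steering angle, acceleration).
    Standard kinematic bicycle update with wheelbase L (metres)."""
    x, y, theta, v = state
    steer, accel = action
    x += v * np.cos(theta) * dt
    y += v * np.sin(theta) * dt
    theta += v / L * np.tan(steer) * dt
    v += accel * dt
    return np.array([x, y, theta, v])

state = np.array([0.0, 0.0, 0.0, 5.0])   # start at the origin, 5 m/s
state = bicycle_step(state, action=(0.05, 0.5))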
\n\n\n
\n\n\n
\n \n\n \n \n \n \n \n \n Sequential core-set Monte Carlo.\n \n \n \n \n\n\n \n Beronov, B.; Weilbach, C.; Wood, F.; and Campbell, T.\n\n\n \n\n\n\n In de Campos, C.; and Maathuis, M. H., editor(s), Proceedings of the Thirty-Seventh Conference on Uncertainty in Artificial Intelligence, volume 161, of Proceedings of Machine Learning Research, pages 2165–2175, 27–30 Jul 2021. PMLR\n \n\n\n\n
\n\n\n\n \n \n \"SequentialPaper\n  \n \n \n \"Sequential presentation\n  \n \n \n \"Sequential poster\n  \n \n\n \n\n \n link\n  \n \n\n bibtex\n \n\n \n  \n \n abstract \n \n\n \n  \n \n 6 downloads\n \n \n\n \n \n \n \n \n \n \n\n  \n \n \n\n\n\n
\n
@InProceedings{BER-21,\n  title={Sequential core-set Monte Carlo},\n  author={Beronov, Boyan and Weilbach, Christian and Wood, Frank and Campbell, Trevor},\n  booktitle={Proceedings of the Thirty-Seventh Conference on Uncertainty in Artificial Intelligence},\n  pages={2165--2175},\n  year={2021},\n  editor={de Campos, Cassio and Maathuis, Marloes H.},\n  volume={161},\n  series={Proceedings of Machine Learning Research},\n  month={27--30 Jul},\n  publisher={PMLR},\n  pdf={https://proceedings.mlr.press/v161/beronov21a/beronov21a.pdf},\n  url={https://proceedings.mlr.press/v161/beronov21a.html},\n  url_Presentation={https://github.com/plai-group/bibliography/raw/master/presentations_posters/UAI2021_BER_presentation.pdf},\n  url_Poster={https://github.com/plai-group/bibliography/raw/master/presentations_posters/UAI2021_BER_poster.pdf},\n  support={D3M},\n  abstract={Sequential Monte Carlo (SMC) is a general-purpose methodology for recursive Bayesian inference, and is widely used in state space modeling and probabilistic programming. Its resample-move variant reduces the variance of posterior estimates by interleaving Markov chain Monte Carlo (MCMC) steps for particle “rejuvenation”; but this requires accessing all past observations and leads to linearly growing memory size and quadratic computation cost. Under the assumption of exchangeability, we introduce sequential core-set Monte Carlo (SCMC), which achieves constant space and linear time by rejuvenating based on sparse, weighted subsets of past data. In contrast to earlier approaches, which uniformly subsample or throw away observations, SCMC uses a novel online version of a state-of-the-art Bayesian core-set algorithm to incrementally construct a nonparametric, data- and model-dependent variational representation of the unnormalized target density. Experiments demonstrate significantly reduced approximation errors at negligible additional cost.}\n}\n\n
\n
\n\n\n
\n Sequential Monte Carlo (SMC) is a general-purpose methodology for recursive Bayesian inference, and is widely used in state space modeling and probabilistic programming. Its resample-move variant reduces the variance of posterior estimates by interleaving Markov chain Monte Carlo (MCMC) steps for particle “rejuvenation”; but this requires accessing all past observations and leads to linearly growing memory size and quadratic computation cost. Under the assumption of exchangeability, we introduce sequential core-set Monte Carlo (SCMC), which achieves constant space and linear time by rejuvenating based on sparse, weighted subsets of past data. In contrast to earlier approaches, which uniformly subsample or throw away observations, SCMC uses a novel online version of a state-of-the-art Bayesian core-set algorithm to incrementally construct a nonparametric, data- and model-dependent variational representation of the unnormalized target density. Experiments demonstrate significantly reduced approximation errors at negligible additional cost.\n
\n\n\n
\n\n\n\n\n\n
\n
\n\n
\n
\n  \n techreport\n \n \n (1)\n \n \n
\n
\n \n \n
\n \n\n \n \n \n \n \n \n Probabilistic Label-Efficient Deep Generative Structures (PLEDGES).\n \n \n \n \n\n\n \n Pfeffer, A.; Call, C.; Wood, F.; Rosenberg, B.; Bibbiani, K.; Sigal, L.; Shah, I.; Erdogmus, D.; Singh, S.; and van de Meent, J. W\n\n\n \n\n\n\n Technical Report Charles River Analytics Inc., 2021.\n \n\n\n\n
\n\n\n\n \n \n \"ProbabilisticPaper\n  \n \n\n \n\n \n link\n  \n \n\n bibtex\n \n\n \n\n \n\n \n \n \n \n \n \n \n\n  \n \n \n\n\n\n
\n
@techreport{pfeffer2021probabilistic,\n  title={Probabilistic Label-Efficient Deep Generative Structures (PLEDGES)},\n  author={Pfeffer, Avi and Call, Catherine and Wood, Frank and Rosenberg, Brad and Bibbiani, Kirstin and Sigal, Leonid and Shah, Ishaan and Erdogmus, Deniz and Singh, Sameer and van de Meent, Jan W},\n  year={2021},\n  institution={Charles River Analytics Inc.},\n  url={https://apps.dtic.mil/sti/citations/AD1145098}\n}\n\n
\n
\n\n\n\n
\n\n\n\n\n\n
\n
\n\n
\n
\n  \n unpublished\n \n \n (2)\n \n \n
\n
\n \n \n
\n \n\n \n \n \n \n \n \n Image Completion via Inference in Deep Generative Models.\n \n \n \n \n\n\n \n Harvey, W.; Naderiparizi, S.; and Wood, F.\n\n\n \n\n\n\n 2021.\n \n\n\n\n
\n\n\n\n \n \n \"Image arxiv\n  \n \n\n \n\n \n link\n  \n \n\n bibtex\n \n\n \n\n \n  \n \n 1 download\n \n \n\n \n \n \n \n \n \n \n\n  \n \n \n\n\n\n
\n
@unpublished{harvey2021image,\n  title={Image Completion via Inference in Deep Generative Models},\n  author={Harvey, William and Naderiparizi, Saeid and Wood, Frank},\n  journal={arXiv preprint arXiv:2102.12037},\n  year={2021},\n  url_ArXiv = {https://arxiv.org/abs/2102.12037},\n  eprint={2102.12037},\n  archivePrefix={arXiv},\n   support = {D3M},\n  abstract={We consider image completion from the perspective of amortized inference in an image generative model. We leverage recent state of the art variational auto-encoder architectures that have been shown to produce photo-realistic natural images at non-trivial resolutions. Through amortized inference in such a model we can train neural artifacts that produce diverse, realistic image completions even when the vast majority of an image is missing. We demonstrate superior sample quality and diversity compared to prior art on the CIFAR-10 and FFHQ-256 datasets. We conclude by describing and demonstrating an application that requires an in-painting model with the capabilities ours exhibits: the use of Bayesian optimal experimental design to select the most informative sequence of small field of view x-rays for chest pathology detection.}\n}\n\n
\n
\n\n\n\n
\n\n\n
\n \n\n \n \n \n \n \n \n Differentiable Particle Filtering without Modifying the Forward Pass.\n \n \n \n \n\n\n \n Scibior, A.; and Wood, F.\n\n\n \n\n\n\n 2021.\n \n\n\n\n
\n\n\n\n \n \n \"Differentiable arxiv\n  \n \n\n \n\n \n link\n  \n \n\n bibtex\n \n\n \n  \n \n abstract \n \n\n \n  \n \n 1 download\n \n \n\n \n \n \n \n \n \n \n\n  \n \n \n\n\n\n
\n
@unpublished{scibior2021differentiable,\n      title={Differentiable Particle Filtering without Modifying the Forward Pass}, \n      author={Adam Scibior and Frank Wood},\n      year={2021},\n      eprint={2106.10314},\n      archivePrefix={arXiv},\n      primaryClass={stat.ML},\n      url_ArXiv = {https://arxiv.org/abs/2106.10314},\n      abstract={Particle filters are not compatible with automatic differentiation due to the presence of discrete resampling steps. While known estimators for the score function, based on Fisher's identity, can be computed using particle filters, up to this point they required manual implementation. In this paper we show that such estimators can be computed using automatic differentiation, after introducing a simple correction to the particle weights. This correction utilizes the stop-gradient operator and does not modify the particle filter operation on the forward pass, while also being cheap and easy to compute. Surprisingly, with the same correction automatic differentiation also produces good estimators for gradients of expectations under the posterior. We can therefore regard our method as a general recipe for making particle filters differentiable. We additionally show that it produces desired estimators for second-order derivatives and how to extend it to further reduce variance at the expense of additional computation.}\n}\n\n
\n
\n\n\n
\n Particle filters are not compatible with automatic differentiation due to the presence of discrete resampling steps. While known estimators for the score function, based on Fisher's identity, can be computed using particle filters, up to this point they required manual implementation. In this paper we show that such estimators can be computed using automatic differentiation, after introducing a simple correction to the particle weights. This correction utilizes the stop-gradient operator and does not modify the particle filter operation on the forward pass, while also being cheap and easy to compute. Surprisingly, with the same correction automatic differentiation also produces good estimators for gradients of expectations under the posterior. We can therefore regard our method as a general recipe for making particle filters differentiable. We additionally show that it produces desired estimators for second-order derivatives and how to extend it to further reduce variance at the expense of additional computation.\n
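A hedged sketch of the kind of weight correction the abstract describes: after resampling, each particle carries a factor that equals one in the forward pass (so the filter's output is unchanged) but whose gradient reintroduces the dependence on the weights; detach plays the role of the stop-gradient operator. This is one reading of the abstract, not a verbatim reproduction of the paper's estimator.

import torch

def resample_with_stop_gradient_correction(particles, log_w):
    """Multinomial resampling whose output carries w / stop_grad(w) factors:
    numerically 1 in the forward pass, but differentiable through log_w."""
    w = torch.softmax(log_w, dim=0)                  # normalized weights
    idx = torch.multinomial(w.detach(), len(w), replacement=True)
    corr = w[idx] / w[idx].detach()                  # equals 1, keeps gradient
    new_particles = particles[idx]
    # fold the correction into the (otherwise uniform) post-resampling weights
    new_log_w = torch.log(corr) - torch.log(torch.tensor(float(len(w))))
    return new_particles, new_log_w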
\n\n\n
\n\n\n\n\n\n
\n
\n\n\n\n\n
\n
\n\n
\n
\n  \n 2020\n \n \n (3)\n \n \n
\n
\n \n \n
\n
\n  \n article\n \n \n (1)\n \n \n
\n
\n \n \n
\n \n\n \n \n \n \n \n \n Target-Aware Bayesian Inference: How to Beat Optimal Conventional Estimators.\n \n \n \n \n\n\n \n Rainforth, T.; Golinski, A.; Wood, F.; and Zaidi, S.\n\n\n \n\n\n\n Journal of Machine Learning Research, 21(88): 1-54. 2020.\n \n\n\n\n
\n\n\n\n \n \n \"Target-Aware link\n  \n \n \n \"Target-Aware paper\n  \n \n\n \n\n \n link\n  \n \n\n bibtex\n \n\n \n\n \n  \n \n 6 downloads\n \n \n\n \n \n \n \n \n \n \n\n  \n \n \n\n\n\n
\n
@article{RAI-20,\n  author  = {Tom Rainforth and Adam Golinski and Frank Wood and Sheheryar Zaidi},\n  title   = {Target-Aware Bayesian Inference: How to Beat Optimal Conventional Estimators},\n  journal = {Journal of Machine Learning Research},\n  year    = {2020},\n  volume  = {21},\n  number  = {88},\n  pages   = {1-54},\n  url_Link = {http://jmlr.org/papers/v21/19-102.html},\n  url_Paper = {https://www.jmlr.org/papers/volume21/19-102/19-102.pdf}\n}\n\n
\n
\n\n\n\n
\n\n\n\n\n\n
\n
\n\n
\n
\n  \n inproceedings\n \n \n (10)\n \n \n
\n
\n \n \n
\n \n\n \n \n \n \n \n \n Annealed Importance Sampling with q-Paths.\n \n \n \n \n\n\n \n Brekelmans*, R.; Masrani*, V.; Thang, B.; Wood, F.; Galstyan, A.; Ver Steeg, G.; and Nielsen, F.\n\n\n \n\n\n\n In NeurIPS Workshop on Deep Learning through Information Geometry (Best Paper Award), 2020. \n \n\n\n\n
\n\n\n\n \n \n \"Annealed arxiv\n  \n \n \n \"Annealed paper\n  \n \n\n \n\n \n link\n  \n \n\n bibtex\n \n\n \n  \n \n abstract \n \n\n \n\n \n \n \n \n \n \n \n\n  \n \n \n\n\n\n
\n
@inproceedings{brekelmans2020qpaths,\n  title  = {Annealed Importance Sampling with q-Paths},\n  author = {Brekelmans*, R. and Masrani*, V. and Thang, B. and Wood, F. and Galstyan, A. and Ver Steeg, G. and Nielsen, F.},\n  year   = {2020},\n  abbr={NeurIPS},\n  booktitle = {NeurIPS Workshop on Deep Learning through Information Geometry (Best Paper Award)},\n  url_ArXiv={https://arxiv.org/abs/2012.07823},\n  url_Paper={https://arxiv.org/pdf/2012.07823.pdf},\n  abstract={Annealed Importance Sampling (AIS) is the gold standard for estimating partition functions or marginal likelihoods, corresponding to importance sampling over a path of distributions between a tractable base and an unnormalized target. While AIS yields an unbiased estimator for any path, existing literature has been limited to the geometric mixture or moment-averaged paths associated with the exponential family and KL divergence. We explore AIS using q-paths, which include the geometric path as a special case and are related to the homogeneous power mean, deformed exponential family, and α-divergence.}\n}\n\n
\n
\n\n\n
\n Annealed Importance Sampling (AIS) is the gold standard for estimating partition functions or marginal likelihoods, corresponding to importance sampling over a path of distributions between a tractable base and an unnormalized target. While AIS yields an unbiased estimator for any path, existing literature has been limited to the geometric mixture or moment-averaged paths associated with the exponential family and KL divergence. We explore AIS using q-paths, which include the geometric path as a special case and are related to the homogeneous power mean, deformed exponential family, and α-divergence.\n
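For reference, one run of the AIS estimator referred to above draws an initial sample from the tractable base, moves it with MCMC kernels targeting each intermediate density, and accumulates the weight (standard AIS form, restated here):

\[ w \;=\; \prod_{t=1}^{T} \frac{\tilde\pi_{\beta_t}(x_{t-1})}{\tilde\pi_{\beta_{t-1}}(x_{t-1})}, \]

whose expectation is the ratio of the target and base normalizing constants for any choice of intermediate path \(\{\tilde\pi_\beta\}\); the q-path work above varies that path while keeping the estimator unbiased.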
\n\n\n
\n\n\n
\n \n\n \n \n \n \n \n \n Gaussian Process Bandit Optimization of the Thermodynamic Variational Objective.\n \n \n \n \n\n\n \n Nguyen, V.; Masrani, V.; Brekelmans, R.; Osborne, M.; and Wood, F.\n\n\n \n\n\n\n In Advances in Neural Information Processing Systems (NeurIPS), 2020. \n \n\n\n\n
\n\n\n\n \n \n \"Gaussian link\n  \n \n \n \"Gaussian paper\n  \n \n \n \"Gaussian arxiv\n  \n \n\n \n\n \n link\n  \n \n\n bibtex\n \n\n \n  \n \n abstract \n \n\n \n  \n \n 6 downloads\n \n \n\n \n \n \n \n \n \n \n\n  \n \n \n\n\n\n
\n
@InProceedings{nguyen2020gaussian,\n  title={Gaussian Process Bandit Optimization of the Thermodynamic Variational Objective},\n  author={Nguyen, Vu and Masrani, Vaden and Brekelmans, Rob and Osborne, Michael and Wood, Frank},\n  series={Advances in Neural Information Processing Systems (NeurIPS)},\n  year={2020},\n  url_Link = {https://proceedings.neurips.cc/paper/2020/hash/3f2dff7862a70f97a59a1fa02c3ec110-Abstract.html}, \n  url_Paper = {https://proceedings.neurips.cc/paper/2020/file/3f2dff7862a70f97a59a1fa02c3ec110-Paper.pdf}, \n  url_ArXiv={https://arxiv.org/abs/2010.15750},\n  support = {D3M},\n  abstract={Achieving the full promise of the Thermodynamic Variational Objective (TVO), a recently proposed variational lower bound on the log evidence involving a one-dimensional Riemann integral approximation, requires choosing a "schedule" of sorted discretization points. This paper introduces a bespoke Gaussian process bandit optimization method for automatically choosing these points. Our approach not only automates their one-time selection, but also dynamically adapts their positions over the course of optimization, leading to improved model learning and inference. We provide theoretical guarantees that our bandit optimization converges to the regret-minimizing choice of integration points. Empirical validation of our algorithm is provided in terms of improved learning and inference in Variational Autoencoders and Sigmoid Belief Networks.}\n}\n\n
\n
\n\n\n
\n Achieving the full promise of the Thermodynamic Variational Objective (TVO), a recently proposed variational lower bound on the log evidence involving a one-dimensional Riemann integral approximation, requires choosing a \"schedule\" of sorted discretization points. This paper introduces a bespoke Gaussian process bandit optimization method for automatically choosing these points. Our approach not only automates their one-time selection, but also dynamically adapts their positions over the course of optimization, leading to improved model learning and inference. We provide theoretical guarantees that our bandit optimization converges to the regret-minimizing choice of integration points. Empirical validation of our algorithm is provided in terms of improved learning and inference in Variational Autoencoders and Sigmoid Belief Networks.\n
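The "schedule" the abstract optimizes is the partition \(0 = \beta_0 < \beta_1 < \dots < \beta_K = 1\) used in the TVO's Riemann-sum approximation of a thermodynamic integral. Restated here as a reading aid (following the published TVO construction; notation may differ from the paper):

\[ \log p_\theta(x) \;=\; \int_0^1 \mathbb{E}_{\pi_\beta}\!\left[\log \frac{p_\theta(x,z)}{q_\phi(z\mid x)}\right] \mathrm{d}\beta, \qquad \pi_\beta(z) \propto q_\phi(z\mid x)^{1-\beta}\, p_\theta(x,z)^{\beta}, \]

and the TVO lower bound replaces the integral with the left Riemann sum \( \sum_{k=0}^{K-1} (\beta_{k+1}-\beta_k)\, \mathbb{E}_{\pi_{\beta_k}}[\,\cdot\,] \). The tightness of the bound, and the quality of the gradients it yields, depends on where the \(\beta_k\) are placed, which is the choice the GP bandit automates.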
\n\n\n
\n\n\n
\n \n\n \n \n \n \n \n \n Revisiting Reweighted Wake-Sleep for Models with Stochastic Control Flow.\n \n \n \n \n\n\n \n Le, T. A.; Kosiorek, A. R.; Siddharth, N.; Teh, Y. W.; and Wood, F.\n\n\n \n\n\n\n In Adams, R. P.; and Gogate, V., editor(s), volume 115, of Proceedings of the 35th conference on Uncertainty in Artificial Intelligence (UAI), pages 1039–1049, Tel Aviv, Israel, 22–25 Jul 2020. PMLR\n \n\n\n\n
\n\n\n\n \n \n \"Revisiting link\n  \n \n \n \"Revisiting paper\n  \n \n \n \"Revisiting arxiv\n  \n \n\n \n\n \n link\n  \n \n\n bibtex\n \n\n \n  \n \n abstract \n \n\n \n  \n \n 3 downloads\n \n \n\n \n \n \n \n \n \n \n\n  \n \n \n\n\n\n
\n
@InProceedings{Le-20, \n  title = {Revisiting Reweighted Wake-Sleep for Models with Stochastic Control Flow}, \n  author = {Le, Tuan Anh and Kosiorek, Adam R. and Siddharth, N. and Teh, Yee Whye and Wood, Frank}, \n  pages = {1039--1049}, \n  year = {2020}, \n  editor = {Ryan P. Adams and Vibhav Gogate}, \n  volume = {115}, \n  series = {Proceedings of the 35th conference on Uncertainty in Artificial Intelligence (UAI)}, \n  address = {Tel Aviv, Israel}, \n  month = {22--25 Jul}, \n  publisher = {PMLR}, \n  url_Link = {http://proceedings.mlr.press/v115/le20a.html}, \n  url_Paper = {http://proceedings.mlr.press/v115/le20a/le20a.pdf}, \n  url_ArXiv={https://arxiv.org/abs/1805.10469},\n  support = {D3M},\n  abstract = {Stochastic control-flow models (SCFMs) are a class of generative models that involve branching on choices from discrete random variables. Amortized gradient-based learning of SCFMs is challenging as most approaches targeting discrete variables rely on their continuous relaxations—which can be intractable in SCFMs, as branching on relaxations requires evaluating all (exponentially many) branching paths. Tractable alternatives mainly combine REINFORCE with complex control-variate schemes to improve the variance of naive estimators. Here, we revisit the reweighted wake-sleep (RWS) [5] algorithm, and through extensive evaluations, show that it outperforms current state-of-the-art methods in learning SCFMs. Further, in contrast to the importance weighted autoencoder, we observe that RWS learns better models and inference networks with increasing numbers of particles. Our results suggest that RWS is a competitive, often preferable, alternative for learning SCFMs.} \n  }\n\n
\n
\n\n\n
\n Stochastic control-flow models (SCFMs) are a class of generative models that involve branching on choices from discrete random variables. Amortized gradient-based learning of SCFMs is challenging as most approaches targeting discrete variables rely on their continuous relaxations—which can be intractable in SCFMs, as branching on relaxations requires evaluating all (exponentially many) branching paths. Tractable alternatives mainly combine REINFORCE with complex control-variate schemes to improve the variance of naive estimators. Here, we revisit the reweighted wake-sleep (RWS) [5] algorithm, and through extensive evaluations, show that it outperforms current state-of-the-art methods in learning SCFMs. Further, in contrast to the importance weighted autoencoder, we observe that RWS learns better models and inference networks with increasing numbers of particles. Our results suggest that RWS is a competitive, often preferable, alternative for learning SCFMs.\n
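A minimal PyTorch sketch of the RWS updates on a toy Gaussian model (entirely illustrative; the paper's setting involves stochastic control flow, which this toy omits): K particles from the inference network yield self-normalized importance weights, which drive both the model ("wake-theta") and inference-network ("wake-phi") gradient steps.

# Minimal reweighted wake-sleep sketch on a toy model p(z)p(x|z) with Gaussian q(z|x).
import torch
from torch.distributions import Normal

K = 16                                                   # particles per observation
theta = torch.tensor([0.0], requires_grad=True)          # generative parameter
phi = torch.tensor([0.0, 0.0], requires_grad=True)       # inference net: q mean = phi[0]*x + phi[1]
opt = torch.optim.Adam([theta, phi], lr=1e-2)
x = torch.tensor(2.0)                                    # a single observation

def log_joint(z):                                        # log p(z) + log p(x | z)
    return Normal(0.0, 1.0).log_prob(z) + Normal(z + theta, 1.0).log_prob(x)

for step in range(200):
    q = Normal(phi[0] * x + phi[1], 1.0)
    z = q.sample((K,))                                   # no reparameterization needed
    log_w = (log_joint(z) - q.log_prob(z)).detach()
    w = torch.softmax(log_w, dim=0)                      # self-normalized importance weights

    loss = -(w * log_joint(z)).sum()                     # wake-theta term
    loss = loss - (w * q.log_prob(z)).sum()              # wake-phi term
    opt.zero_grad(); loss.backward(); opt.step()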
\n\n\n
\n\n\n
\n \n\n \n \n \n \n \n \n Semi-supervised Sequential Generative Models.\n \n \n \n \n\n\n \n Teng, M.; Le, T. A.; Scibior, A.; and Wood, F.\n\n\n \n\n\n\n In Conference on Uncertainty in Artificial Intelligence (UAI), 2020. \n \n\n\n\n
\n\n\n\n \n \n \"Semi-supervised link\n  \n \n \n \"Semi-supervised paper\n  \n \n \n \"Semi-supervised arxiv\n  \n \n\n \n\n \n link\n  \n \n\n bibtex\n \n\n \n\n \n  \n \n 5 downloads\n \n \n\n \n \n \n \n \n \n \n\n  \n \n \n\n\n\n
\n
@inproceedings{TEN-20,\n  title={Semi-supervised Sequential Generative Models},\n  author={Teng, Michael and Le, Tuan Anh and Scibior, Adam and Wood, Frank},\n  booktitle={Conference on Uncertainty in Artificial Intelligence (UAI)},\n  eid = {arXiv:2007.00155},\n  archivePrefix = {arXiv},\n  eprint = {2007.00155},\n  url_Link = {http://www.auai.org/~w-auai/uai2020/accepted.php},\n  url_Paper={http://www.auai.org/uai2020/proceedings/272_main_paper.pdf},\n  url_ArXiv = {https://arxiv.org/abs/2007.00155},\n  support = {D3M},\n  year={2020}\n}\n\n
\n
\n\n\n\n
\n\n\n
\n \n\n \n \n \n \n \n \n Structured Conditional Continuous Normalizing Flows for Efficient Amortized Inference in Graphical Models.\n \n \n \n \n\n\n \n Weilbach, C.; Beronov, B.; Wood, F.; and Harvey, W.\n\n\n \n\n\n\n In Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics (AISTATS), pages 4441–4451, 2020. \n \n\nPMLR 108:4441-4451\n\n
\n\n\n\n \n \n \"Structured link\n  \n \n \n \"Structured paper\n  \n \n \n \"Structured poster\n  \n \n\n \n\n \n link\n  \n \n\n bibtex\n \n\n \n  \n \n abstract \n \n\n \n  \n \n 6 downloads\n \n \n\n \n \n \n \n \n \n \n\n  \n \n \n\n\n\n
\n
@inproceedings{WEI-20,\n  title={Structured Conditional Continuous Normalizing Flows for Efficient Amortized Inference in Graphical Models},\n  author={Weilbach, Christian and Beronov, Boyan and Wood, Frank and Harvey, William},\n  booktitle={Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics (AISTATS)},\n  pages={4441--4451},\n  year={2020},\n  url_Link={http://proceedings.mlr.press/v108/weilbach20a.html},\n  url_Paper={http://proceedings.mlr.press/v108/weilbach20a/weilbach20a.pdf},\n  url_Poster={https://github.com/plai-group/bibliography/blob/master/presentations_posters/PROBPROG2020_WEI.pdf},\n  support = {D3M},\n  bibbase_note = {PMLR 108:4441-4451},\n  abstract = {We exploit minimally faithful inversion of graphical model structures to specify sparse continuous normalizing flows (CNFs) for amortized inference. We find that the sparsity of this factorization can be exploited to reduce the numbers of parameters in the neural network, adaptive integration steps of the flow, and consequently FLOPs at both training and inference time without decreasing performance in comparison to unconstrained flows. By expressing the structure inversion as a compilation pass in a probabilistic programming language, we are able to apply it in a novel way to models as complex as convolutional neural networks. Furthermore, we extend the training objective for CNFs in the context of inference amortization to the symmetric Kullback-Leibler divergence, and demonstrate its theoretical and practical advantages.}\n}\n\n
\n
\n\n\n
\n We exploit minimally faithful inversion of graphical model structures to specify sparse continuous normalizing flows (CNFs) for amortized inference. We find that the sparsity of this factorization can be exploited to reduce the numbers of parameters in the neural network, adaptive integration steps of the flow, and consequently FLOPs at both training and inference time without decreasing performance in comparison to unconstrained flows. By expressing the structure inversion as a compilation pass in a probabilistic programming language, we are able to apply it in a novel way to models as complex as convolutional neural networks. Furthermore, we extend the training objective for CNFs in the context of inference amortization to the symmetric Kullback-Leibler divergence, and demonstrate its theoretical and practical advantages.\n
\n\n\n
\n\n\n
\n \n\n \n \n \n \n \n \n All in the Exponential Family: Bregman Duality in Thermodynamic Variational Inference.\n \n \n \n \n\n\n \n Brekelmans, R.; Masrani, V.; Wood, F.; Ver Steeg, G.; and Galstyan, A.\n\n\n \n\n\n\n In Thirty-seventh International Conference on Machine Learning (ICML 2020), July 2020. \n \n\n\n\n
\n\n\n\n \n \n \"All link\n  \n \n \n \"All paper\n  \n \n \n \"All arxiv\n  \n \n\n \n\n \n link\n  \n \n\n bibtex\n \n\n \n  \n \n abstract \n \n\n \n  \n \n 5 downloads\n \n \n\n \n \n \n \n \n \n \n\n  \n \n \n \n \n \n \n\n\n\n
\n
@inproceedings{BRE-20,\n  author = {{Brekelmans}, Rob and {Masrani}, Vaden and {Wood}, Frank and {Ver Steeg}, Greg and {Galstyan}, Aram},\n  title = {All in the Exponential Family: Bregman Duality in Thermodynamic Variational Inference},\n  booktitle={Thirty-seventh International Conference on Machine Learning (ICML 2020)},\n  keywords = {Computer Science - Machine Learning, Statistics - Machine Learning},\n  year = 2020,\n  month = jul,\n  eid = {arXiv:2007.00642},\n  archivePrefix = {arXiv},\n  eprint = {2007.00642},\n  url_Link = {https://proceedings.icml.cc/book/2020/hash/12311d05c9aa67765703984239511212},\n  url_Paper={https://proceedings.icml.cc/static/paper_files/icml/2020/2826-Paper.pdf},\n  url_ArXiv={https://arxiv.org/abs/2007.00642},\n  support = {D3M},\n  abstract={The recently proposed Thermodynamic Variational Objective (TVO) leverages thermodynamic integration to provide a family of variational inference objectives, which both tighten and generalize the ubiquitous Evidence Lower Bound (ELBO). However, the tightness of TVO bounds was not previously known, an expensive grid search was used to choose a "schedule" of intermediate distributions, and model learning suffered with ostensibly tighter bounds. In this work, we propose an exponential family interpretation of the geometric mixture curve underlying the TVO and various path sampling methods, which allows us to characterize the gap in TVO likelihood bounds as a sum of KL divergences. We propose to choose intermediate distributions using equal spacing in the moment parameters of our exponential family, which matches grid search performance and allows the schedule to adaptively update over the course of training. Finally, we derive a doubly reparameterized gradient estimator which improves model learning and allows the TVO to benefit from more refined bounds. To further contextualize our contributions, we provide a unified framework for understanding thermodynamic integration and the TVO using Taylor series remainders.}\n  }\n\n%@unpublished{WOO-20,\n%  author = {{Wood}, Frank and {Warrington}, Andrew and {Naderiparizi}, Saeid and {Weilbach}, Christian and {Masrani}, Vaden and {Harvey}, William and {Scibior}, Adam and {Beronov}, Boyan and {Nasseri}, Ali},\n%  title = {Planning as Inference in Epidemiological Models},\n%  journal = {arXiv e-prints},\n%  keywords = {Quantitative Biology - Populations and Evolution, Computer Science - Machine Learning, Statistics - Machine Learning},\n%  year = {2020},\n%  eid = {arXiv:2003.13221},\n%  archivePrefix = {arXiv},\n%  eprint = {2003.13221},\n%  support = {D3M,COVID,ETALUMIS},\n%  url_ArXiv={https://arxiv.org/abs/2003.13221},\n%  url_Paper={https://arxiv.org/pdf/2003.13221.pdf},\n%  abstract={In this work we demonstrate how existing software tools can be used to automate parts of infectious disease-control policy-making via performing inference in existing epidemiological dynamics models. The kind of inference tasks undertaken include computing, for planning purposes, the posterior distribution over putatively controllable, via direct policy-making choices, simulation model parameters that give rise to acceptable disease progression outcomes. Neither the full capabilities of such inference automation software tools nor their utility for planning is widely disseminated at the current time. 
Timely gains in understanding about these tools and how they can be used may lead to more fine-grained and less economically damaging policy prescriptions, particularly during the current COVID-19 pandemic.}\n%}\n\n
\n
\n\n\n
\n The recently proposed Thermodynamic Variational Objective (TVO) leverages thermodynamic integration to provide a family of variational inference objectives, which both tighten and generalize the ubiquitous Evidence Lower Bound (ELBO). However, the tightness of TVO bounds was not previously known, an expensive grid search was used to choose a \"schedule\" of intermediate distributions, and model learning suffered with ostensibly tighter bounds. In this work, we propose an exponential family interpretation of the geometric mixture curve underlying the TVO and various path sampling methods, which allows us to characterize the gap in TVO likelihood bounds as a sum of KL divergences. We propose to choose intermediate distributions using equal spacing in the moment parameters of our exponential family, which matches grid search performance and allows the schedule to adaptively update over the course of training. Finally, we derive a doubly reparameterized gradient estimator which improves model learning and allows the TVO to benefit from more refined bounds. To further contextualize our contributions, we provide a unified framework for understanding thermodynamic integration and the TVO using Taylor series remainders.\n
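A hedged sketch of the exponential-family view described above, in generic notation: with importance weight \(w(z) = p(x,z)/q(z\mid x)\), the geometric mixture path can be written as a one-dimensional exponential family,

\[ \pi_\beta(z \mid x) \;=\; q(z \mid x)\,\exp\big(\beta \log w(z) - \psi(\beta)\big), \qquad \eta(\beta) \;=\; \mathbb{E}_{\pi_\beta}[\log w(z)] \;=\; \psi'(\beta), \]

with natural parameter \(\beta\), sufficient statistic \(\log w(z)\), log-normalizer \(\psi(\beta)\), and moment parameter \(\eta(\beta)\); the schedule proposed in the paper places the intermediate points at equal spacing in \(\eta\) rather than in \(\beta\).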
\n\n\n
\n\n\n
\n \n\n \n \n \n \n \n \n Coping With Simulators That Don’t Always Return.\n \n \n \n \n\n\n \n Warrington, A; Naderiparizi, S; and Wood, F\n\n\n \n\n\n\n In The 23rd International Conference on Artificial Intelligence and Statistics (AISTATS), 2020. \n \n\nPMLR 108:1748-1758\n\n
\n\n\n\n \n \n \"Coping link\n  \n \n \n \"Coping paper\n  \n \n \n \"Coping poster\n  \n \n \n \"Coping arxiv\n  \n \n\n \n\n \n link\n  \n \n\n bibtex\n \n\n \n  \n \n abstract \n \n\n \n  \n \n 7 downloads\n \n \n\n \n \n \n \n \n \n \n\n  \n \n \n \n \n \n \n \n \n\n\n\n
\n
@inproceedings{WAR-20,\n  title={Coping With Simulators That Don’t Always Return},\n  author={Warrington, A and Naderiparizi, S and Wood, F},\n  booktitle={The 23rd International Conference on Artificial Intelligence and Statistics (AISTATS)},\n  archiveprefix = {arXiv},\n  eprint = {1906.05462},\n  year={2020},\n  url_Link = {http://proceedings.mlr.press/v108/warrington20a.html},\n  url_Paper = {http://proceedings.mlr.press/v108/warrington20a/warrington20a.pdf},\n  url_Poster = {https://github.com/plai-group/bibliography/blob/master/presentations_posters/WAR-20.pdf},\n  url_ArXiv = {https://arxiv.org/abs/2003.12908},\n  keywords = {simulators, smc, autoregressive flow},\n  support = {D3M,ETALUMIS},\n  bibbase_note={PMLR 108:1748-1758},\n  abstract = {Deterministic models are approximations of reality that are easy to interpret and often easier to build than stochastic alternatives. Unfortunately, as nature is capricious, observational data can never be fully explained by deterministic models in practice. Observation and process noise need to be added to adapt deterministic models to behave stochastically, such that they are capable of explaining and extrapolating from noisy data. We investigate and address computational inefficiencies that arise from adding process noise to deterministic simulators that fail to return for certain inputs; a property we describe as "brittle." We show how to train a conditional normalizing flow to propose perturbations such that the simulator succeeds with high probability, increasing computational efficiency.}\n  }\n\n
\n
\n\n\n
\n Deterministic models are approximations of reality that are easy to interpret and often easier to build than stochastic alternatives. Unfortunately, as nature is capricious, observational data can never be fully explained by deterministic models in practice. Observation and process noise need to be added to adapt deterministic models to behave stochastically, such that they are capable of explaining and extrapolating from noisy data. We investigate and address computational inefficiencies that arise from adding process noise to deterministic simulators that fail to return for certain inputs; a property we describe as \"brittle.\" We show how to train a conditional normalizing flow to propose perturbations such that the simulator succeeds with high probability, increasing computational efficiency.\n
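A toy numpy illustration of the failure mode (entirely synthetic; the paper learns a conditional normalizing flow proposal, whereas this sketch just compares a naive and a hand-shifted Gaussian proposal): a "brittle" simulator that crashes for part of the noise space wastes a large fraction of a naive sampler's budget, while a proposal concentrated on the feasible region keeps most calls and importance-reweights to stay consistent with the model's noise distribution.

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

def brittle_simulator(noise):
    # Crashes (returns None) whenever the injected process noise exceeds 0.5.
    return None if noise > 0.5 else noise ** 2

def run(proposal_loc, n=10_000):
    eps = rng.normal(proposal_loc, 1.0, size=n)
    ok = np.array([brittle_simulator(e) is not None for e in eps])
    # Importance weights correcting for sampling eps from the proposal
    # instead of the model's N(0, 1) process-noise distribution.
    log_w = norm.logpdf(eps, 0.0, 1.0) - norm.logpdf(eps, proposal_loc, 1.0)
    return ok.mean(), log_w[ok]

naive_rate, _ = run(proposal_loc=0.0)
shifted_rate, _ = run(proposal_loc=-1.0)      # stand-in for a learned proposal
print(f"naive proposal success rate:   {naive_rate:.2f}")
print(f"shifted proposal success rate: {shifted_rate:.2f}")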
\n\n\n
\n\n\n
\n \n\n \n \n \n \n \n \n Attention for Inference Compilation.\n \n \n \n \n\n\n \n Harvey, W; Munk, A; Baydin, A.; Bergholm, A; and Wood, F\n\n\n \n\n\n\n In The second International Conference on Probabilistic Programming (PROBPROG), 2020. \n \n\n\n\n
\n\n\n\n \n \n \"Attention paper\n  \n \n \n \"Attention arxiv\n  \n \n \n \"Attention poster\n  \n \n\n \n\n \n link\n  \n \n\n bibtex\n \n\n \n  \n \n abstract \n \n\n \n  \n \n 10 downloads\n \n \n\n \n \n \n \n \n \n \n\n  \n \n \n\n\n\n
\n
@inproceedings{HAR-20,\n  title={Attention for Inference Compilation},\n  author={Harvey, W and Munk, A and Baydin, AG and Bergholm, A and Wood, F},\n  booktitle={The second International Conference on Probabilistic Programming (PROBPROG)},\n  year={2020},\n  archiveprefix = {arXiv},\n  eprint = {1910.11961},\n  support = {D3M,LwLL},\n  url_Paper={https://arxiv.org/pdf/1910.11961.pdf},\n  url_ArXiv={https://arxiv.org/abs/1910.11961},\n  url_Poster={https://github.com/plai-group/bibliography/blob/master/presentations_posters/PROBPROG2020_HAR.pdf},\n  abstract = {We present a new approach to automatic amortized inference in universal probabilistic programs which improves performance compared to current methods. Our approach is a variation of inference compilation (IC) which leverages deep neural networks to approximate a posterior distribution over latent variables in a probabilistic program. A challenge with existing IC network architectures is that they can fail to model long-range dependencies between latent variables. To address this, we introduce an attention mechanism that attends to the most salient variables previously sampled in the execution of a probabilistic program. We demonstrate that the addition of attention allows the proposal distributions to better match the true posterior, enhancing inference about latent variables in simulators.},\n}\n\n
\n
\n\n\n
\n We present a new approach to automatic amortized inference in universal probabilistic programs which improves performance compared to current methods. Our approach is a variation of inference compilation (IC) which leverages deep neural networks to approximate a posterior distribution over latent variables in a probabilistic program. A challenge with existing IC network architectures is that they can fail to model long-range dependencies between latent variables. To address this, we introduce an attention mechanism that attends to the most salient variables previously sampled in the execution of a probabilistic program. We demonstrate that the addition of attention allows the proposal distributions to better match the true posterior, enhancing inference about latent variables in simulators.\n
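A generic sketch of the mechanism the abstract describes, using standard scaled dot-product attention over embeddings of previously sampled values (the dimensions and variable names are illustrative assumptions, not the paper's architecture):

# Scaled dot-product attention over embeddings of previously sampled latent variables;
# the attended summary would condition the proposal for the next variable.
import torch

d = 32                                          # embedding dimension (illustrative)
prev = torch.randn(10, d)                       # embeddings of 10 previously sampled variables
query = torch.randn(1, d)                       # embedding of the address about to be proposed

Wq, Wk, Wv = (torch.nn.Linear(d, d) for _ in range(3))
scores = (Wq(query) @ Wk(prev).T) / d ** 0.5    # (1, 10) attention logits over past samples
attn = torch.softmax(scores, dim=-1)
context = attn @ Wv(prev)                       # (1, d) summary that conditions the next proposal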
\n\n\n
\n\n\n
\n \n\n \n \n \n \n \n \n Deep probabilistic surrogate networks for universal simulator approximation.\n \n \n \n \n\n\n \n Munk, A.; Ścibior, A.; Baydin, A.; Stewart, A; Fernlund, A; Poursartip, A; and Wood, F.\n\n\n \n\n\n\n In The second International Conference on Probabilistic Programming (PROBPROG), 2020. \n \n\n\n\n
\n\n\n\n \n \n \"Deep paper\n  \n \n \n \"Deep arxiv\n  \n \n \n \"Deep poster\n  \n \n\n \n\n \n link\n  \n \n\n bibtex\n \n\n \n  \n \n abstract \n \n\n \n  \n \n 5 downloads\n \n \n\n \n \n \n \n \n \n \n\n  \n \n \n\n\n\n
\n
@inproceedings{MUN-20,\n  title={Deep probabilistic surrogate networks for universal simulator approximation},\n  author={Munk, Andreas and Ścibior, Adam and Baydin, AG and Stewart, A and Fernlund, A and Poursartip, A and Wood, Frank},\n  booktitle={The second International Conference on Probabilistic Programming (PROBPROG)},\n  year={2020},\n  archiveprefix = {arXiv},\n  eprint = {1910.11950},\n  support = {D3M,ETALUMIS},\n  url_Paper={https://arxiv.org/pdf/1910.11950.pdf},\n  url_ArXiv={https://arxiv.org/abs/1910.11950},\n  url_Poster={https://github.com/plai-group/bibliography/blob/master/presentations_posters/PROBPROG2020_MUN.pdf},\n  abstract = {We present a framework for automatically structuring and training fast, approximate, deep neural surrogates of existing stochastic simulators. Unlike traditional approaches to surrogate modeling, our surrogates retain the interpretable structure of the reference simulators. The particular way we achieve this allows us to replace the reference simulator with the surrogate when undertaking amortized inference in the probabilistic programming sense. The fidelity and speed of our surrogates allow for not only faster "forward" stochastic simulation but also for accurate and substantially faster inference. We support these claims via experiments that involve a commercial composite-materials curing simulator. Employing our surrogate modeling technique makes inference an order of magnitude faster, opening up the possibility of doing simulator-based, non-invasive, just-in-time parts quality testing; in this case inferring safety-critical latent internal temperature profiles of composite materials undergoing curing from surface temperature profile measurements.},\n}\n\n
\n
\n\n\n
\n We present a framework for automatically structuring and training fast, approximate, deep neural surrogates of existing stochastic simulators. Unlike traditional approaches to surrogate modeling, our surrogates retain the interpretable structure of the reference simulators. The particular way we achieve this allows us to replace the reference simulator with the surrogate when undertaking amortized inference in the probabilistic programming sense. The fidelity and speed of our surrogates allow for not only faster \"forward\" stochastic simulation but also for accurate and substantially faster inference. We support these claims via experiments that involve a commercial composite-materials curing simulator. Employing our surrogate modeling technique makes inference an order of magnitude faster, opening up the possibility of doing simulator-based, non-invasive, just-in-time parts quality testing; in this case inferring safety-critical latent internal temperature profiles of composite materials undergoing curing from surface temperature profile measurements.\n
\n\n\n
\n\n\n
\n \n\n \n \n \n \n \n \n Improved Few-Shot Visual Classification.\n \n \n \n \n\n\n \n Bateni, P.; Goyal, R.; Masrani, V.; Wood, F.; and Sigal, L.\n\n\n \n\n\n\n In Conference on Computer Vision and Pattern Recognition (CVPR), 2020. \n \n\n\n\n
\n\n\n\n \n \n \"Improved link\n  \n \n \n \"Improved paper\n  \n \n \n \"Improved arxiv\n  \n \n\n \n\n \n link\n  \n \n\n bibtex\n \n\n \n  \n \n abstract \n \n\n \n  \n \n 10 downloads\n \n \n\n \n \n \n \n \n \n \n\n  \n \n \n \n \n \n \n\n\n\n
\n
@inproceedings{BAT-20,\n  author = {{Bateni}, Peyman and {Goyal}, Raghav and {Masrani}, Vaden and {Wood}, Frank and {Sigal}, Leonid},\n  title = {Improved Few-Shot Visual Classification},\n  booktitle = {Conference on Computer Vision and Pattern Recognition (CVPR)},\n  keywords = {LwLL, Computer Science - Computer Vision and Pattern Recognition},\n  year = {2020},\n  eid = {arXiv:1912.03432},\n  archivePrefix = {arXiv},\n  eprint = {1912.03432},\n  support = {D3M,LwLL},\n  url_Link = {https://openaccess.thecvf.com/content_CVPR_2020/html/Bateni_Improved_Few-Shot_Visual_Classification_CVPR_2020_paper.html},\n  url_Paper={http://openaccess.thecvf.com/content_CVPR_2020/papers/Bateni_Improved_Few-Shot_Visual_Classification_CVPR_2020_paper.pdf},\n  url_ArXiv={https://arxiv.org/abs/1912.03432},\n  abstract={Few-shot learning is a fundamental task in computer vision that carries the promise of alleviating the need for exhaustively labeled data. Most few-shot learning approaches to date have focused on progressively more complex neural feature extractors and classifier adaptation strategies, as well as the refinement of the task definition itself. In this paper, we explore the hypothesis that a simple class-covariance-based distance metric, namely the Mahalanobis distance, adopted into a state of the art few-shot learning approach (CNAPS) can, in and of itself, lead to a significant performance improvement. We also discover that it is possible to learn adaptive feature extractors that allow useful estimation of the high dimensional feature covariances required by this metric from surprisingly few samples. The result of our work is a new "Simple CNAPS" architecture which has up to 9.2% fewer trainable parameters than CNAPS and performs up to 6.1% better than state of the art on the standard few-shot image classification benchmark dataset.}\n}\n\n%@inproceedings{WAN-19,\n%  title={Safer End-to-End Autonomous Driving via Conditional Imitation Learning and Command Augmentation},\n%  author={Wang, R and Scibior, A and Wood F},\n%  booktitle={NeurIPS self-driving car workshop},\n%  year={2019},\n%  archiveprefix = {arXiv},\n%  eprint = {1909.09721},\n%  support = {D3M},\n%  url_Paper = {https://arxiv.org/pdf/1909.09721.pdf},\n%  url_ArXiv={https://arxiv.org/abs/1909.09721},\n%  abstract={Imitation learning is a promising approach to end-to-end training of autonomous vehicle controllers. Typically the driving process with such approaches is entirely automatic and black-box, although in practice it is desirable to control the vehicle through high-level commands, such as telling it which way to go at an intersection. In existing work this has been accomplished by the application of a branched neural architecture, since directly providing the command as an additional input to the controller often results in the command being ignored. In this work we overcome this limitation by learning a disentangled probabilistic latent variable model that generates the steering commands. We achieve faithful command-conditional generation without using a branched architecture and demonstrate improved stability of the controller, applying only a variational objective without any domain-specific adjustments. On top of that, we extend our model with an additional latent variable and augment the dataset to train a controller that is robust to unsafe commands, such as asking it to turn into a wall. 
The main contribution of this work is a recipe for building controllable imitation driving agents that improves upon multiple aspects of the current state of the art relating to robustness and interpretability.}\n%}\n\n
\n
\n\n\n
\n Few-shot learning is a fundamental task in computer vision that carries the promise of alleviating the need for exhaustively labeled data. Most few-shot learning approaches to date have focused on progressively more complex neural feature extractors and classifier adaptation strategies, as well as the refinement of the task definition itself. In this paper, we explore the hypothesis that a simple class-covariance-based distance metric, namely the Mahalanobis distance, adopted into a state of the art few-shot learning approach (CNAPS) can, in and of itself, lead to a significant performance improvement. We also discover that it is possible to learn adaptive feature extractors that allow useful estimation of the high dimensional feature covariances required by this metric from surprisingly few samples. The result of our work is a new \"Simple CNAPS\" architecture which has up to 9.2% fewer trainable parameters than CNAPS and performs up to 6.1% better than state of the art on the standard few-shot image classification benchmark dataset.\n
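A bare-bones numpy sketch of classification with a class-covariance Mahalanobis distance on pre-extracted features (a simplification: Simple CNAPS additionally blends each class covariance with a task-level covariance using a ratio that depends on the number of shots):

# Class-covariance Mahalanobis few-shot classification on pre-extracted features.
import numpy as np

def mahalanobis_classify(support_feats, support_labels, query_feats, eps=1.0):
    classes = np.unique(support_labels)
    dists = np.zeros((len(query_feats), len(classes)))
    for j, c in enumerate(classes):
        feats_c = support_feats[support_labels == c]
        mu = feats_c.mean(axis=0)
        # Regularized class covariance (the ridge term keeps it invertible with few shots).
        cov = np.cov(feats_c, rowvar=False) + eps * np.eye(feats_c.shape[1])
        prec = np.linalg.inv(cov)
        diff = query_feats - mu
        dists[:, j] = np.einsum("nd,de,ne->n", diff, prec, diff)
    return classes[np.argmin(dists, axis=1)]

# Example: 5-way, 5-shot with 64-dimensional features (random data for illustration).
rng = np.random.default_rng(0)
sf = rng.normal(size=(25, 64)); sl = np.repeat(np.arange(5), 5)
qf = rng.normal(size=(10, 64))
print(mahalanobis_classify(sf, sl, qf))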
\n\n\n
\n\n\n\n\n\n
\n
\n\n
\n
\n  \n unpublished\n \n \n (2)\n \n \n
\n
\n \n \n
\n \n\n \n \n \n \n \n \n Ensemble Squared: A Meta AutoML System.\n \n \n \n \n\n\n \n Yoo, J.; Joseph, T.; Yung, D.; Nasseri, S. A.; and Wood, F.\n\n\n \n\n\n\n 2020.\n \n\n\n\n
\n\n\n\n \n \n \"Ensemble arxiv\n  \n \n \n \"Ensemble paper\n  \n \n\n \n\n \n link\n  \n \n\n bibtex\n \n\n \n  \n \n abstract \n \n\n \n  \n \n 8 downloads\n \n \n\n \n \n \n \n \n \n \n\n  \n \n \n\n\n\n
\n
@unpublished{yoo2020ensemble,\n      title={Ensemble Squared: A Meta AutoML System}, \n      author={Jason Yoo and Tony Joseph and Dylan Yung and S. Ali Nasseri and Frank Wood},\n      year={2020},\n      eprint={2012.05390},\n      archivePrefix={arXiv},\n      primaryClass={cs.LG},  \n      url_ArXiv={https://arxiv.org/abs/2012.05390},\n      url_Paper={https://arxiv.org/pdf/2012.05390.pdf},\n      support = {D3M},\n      abstract = {The continuing rise in the number of problems amenable to machine learning solutions, coupled with simultaneous growth in both computing power and variety of machine learning techniques has led to an explosion of interest in automated machine learning (AutoML). This paper presents Ensemble Squared (Ensemble2), a "meta" AutoML system that ensembles at the level of AutoML systems. Ensemble2 exploits the diversity of existing, competing AutoML systems by ensembling the top-performing models simultaneously generated by a set of them. Our work shows that diversity in AutoML systems is sufficient to justify ensembling at the AutoML system level. In demonstrating this, we also establish a new state of the art AutoML result on the OpenML classification challenge.}\n}\n\n
\n
\n\n\n
\n The continuing rise in the number of problems amenable to machine learning solutions, coupled with simultaneous growth in both computing power and variety of machine learning techniques has led to an explosion of interest in automated machine learning (AutoML). This paper presents Ensemble Squared (Ensemble2), a \"meta\" AutoML system that ensembles at the level of AutoML systems. Ensemble2 exploits the diversity of existing, competing AutoML systems by ensembling the top-performing models simultaneously generated by a set of them. Our work shows that diversity in AutoML systems is sufficient to justify ensembling at the AutoML system level. In demonstrating this, we also establish a new state of the art AutoML result on the OpenML classification challenge.\n
\n\n\n
\n\n\n
\n \n\n \n \n \n \n \n \n Uncertainty in Neural Processes.\n \n \n \n \n\n\n \n Naderiparizi, S.; Chiu, K.; Bloem-Reddy, B.; and Wood, F.\n\n\n \n\n\n\n 2020.\n \n\n\n\n
\n\n\n\n \n \n \"Uncertainty arxiv\n  \n \n \n \"Uncertainty paper\n  \n \n\n \n\n \n link\n  \n \n\n bibtex\n \n\n \n  \n \n abstract \n \n\n \n  \n \n 1 download\n \n \n\n \n \n \n \n \n \n \n\n  \n \n \n\n\n\n
\n
@unpublished{NAD-20a,\n  title={Uncertainty in Neural Processes}, \n  author={Saeid Naderiparizi and Kenny Chiu and Benjamin Bloem-Reddy and Frank Wood},\n  journal={arXiv preprint arXiv:1906.05462},\n  year={2020},\n  eid = {arXiv:2010.03753},\n  archivePrefix = {arXiv},\n  eprint = {2010.03753},\n  url_ArXiv={https://arxiv.org/abs/2010.03753},\n  url_Paper={https://arxiv.org/pdf/2010.03753.pdf},\n  support = {D3M,ETALUMIS},\n  abstract={We explore the effects of architecture and training objective choice on amortized posterior predictive inference in probabilistic conditional generative models. We aim this work to be a counterpoint to a recent trend in the literature that stresses achieving good samples when the amount of conditioning data is large. We instead focus our attention on the case where the amount of conditioning data is small. We highlight specific architecture and objective choices that we find lead to qualitative and quantitative improvement to posterior inference in this low data regime. Specifically we explore the effects of choices of pooling operator and variational family on posterior quality in neural processes. Superior posterior predictive samples drawn from our novel neural process architectures are demonstrated via image completion/in-painting experiments.}\n}\n\n
\n
\n\n\n
\n We explore the effects of architecture and training objective choice on amortized posterior predictive inference in probabilistic conditional generative models. We aim this work to be a counterpoint to a recent trend in the literature that stresses achieving good samples when the amount of conditioning data is large. We instead focus our attention on the case where the amount of conditioning data is small. We highlight specific architecture and objective choices that we find lead to qualitative and quantitative improvement to posterior inference in this low data regime. Specifically we explore the effects of choices of pooling operator and variational family on posterior quality in neural processes. Superior posterior predictive samples drawn from our novel neural process architectures are demonstrated via image completion/in-painting experiments.\n
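To make the "pooling operator" choice concrete, here is a generic sketch (not the paper's architecture) of aggregating a variable-size context set into a fixed-length representation that then parameterizes the conditional latent or predictive distribution:

# Permutation-invariant pooling of a context set's embeddings into a single vector.
import torch

encoder = torch.nn.Sequential(torch.nn.Linear(2, 64), torch.nn.ReLU(), torch.nn.Linear(64, 64))
context_xy = torch.randn(7, 2)                  # 7 (x, y) context points
embeddings = encoder(context_xy)                # (7, 64) per-point embeddings

r_mean = embeddings.mean(dim=0)                 # mean pooling
r_max = embeddings.max(dim=0).values            # max pooling
# Downstream, r_mean or r_max would parameterize the latent / predictive distribution.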
\n\n\n
\n\n\n\n\n\n
\n
\n\n\n\n\n
\n
\n\n
\n
\n  \n 2019\n \n \n (3)\n \n \n
\n
\n \n \n
\n
\n  \n inproceedings\n \n \n (11)\n \n \n
\n
\n \n \n
\n \n\n \n \n \n \n \n \n Coping With Simulators That Don’t Always Return.\n \n \n \n \n\n\n \n Warrington, A; Naderiparizi, S; and Wood, F\n\n\n \n\n\n\n In 2nd Symposium on Advances in Approximate Bayesian Inference (AABI), 2019. \n \n\n\n\n
\n\n\n\n \n \n \"Coping link\n  \n \n \n \"Coping paper\n  \n \n\n \n\n \n link\n  \n \n\n bibtex\n \n\n \n  \n \n abstract \n \n\n \n\n \n \n \n \n \n \n \n\n  \n \n \n \n \n \n \n \n \n\n\n\n
\n
@inproceedings{WAR-19a,\n  title={Coping With Simulators That Don’t Always Return},\n  author={Warrington, A and Naderiparizi, S and Wood, F},\n  booktitle={2nd Symposium on Advances in Approximate Bayesian Inference (AABI)},\n  year={2019},\n  url_Link={https://openreview.net/forum?id=SJecKyhEKr&noteId=SJecKyhEKr},\n  url_Paper={https://openreview.net/pdf?id=SJecKyhEKr},\n  keywords = {simulators, smc, autoregressive flow},\n  support = {D3M,ETALUMIS},\n  abstract = {Deterministic models are approximations of reality that are often easier to build and interpret than stochastic alternatives.  \nUnfortunately, as nature is capricious, observational data can never be fully explained by deterministic models in practice.  \nObservation and process noise need to be added to adapt deterministic models to behave stochastically, such that they are capable of explaining and extrapolating from noisy data.\nAdding process noise to deterministic simulators can induce a failure in the simulator resulting in no return value for certain inputs -- a property we describe as ``brittle.''\nWe investigate and address the wasted computation that arises from these failures, and the effect of such failures on downstream inference tasks.\nWe show that performing inference in this space can be viewed as rejection sampling, and train a conditional normalizing flow as a proposal over noise values such that there is a low probability that the simulator crashes, increasing computational efficiency and inference fidelity for a fixed sample budget when used as the proposal in an approximate inference algorithm.}\n}\n\n
\n
\n\n\n
\n Deterministic models are approximations of reality that are often easier to build and interpret than stochastic alternatives. Unfortunately, as nature is capricious, observational data can never be fully explained by deterministic models in practice. Observation and process noise need to be added to adapt deterministic models to behave stochastically, such that they are capable of explaining and extrapolating from noisy data. Adding process noise to deterministic simulators can induce a failure in the simulator resulting in no return value for certain inputs – a property we describe as ``brittle.'' We investigate and address the wasted computation that arises from these failures, and the effect of such failures on downstream inference tasks. We show that performing inference in this space can be viewed as rejection sampling, and train a conditional normalizing flow as a proposal over noise values such that there is a low probability that the simulator crashes, increasing computational efficiency and inference fidelity for a fixed sample budget when used as the proposal in an approximate inference algorithm.\n
\n\n\n
\n\n\n
\n \n\n \n \n \n \n \n \n Near-Optimal Glimpse Sequences for Improved Hard Attention Neural Network Training.\n \n \n \n \n\n\n \n Harvey, W.; Teng, M.; and Wood, F.\n\n\n \n\n\n\n In NeurIPS Workshop on Bayesian Deep Learning, 2019. \n \n\n\n\n
\n\n\n\n \n \n \"Near-Optimal paper\n  \n \n \n \"Near-Optimal arxiv\n  \n \n \n \"Near-Optimal poster\n  \n \n\n \n\n \n link\n  \n \n\n bibtex\n \n\n \n  \n \n abstract \n \n\n \n  \n \n 3 downloads\n \n \n\n \n \n \n \n \n \n \n\n  \n \n \n\n\n\n
\n
@inproceedings{HAR-19,\n  title={Near-Optimal Glimpse Sequences for Improved Hard Attention Neural Network Training},\n  author={Harvey, William and Teng, Michael and Wood, Frank},\n  booktitle={NeurIPS Workshop on Bayesian Deep Learning},\n  year={2019},\n  support = {D3M,LwLL},\n  archiveprefix = {arXiv},\n  eprint = {1906.05462},\n  url_Paper={http://bayesiandeeplearning.org/2019/papers/38.pdf},\n  url_ArXiv={https://arxiv.org/abs/1906.05462},\n  url_Poster={https://github.com/plai-group/bibliography/blob/master/presentations_posters/HAR-19.pdf},\n  abstract = {We introduce the use of Bayesian optimal experimental design techniques for generating glimpse sequences to use in semi-supervised training of hard attention networks. Hard attention holds the promise of greater energy efficiency and superior inference performance. Employing such networks for image classification usually involves choosing a sequence of glimpse locations from a stochastic policy. As the outputs of observations are typically non-differentiable with respect to their glimpse locations, unsupervised gradient learning of such a policy requires REINFORCE-style updates. Also, the only reward signal is the final classification accuracy. For these reasons hard attention networks, despite their promise, have not achieved the wide adoption that soft attention networks have and, in many practical settings, are difficult to train. We find that our method for semi-supervised training makes it easier and faster to train hard attention networks and correspondingly could make them practical to consider in situations where they were not before.},\n}\n\n
\n
\n\n\n
\n We introduce the use of Bayesian optimal experimental design techniques for generating glimpse sequences to use in semi-supervised training of hard attention networks. Hard attention holds the promise of greater energy efficiency and superior inference performance. Employing such networks for image classification usually involves choosing a sequence of glimpse locations from a stochastic policy. As the outputs of observations are typically non-differentiable with respect to their glimpse locations, unsupervised gradient learning of such a policy requires REINFORCE-style updates. Also, the only reward signal is the final classification accuracy. For these reasons hard attention networks, despite their promise, have not achieved the wide adoption that soft attention networks have and, in many practical settings, are difficult to train. We find that our method for semi-supervised training makes it easier and faster to train hard attention networks and correspondingly could make them practical to consider in situations where they were not before.\n
\n\n\n
\n\n\n
\n \n\n \n \n \n \n \n \n Efficient Inference Amortization in Graphical Models using Structured Continuous Conditional Normalizing Flows.\n \n \n \n \n\n\n \n Weilbach, C.; Beronov, B.; Harvey, W.; and Wood, F.\n\n\n \n\n\n\n In 2nd Symposium on Advances in Approximate Bayesian Inference (AABI), 2019. \n \n\n\n\n
\n\n\n\n \n \n \"Efficient link\n  \n \n \n \"Efficient paper\n  \n \n\n \n\n \n link\n  \n \n\n bibtex\n \n\n \n  \n \n abstract \n \n\n \n\n \n \n \n \n \n \n \n\n  \n \n \n\n\n\n
\n
@inproceedings{WEI-19,\n  title={Efficient Inference Amortization in Graphical Models using Structured Continuous Conditional Normalizing Flows},\n  author={Weilbach, Christian and Beronov, Boyan and Harvey, William and Wood, Frank},\n  booktitle={2nd Symposium on Advances in Approximate Bayesian Inference (AABI)},\n  support = {D3M},\n  url_Link={https://openreview.net/forum?id=BJlhYknNFS},\n  url_Paper={https://openreview.net/pdf?id=BJlhYknNFS},\n  abstract = {We introduce a more efficient neural architecture for amortized inference, which combines continuous and conditional normalizing flows using a principled choice of structure. Our gradient flow derives its sparsity pattern from the minimally faithful inverse of its underlying graphical model. We find that this factorization reduces the necessary numbers both of parameters in the neural network and of adaptive integration steps in the ODE solver. Consequently, the throughput at training time and inference time is increased, without decreasing performance in comparison to unconstrained flows. By expressing the structural inversion and the flow construction as compilation passes of a probabilistic programming language, we demonstrate their applicability to the stochastic inversion of realistic models such as convolutional neural networks (CNN).},\n  year={2019}\n}\n\n
\n
\n\n\n
\n We introduce a more efficient neural architecture for amortized inference, which combines continuous and conditional normalizing flows using a principled choice of structure. Our gradient flow derives its sparsity pattern from the minimally faithful inverse of its underlying graphical model. We find that this factorization reduces the necessary numbers both of parameters in the neural network and of adaptive integration steps in the ODE solver. Consequently, the throughput at training time and inference time is increased, without decreasing performance in comparison to unconstrained flows. By expressing the structural inversion and the flow construction as compilation passes of a probabilistic programming language, we demonstrate their applicability to the stochastic inversion of realistic models such as convolutional neural networks (CNN).\n
\n\n\n
\n\n\n
\n \n\n \n \n \n \n \n \n Sparse Variational Inference: Bayesian Coresets from Scratch.\n \n \n \n \n\n\n \n Campbell, T.; and Beronov, B.\n\n\n \n\n\n\n In Conference on Neural Information Processing Systems (NeurIPS), pages 11457–11468, 2019. \n \n\n1st prize, Student poster competition, AICan (Annual Meeting, Pan-Canadian AI Strategy, Canadian Institute for Advanced Research). Vancouver, Canada, Dec. 9, 2019\n\n
\n\n\n\n \n \n \"Sparse link\n  \n \n \n \"Sparse paper\n  \n \n \n \"Sparse poster\n  \n \n\n \n\n \n link\n  \n \n\n bibtex\n \n\n \n  \n \n abstract \n \n\n \n  \n \n 2 downloads\n \n \n\n \n \n \n \n \n \n \n\n  \n \n \n\n\n\n
\n
@inproceedings{CAM-19,\n  title={Sparse Variational Inference: Bayesian Coresets from Scratch},\n  author={Campbell, Trevor and Beronov, Boyan},\n  booktitle={Conference on Neural Information Processing Systems (NeurIPS)},\n  pages={11457--11468},\n  year={2019},\n  eid = {arXiv:1906.03329},\n  archivePrefix = {arXiv},\n  eprint = {1906.03329},\n  support = {D3M},\n  url_Link={http://papers.nips.cc/paper/9322-sparse-variational-inference-bayesian-coresets-from-scratch},\n  url_Paper={http://papers.nips.cc/paper/9322-sparse-variational-inference-bayesian-coresets-from-scratch.pdf},\n  url_Poster={https://github.com/plai-group/bibliography/raw/master/presentations_posters/CAM-19.pdf},\n  bibbase_note={1st prize, Student poster competition, AICan (Annual Meeting, Pan-Canadian AI Strategy, Canadian Institute for Advanced Research). Vancouver, Canada, Dec. 9, 2019},\n    abstract={The proliferation of automated inference algorithms in Bayesian statistics has provided practitioners newfound access to fast, reproducible data analysis and powerful statistical models. Designing automated methods that are also both computationally scalable and theoretically sound, however, remains a significant challenge. Recent work on Bayesian coresets takes the approach of compressing the dataset before running a standard inference algorithm, providing both scalability and guarantees on posterior approximation error. But the automation of past coreset methods is limited because they depend on the availability of a reasonable coarse posterior approximation, which is difficult to specify in practice. In the present work we remove this requirement by formulating coreset construction as sparsity-constrained variational inference within an exponential family. This perspective leads to a novel construction via greedy optimization, and also provides a unifying information-geometric view of present and past methods. The proposed Riemannian coreset construction algorithm is fully automated, requiring no problem-specific inputs aside from the probabilistic model and dataset. In addition to being significantly easier to use than past methods, experiments demonstrate that past coreset constructions are fundamentally limited by the fixed coarse posterior approximation; in contrast, the proposed algorithm is able to continually improve the coreset, providing state-of-the-art Bayesian dataset summarization with orders-of-magnitude reduction in KL divergence to the exact posterior.}\n}\n\n
\n
\n\n\n
\n The proliferation of automated inference algorithms in Bayesian statistics has provided practitioners newfound access to fast, reproducible data analysis and powerful statistical models. Designing automated methods that are also both computationally scalable and theoretically sound, however, remains a significant challenge. Recent work on Bayesian coresets takes the approach of compressing the dataset before running a standard inference algorithm, providing both scalability and guarantees on posterior approximation error. But the automation of past coreset methods is limited because they depend on the availability of a reasonable coarse posterior approximation, which is difficult to specify in practice. In the present work we remove this requirement by formulating coreset construction as sparsity-constrained variational inference within an exponential family. This perspective leads to a novel construction via greedy optimization, and also provides a unifying information-geometric view of present and past methods. The proposed Riemannian coreset construction algorithm is fully automated, requiring no problem-specific inputs aside from the probabilistic model and dataset. In addition to being significantly easier to use than past methods, experiments demonstrate that past coreset constructions are fundamentally limited by the fixed coarse posterior approximation; in contrast, the proposed algorithm is able to continually improve the coreset, providing state-of-the-art Bayesian dataset summarization with orders-of-magnitude reduction in KL divergence to the exact posterior.\n
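For orientation, a hedged sketch of the optimization problem in generic notation: with prior \(\pi_0\), per-datapoint log-likelihood potentials \(f_n(\theta)\), and the full posterior \(\pi\) obtained when every weight equals one, the weighted coreset posterior and the sparse variational problem are

\[ \pi_w(\theta) \;\propto\; \pi_0(\theta)\,\exp\!\Big(\textstyle\sum_{n=1}^{N} w_n\, f_n(\theta)\Big), \qquad \min_{w \ge 0}\ \mathrm{KL}\big(\pi_w \,\|\, \pi\big) \ \ \text{s.t.}\ \ \|w\|_0 \le M, \]

where \(M\) is the coreset size budget and the greedy construction described in the abstract grows the support one datapoint at a time.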
\n\n\n
\n\n\n
\n \n\n \n \n \n \n \n \n Efficient Bayesian Inference for Nested Simulators.\n \n \n \n \n\n\n \n Gram-Hansen, B; Schroeder de Witt, C; Zinkov, R; Naderiparizi, S; Scibior, A; Munk, A; Wood, F; Ghadiri, M; Torr, P; Whye Teh, Y; Gunes Baydin, A; and Rainforth, T\n\n\n \n\n\n\n In 2nd Symposium on Advances in Approximate Bayesian Inference (AABI), 2019. \n \n\n\n\n
\n\n\n\n \n \n \"Efficient link\n  \n \n \n \"Efficient paper\n  \n \n\n \n\n \n link\n  \n \n\n bibtex\n \n\n \n\n \n\n \n \n \n \n \n \n \n\n  \n \n \n\n\n\n
\n
@inproceedings{GRA-19,\n  title={Efficient Bayesian Inference for Nested Simulators},\n  author={Gram-Hansen, B and Schroeder de Witt, C and Zinkov, R and Naderiparizi, S and Scibior, A and Munk, A and Wood, F and Ghadiri, M and Torr, P and Whye Teh, Y and Gunes Baydin, A and Rainforth, T},\n  booktitle={2nd Symposium on Advances in Approximate Bayesian Inference (AABI)},\n  year={2019},\n  support = {D3M},\n  url_Link={https://openreview.net/forum?id=rJeMcy2EtH},\n  url_Paper={https://openreview.net/pdf?id=rJeMcy2EtH},\n  abstact={We introduce two approaches for conducting efficient Bayesian inference in stochastic simulators containing nested stochastic sub-procedures, i.e., internal procedures for which the density cannot be calculated directly such as rejection sampling loops. The resulting class of simulators are used extensively throughout the sciences and can be interpreted as probabilistic generative models. However, drawing inferences from them poses a substantial challenge due to the inability to evaluate even their unnormalised density, preventing the use of many standard inference procedures like Markov Chain Monte Carlo (MCMC). To address this, we introduce inference algorithms based on a two-step approach that first approximates the conditional densities of the individual sub-procedures, before using these approximations to run MCMC methods on the full program. Because the sub-procedures can be dealt with separately and are lower-dimensional than that of the overall problem, this two-step process allows them to be isolated and thus be tractably dealt with, without placing restrictions on the overall dimensionality of the problem. We demonstrate the utility of our approach on a simple, artificially constructed simulator.}\n}\n\n
\n
\n\n\n\n
\n\n\n
\n \n\n \n \n \n \n \n \n The Thermodynamic Variational Objective.\n \n \n \n \n\n\n \n Masrani, V.; Le, T. A.; and Wood, F.\n\n\n \n\n\n\n In Thirty-third Conference on Neural Information Processing Systems (NeurIPS), 2019. \n \n\n\n\n
\n\n\n\n \n \n \"The paper\n  \n \n \n \"The arxiv\n  \n \n \n \"The poster\n  \n \n\n \n\n \n link\n  \n \n\n bibtex\n \n\n \n  \n \n abstract \n \n\n \n  \n \n 2 downloads\n \n \n\n \n \n \n \n \n \n \n\n  \n \n \n\n\n\n
\n
@inproceedings{MAS-19,\n  title={The Thermodynamic Variational Objective},\n  author={Masrani, Vaden and Le, Tuan Anh and Wood, Frank},\n  booktitle={Thirty-third Conference on Neural Information Processing Systems (NeurIPS)},\n  archiveprefix = {arXiv},\n  eprint = {1907.00031},\n  url_Paper={https://arxiv.org/pdf/1907.00031.pdf},\n  url_ArXiv={https://arxiv.org/abs/1907.00031},\n  url_Poster={https://github.com/plai-group/bibliography/blob/master/presentations_posters/neurips_tvo_poster.pdf},\n  support = {D3M},\n  abstract={We introduce the thermodynamic variational objective (TVO) for learning in both continuous and discrete deep generative models. The TVO arises from a key connection between variational inference and thermodynamic integration that results in a tighter lower bound to the log marginal likelihood than the standard variational variational evidence lower bound (ELBO) while remaining as broadly applicable. We provide a computationally efficient gradient estimator for the TVO that applies to continuous, discrete, and non-reparameterizable distributions and show that the objective functions used in variational inference, variational autoencoders, wake sleep, and inference compilation are all special cases of the TVO. We use the TVO to learn both discrete and continuous deep generative models and empirically demonstrate state of the art model and inference network learning.},\n  year={2019}\n}\n\n\n
\n
\n\n\n
\n We introduce the thermodynamic variational objective (TVO) for learning in both continuous and discrete deep generative models. The TVO arises from a key connection between variational inference and thermodynamic integration that results in a tighter lower bound to the log marginal likelihood than the standard variational evidence lower bound (ELBO) while remaining as broadly applicable. We provide a computationally efficient gradient estimator for the TVO that applies to continuous, discrete, and non-reparameterizable distributions and show that the objective functions used in variational inference, variational autoencoders, wake sleep, and inference compilation are all special cases of the TVO. We use the TVO to learn both discrete and continuous deep generative models and empirically demonstrate state of the art model and inference network learning.\n
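A small numpy sketch of how the bound can be estimated from a single batch of importance weights, using a self-normalized estimate at each schedule point (illustrative only, not the paper's training code):

# Estimate the TVO left-Riemann-sum lower bound from log importance weights
# log_w[i] = log p(x, z_i) - log q(z_i | x), with z_i ~ q(z | x).
import numpy as np

def tvo_lower_bound(log_w, betas):
    # betas is a sorted schedule that includes the endpoints 0 and 1.
    total = 0.0
    for lo, hi in zip(betas[:-1], betas[1:]):
        # Self-normalized estimate of E_{pi_lo}[log w], where pi_beta is
        # proportional to q(z|x) * w(z)**beta.
        probs = np.exp(lo * log_w - np.logaddexp.reduce(lo * log_w))
        total += (hi - lo) * np.sum(probs * log_w)
    return total

log_w = np.random.default_rng(0).normal(-1.0, 1.0, size=1000)
print(tvo_lower_bound(log_w, betas=[0.0, 0.25, 0.5, 1.0]))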
\n\n\n
\n\n\n
\n \n\n \n \n \n \n \n \n Etalumis: Bringing Probabilistic Programming to Scientific Simulators at Scale.\n \n \n \n \n\n\n \n Baydin, A. G.; Shao, L.; Bhimji, W.; Heinrich, L.; Meadows, L.; Liu, J.; Munk, A.; Naderiparizi, S.; Gram-Hansen, B.; Louppe, G.; and others\n\n\n \n\n\n\n In the International Conference for High Performance Computing, Networking, Storage and Analysis (SC ’19), 2019. \n \n\n\n\n
\n\n\n\n \n \n \"Etalumis: paper\n  \n \n \n \"Etalumis: arxiv\n  \n \n\n \n \n doi\n  \n \n\n \n link\n  \n \n\n bibtex\n \n\n \n  \n \n abstract \n \n\n \n  \n \n 1 download\n \n \n\n \n \n \n \n \n \n \n\n  \n \n \n\n\n\n
\n
@inproceedings{BAY-19,\n  title={Etalumis: Bringing Probabilistic Programming to Scientific Simulators at Scale},\n  author={Baydin, At{\\i}l{\\i}m G{\\"u}ne{\\c{s}} and Shao, Lei and Bhimji, Wahid and Heinrich, Lukas and Meadows, Lawrence and Liu, Jialin and Munk, Andreas and Naderiparizi, Saeid and Gram-Hansen, Bradley and Louppe, Gilles and others},\n  booktitle={the International Conference for High Performance Computing, Networking, Storage and Analysis (SC ’19)},\n  archiveprefix = {arXiv},\n  eprint = {1907.03382},\n  support = {D3M,ETALUMIS},\n  url_Paper={https://arxiv.org/pdf/1907.03382.pdf},\n  url_ArXiv={https://arxiv.org/abs/1907.03382},\n  abstract={Probabilistic programming languages (PPLs) are receiving widespread attention for performing Bayesian inference in complex generative models. However, applications to science remain limited because of the impracticability of rewriting complex scientific simulators in a PPL, the computational cost of inference, and the lack of scalable implementations. To address these, we present a novel PPL framework that couples directly to existing scientific simulators through a cross-platform probabilistic execution protocol and provides Markov chain Monte Carlo (MCMC) and deep-learning-based inference compilation (IC) engines for tractable inference. To guide IC inference, we perform distributed training of a dynamic 3DCNN--LSTM architecture with a PyTorch-MPI-based framework on 1,024 32-core CPU nodes of the Cori supercomputer with a global minibatch size of 128k: achieving a performance of 450 Tflop/s through enhancements to PyTorch. We demonstrate a Large Hadron Collider (LHC) use-case with the C++ Sherpa simulator and achieve the largest-scale posterior inference in a Turing-complete PPL.},\n  year={2019},\n  doi={10.1145/3295500.3356180}\n}\n\n
\n
\n\n\n
\n Probabilistic programming languages (PPLs) are receiving widespread attention for performing Bayesian inference in complex generative models. However, applications to science remain limited because of the impracticability of rewriting complex scientific simulators in a PPL, the computational cost of inference, and the lack of scalable implementations. To address these, we present a novel PPL framework that couples directly to existing scientific simulators through a cross-platform probabilistic execution protocol and provides Markov chain Monte Carlo (MCMC) and deep-learning-based inference compilation (IC) engines for tractable inference. To guide IC inference, we perform distributed training of a dynamic 3DCNN–LSTM architecture with a PyTorch-MPI-based framework on 1,024 32-core CPU nodes of the Cori supercomputer with a global minibatch size of 128k: achieving a performance of 450 Tflop/s through enhancements to PyTorch. We demonstrate a Large Hadron Collider (LHC) use-case with the C++ Sherpa simulator and achieve the largest-scale posterior inference in a Turing-complete PPL.\n
\n\n\n
\n\n\n
\n \n\n \n \n \n \n \n \n The Virtual Patch Clamp: Imputing C. elegans Membrane Potentials from Calcium Imaging.\n \n \n \n \n\n\n \n Warrington, A.; Spencer, A.; and Wood, F.\n\n\n \n\n\n\n In NeurIPS 2019 Workshop Neuro AI, 2019. \n \n\n\n\n
\n\n\n\n \n \n \"The paper\n  \n \n \n \"The arxiv\n  \n \n \n \"The poster\n  \n \n\n \n\n \n link\n  \n \n\n bibtex\n \n\n \n  \n \n abstract \n \n\n \n  \n \n 1 download\n \n \n\n \n \n \n \n \n \n \n\n  \n \n \n\n\n\n
\n
@inproceedings{WAR-19,\n  title={The Virtual Patch Clamp: Imputing C. elegans Membrane Potentials from Calcium Imaging},\n  author={Warrington, Andrew and Spencer, Arthur and Wood, Frank},\n  booktitle={NeurIPS 2019 Workshop Neuro AI},\n  archiveprefix = {arXiv},\n  eprint = {1907.11075},\n  support = {D3M},\n  url_Paper={https://arxiv.org/pdf/1907.11075.pdf},\n  url_ArXiv={https://arxiv.org/abs/1907.11075},\n  url_Poster={https://github.com/plai-group/bibliography/blob/master/presentations_posters/WAR-19.pdf},\n  abstract={We develop a stochastic whole-brain and body simulator of the nematode roundworm Caenorhabditis elegans (C. elegans) and show that it is sufficiently regularizing to allow imputation of latent membrane potentials from partial calcium fluorescence imaging observations. This is the first attempt we know of to "complete the circle," where an anatomically grounded whole-connectome simulator is used to impute a time-varying "brain" state at single-cell fidelity from covariates that are measurable in practice. The sequential Monte Carlo (SMC) method we employ not only enables imputation of said latent states but also presents a strategy for learning simulator parameters via variational optimization of the noisy model evidence approximation provided by SMC. Our imputation and parameter estimation experiments were conducted on distributed systems using novel implementations of the aforementioned techniques applied to synthetic data of dimension and type representative of that which are measured in laboratories currently.},\n  year={2019}\n}\n\n
\n
\n\n\n
\n We develop a stochastic whole-brain and body simulator of the nematode roundworm Caenorhabditis elegans (C. elegans) and show that it is sufficiently regularizing to allow imputation of latent membrane potentials from partial calcium fluorescence imaging observations. This is the first attempt we know of to \"complete the circle,\" where an anatomically grounded whole-connectome simulator is used to impute a time-varying \"brain\" state at single-cell fidelity from covariates that are measurable in practice. The sequential Monte Carlo (SMC) method we employ not only enables imputation of said latent states but also presents a strategy for learning simulator parameters via variational optimization of the noisy model evidence approximation provided by SMC. Our imputation and parameter estimation experiments were conducted on distributed systems using novel implementations of the aforementioned techniques applied to synthetic data of dimension and type representative of that which are measured in laboratories currently.\n
\n\n\n
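The imputation above rests on sequential Monte Carlo run against a whole-connectome simulator. For readers unfamiliar with the machinery, a minimal bootstrap particle filter on a generic state-space model is sketched below; `init`, `transition`, and `emission_logpdf` are placeholder callables standing in for the paper's simulator dynamics and calcium-fluorescence observation model, and the accumulated log evidence is the noisy model-evidence approximation the abstract mentions using for parameter learning. This is a generic illustration, not the authors' implementation.

import numpy as np

def bootstrap_particle_filter(y, n_particles, init, transition, emission_logpdf, rng=None):
    """Impute latent states of a state-space model from partial observations.

    y               : (T, obs_dim) array of observations
    init            : function(n, rng) -> (n, state_dim) initial particles
    transition      : function(x, rng) -> stochastically propagated particles
    emission_logpdf : function(y_t, x) -> (n,) log-likelihood of y_t under each particle
    Returns filtered posterior means and a log marginal-likelihood estimate.
    """
    rng = rng or np.random.default_rng(0)
    x = init(n_particles, rng)
    log_evidence = 0.0
    means = []
    for y_t in y:
        x = transition(x, rng)                      # propagate with the simulator dynamics
        logw = emission_logpdf(y_t, x)              # weight by the observation model
        log_evidence += np.log(np.mean(np.exp(logw - logw.max()))) + logw.max()
        w = np.exp(logw - logw.max())
        w /= w.sum()
        means.append(w @ x)                         # filtered mean before resampling
        idx = rng.choice(n_particles, n_particles, p=w)
        x = x[idx]                                  # multinomial resampling
    return np.array(means), log_evidence

In the paper this loop would run against the C. elegans body/brain simulator with fluorescence observations; here everything model-specific is deliberately left as a user-supplied function.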
\n\n\n
\n \n\n \n \n \n \n \n \n Amortized Monte Carlo Integration.\n \n \n \n \n\n\n \n Goliński, A.; Wood, F.; and Rainforth, T.\n\n\n \n\n\n\n In Proceedings of the International Conference on Machine Learning (ICML), 2019. \n \n\n\n\n
\n\n\n\n \n \n \"Amortized paper\n  \n \n \n \"Amortized arxiv\n  \n \n \n \"Amortized presentation\n  \n \n\n \n\n \n link\n  \n \n\n bibtex\n \n\n \n  \n \n abstract \n \n\n \n\n \n \n \n \n \n \n \n\n  \n \n \n\n\n\n
\n
@inproceedings{GOL-19,\n  title={Amortized Monte Carlo Integration},\n  author={Goli{\\'n}ski, Adam and Wood, Frank and Rainforth, Tom},\n  booktitle={Proceedings of the International Conference on Machine Learning (ICML)},\n  year={2019},\n  archiveprefix = {arXiv},\n  eprint = {1907.08082},\n  url_Paper={https://arxiv.org/pdf/1907.08082.pdf},\n  url_ArXiv={https://arxiv.org/abs/1907.08082},\n  url_Presentation={https://icml.cc/Conferences/2019/ScheduleMultitrack?event=4702},\n  support = {D3M},\n  abstract={Current approaches to amortizing Bayesian inference focus solely on approximating the posterior distribution. Typically, this approximation is, in turn, used to calculate expectations for one or more target functions - a computational pipeline which is inefficient when the target function(s) are known upfront. In this paper, we address this inefficiency by introducing AMCI, a method for amortizing Monte Carlo integration directly. AMCI operates similarly to amortized inference but produces three distinct amortized proposals, each tailored to a different component of the overall expectation calculation. At runtime, samples are produced separately from each amortized proposal, before being combined to an overall estimate of the expectation. We show that while existing approaches are fundamentally limited in the level of accuracy they can achieve, AMCI can theoretically produce arbitrarily small errors for any integrable target function using only a single sample from each proposal at runtime. We further show that it is able to empirically outperform the theoretically optimal self-normalized importance sampler on a number of example problems. Furthermore, AMCI allows not only for amortizing over datasets but also amortizing over target functions.}\n}\n\n\n
\n
\n\n\n
\n Current approaches to amortizing Bayesian inference focus solely on approximating the posterior distribution. Typically, this approximation is, in turn, used to calculate expectations for one or more target functions - a computational pipeline which is inefficient when the target function(s) are known upfront. In this paper, we address this inefficiency by introducing AMCI, a method for amortizing Monte Carlo integration directly. AMCI operates similarly to amortized inference but produces three distinct amortized proposals, each tailored to a different component of the overall expectation calculation. At runtime, samples are produced separately from each amortized proposal, before being combined to an overall estimate of the expectation. We show that while existing approaches are fundamentally limited in the level of accuracy they can achieve, AMCI can theoretically produce arbitrarily small errors for any integrable target function using only a single sample from each proposal at runtime. We further show that it is able to empirically outperform the theoretically optimal self-normalized importance sampler on a number of example problems. Furthermore, AMCI allows not only for amortizing over datasets but also amortizing over target functions.\n
\n\n\n
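As a rough illustration of why per-term proposals can help, the sketch below estimates E_{p(x|y)}[f(x)] as a ratio of two importance-sampling estimates, each with its own proposal. The paper's AMCI goes further: it splits f into positive and negative parts (three proposals in total) and amortizes the proposals with neural networks, none of which is reproduced here. `two_proposal_expectation`, `Gaussian`, and the toy densities are illustrative names, not the paper's code.

import numpy as np

def two_proposal_expectation(f, log_joint, q_num, q_den, n, rng=None):
    """Estimate E_{p(x|y)}[f(x)] = E[f(x) p(x,y)] / E[p(x,y)] with separate proposals.

    log_joint : function(x) -> log p(x, y) for the fixed observation y
    q_num     : proposal used for the numerator term (sample(n, rng), logpdf(x))
    q_den     : proposal used for the normalizing term
    """
    rng = rng or np.random.default_rng(0)
    x1 = q_num.sample(n, rng)
    num = np.mean(f(x1) * np.exp(log_joint(x1) - q_num.logpdf(x1)))
    x2 = q_den.sample(n, rng)
    den = np.mean(np.exp(log_joint(x2) - q_den.logpdf(x2)))
    return num / den

# Toy demo: p(x) = N(0, 1), p(y|x) = N(x, 1), observed y = 1, f(x) = x**2.
# The posterior is N(0.5, 0.5), so the true answer is 0.5 + 0.25 = 0.75.
class Gaussian:
    def __init__(self, mu, sigma): self.mu, self.sigma = mu, sigma
    def sample(self, n, rng): return rng.normal(self.mu, self.sigma, n)
    def logpdf(self, x): return -0.5 * ((x - self.mu) / self.sigma) ** 2 - np.log(self.sigma * np.sqrt(2 * np.pi))

log_joint = lambda x: Gaussian(0.0, 1.0).logpdf(x) + Gaussian(x, 1.0).logpdf(1.0)
# In AMCI each proposal would be a learned network tailored to its own integrand.
est = two_proposal_expectation(lambda x: x ** 2, log_joint,
                               q_num=Gaussian(0.5, 1.0), q_den=Gaussian(0.5, 1.0), n=50_000)
print(est)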
\n\n\n
\n \n\n \n \n \n \n \n \n LF-PPL: A Low-Level First Order Probabilistic Programming Language for Non-Differentiable Models.\n \n \n \n \n\n\n \n Zhou, Y.; Gram-Hansen, B. J; Kohn, T.; Rainforth, T.; Yang, H.; and Wood, F.\n\n\n \n\n\n\n In Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics (AISTATS), 2019. \n \n\n\n\n
\n\n\n\n \n \n \"LF-PPL: paper\n  \n \n \n \"LF-PPL: arxiv\n  \n \n\n \n\n \n link\n  \n \n\n bibtex\n \n\n \n  \n \n abstract \n \n\n \n\n \n \n \n \n \n \n \n\n  \n \n \n\n\n\n
\n
@inproceedings{ZHO-19,\n  title={{LF-PPL}: A Low-Level First Order Probabilistic Programming Language for Non-Differentiable Models},\n  author={Zhou, Yuan and Gram-Hansen, Bradley J and Kohn, Tobias and Rainforth, Tom and Yang, Hongseok and Wood, Frank},\n  booktitle={Proceedings of the Twentieth International Conference on Artificial Intelligence and Statistics (AISTATS)},\n  year={2019},\n  archiveprefix = {arXiv},\n  eprint = {1903.02482},\n  support = {D3M},\n  url_Paper={https://arxiv.org/pdf/1903.02482.pdf},\n  url_ArXiv={https://arxiv.org/abs/1903.02482},\n  abstract={We develop a new Low-level, First-order Probabilistic Programming Language (LF-PPL) suited for models containing a mix of continuous, discrete, and/or piecewise-continuous variables. The key success of this language and its compilation scheme is in its ability to automatically distinguish parameters the density function is discontinuous with respect to, while further providing runtime checks for boundary crossings. This enables the introduction of new inference engines that are able to exploit gradient information, while remaining efficient for models which are not everywhere differentiable. We demonstrate this ability by incorporating a discontinuous Hamiltonian Monte Carlo (DHMC) inference engine that is able to deliver automated and efficient inference for non-differentiable models. Our system is backed up by a mathematical formalism that ensures that any model expressed in this language has a density with measure zero discontinuities to maintain the validity of the inference engine.}\n}\n\n
\n
\n\n\n
\n We develop a new Low-level, First-order Probabilistic Programming Language (LF-PPL) suited for models containing a mix of continuous, discrete, and/or piecewise-continuous variables. The key success of this language and its compilation scheme is in its ability to automatically distinguish parameters the density function is discontinuous with respect to, while further providing runtime checks for boundary crossings. This enables the introduction of new inference engines that are able to exploit gradient information, while remaining efficient for models which are not everywhere differentiable. We demonstrate this ability by incorporating a discontinuous Hamiltonian Monte Carlo (DHMC) inference engine that is able to deliver automated and efficient inference for non-differentiable models. Our system is backed up by a mathematical formalism that ensures that any model expressed in this language has a density with measure zero discontinuities to maintain the validity of the inference engine.\n
\n\n\n
\n\n\n
\n \n\n \n \n \n \n \n \n Efficient Probabilistic Inference in the Quest for Physics Beyond the Standard Model.\n \n \n \n \n\n\n \n Baydin, A. G.; Heinrich, L.; Bhimji, W.; Gram-Hansen, B.; Louppe, G.; Shao, L.; Cranmer, K.; Wood, F.; and others\n\n\n \n\n\n\n In Thirty-third Conference on Neural Information Processing Systems (NeurIPS), 2019. \n \n\n\n\n
\n\n\n\n \n \n \"Efficient link\n  \n \n \n \"Efficient paper\n  \n \n\n \n\n \n link\n  \n \n\n bibtex\n \n\n \n  \n \n abstract \n \n\n \n\n \n \n \n \n \n \n \n\n  \n \n \n\n\n\n
\n
@inproceedings{BAY-19a,\n  title={Efficient Probabilistic Inference in the Quest for Physics Beyond the Standard Model},\n  author={Baydin, Atilim Gunes and Heinrich, Lukas and Bhimji, Wahid and Gram-Hansen, Bradley and Louppe, Gilles and Shao, Lei and Cranmer, Kyle and Wood, Frank and others},\n  booktitle={Thirty-second Conference on Neural Information Processing Systems (NeurIPS)},\n  archiveprefix = {arXiv},\n  eprint = {1807.07706},\n  year={2019},\n  url_Link={https://papers.nips.cc/paper/8785-efficient-probabilistic-inference-in-the-quest-for-physics-beyond-the-standard-model},\n  url_Link={https://arxiv.org/abs/1807.07706},\n  url_Paper={https://papers.nips.cc/paper/8785-efficient-probabilistic-inference-in-the-quest-for-physics-beyond-the-standard-model.pdf},\n  url_Paper={https://arxiv.org/pdf/1807.07706.pdf},\n  abstract={We present a novel probabilistic programming framework that couples directly to existing large-scale simulators through a cross-platform probabilistic execution protocol, which allows general-purpose inference engines to record and control random number draws within simulators in a language-agnostic way. The execution of existing simulators as probabilistic programs enables highly interpretable posterior inference in the structured model defined by the simulator code base. We demonstrate the technique in particle physics, on a scientifically accurate simulation of the tau lepton decay, which is a key ingredient in establishing the properties of the Higgs boson. Inference efficiency is achieved via inference compilation where a deep recurrent neural network is trained to parameterize proposal distributions and control the stochastic simulator in a sequential importance sampling scheme, at a fraction of the computational cost of a Markov chain Monte Carlo baseline.}\n}\n\n\n
\n
\n\n\n
\n We present a novel probabilistic programming framework that couples directly to existing large-scale simulators through a cross-platform probabilistic execution protocol, which allows general-purpose inference engines to record and control random number draws within simulators in a language-agnostic way. The execution of existing simulators as probabilistic programs enables highly interpretable posterior inference in the structured model defined by the simulator code base. We demonstrate the technique in particle physics, on a scientifically accurate simulation of the tau lepton decay, which is a key ingredient in establishing the properties of the Higgs boson. Inference efficiency is achieved via inference compilation where a deep recurrent neural network is trained to parameterize proposal distributions and control the stochastic simulator in a sequential importance sampling scheme, at a fraction of the computational cost of a Markov chain Monte Carlo baseline.\n
\n\n\n
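The real system couples processes through a serialized, cross-platform execution protocol; the sketch below only captures the control-flow idea in a single Python process. Every random draw inside the simulator is routed through a controller that can record prior samples or substitute values supplied by an inference engine, while observe statements accumulate an importance weight (the extra weight bookkeeping needed when values are substituted is omitted). `Controller`, `Normal`, and `simulator` are made-up names for illustration, not the framework's API.

import numpy as np

class Normal:
    def __init__(self, mu, sigma): self.mu, self.sigma = mu, sigma
    def sample(self, rng): return rng.normal(self.mu, self.sigma)
    def logpdf(self, x):
        return -0.5 * ((x - self.mu) / self.sigma) ** 2 - np.log(self.sigma * np.sqrt(2 * np.pi))

class Controller:
    """Intercepts a simulator's random draws so an inference engine can record or control them."""
    def __init__(self, proposals=None, rng=None):
        self.rng = rng or np.random.default_rng(0)
        self.proposals = proposals or {}     # address -> value forced by the inference engine
        self.trace = []                      # recorded (address, value, log prior density)
        self.log_weight = 0.0

    def sample(self, address, dist):
        if address in self.proposals:        # controlled mode: use the engine's value
            x = self.proposals[address]
        else:                                # record mode: draw from the prior
            x = dist.sample(self.rng)
        self.trace.append((address, x, dist.logpdf(x)))
        return x

    def observe(self, dist, value):
        self.log_weight += dist.logpdf(value)   # condition on real data

def simulator(ctl):
    """Stand-in for an existing simulator whose draws are routed through the controller."""
    energy = ctl.sample("energy", Normal(10.0, 2.0))
    smeared = energy + ctl.sample("smearing", Normal(0.0, 0.5))
    ctl.observe(Normal(smeared, 0.1), 9.3)      # a detector reading
    return energy

ctl = Controller()
simulator(ctl)
print(ctl.trace, ctl.log_weight)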
\n\n\n\n\n\n
\n
\n\n
\n
\n  \n techreport\n \n \n (1)\n \n \n
\n
\n \n \n
\n \n\n \n \n \n \n \n \n Hasty-A Generative Model Compiler.\n \n \n \n \n\n\n \n Wood, F.; Teng, M.; and Zinkov, R.\n\n\n \n\n\n\n Technical Report University of Oxford, Oxford, United Kingdom, 2019.\n \n\n\n\n
\n\n\n\n \n \n \"Hasty-A link\n  \n \n \n \"Hasty-A paper\n  \n \n\n \n\n \n link\n  \n \n\n bibtex\n \n\n \n  \n \n abstract \n \n\n \n  \n \n 6 downloads\n \n \n\n \n \n \n \n \n \n \n\n  \n \n \n\n\n\n
\n
@techreport{WOO-19,\n  title={Hasty-A Generative Model Compiler},\n  author={Wood, Frank and Teng, Michael and Zinkov, Rob},\n  year={2019},\n  institution={University of Oxford, Oxford, United Kingdom},\n  url_Link={https://apps.dtic.mil/sti/citations/AD1072839},\n  url_Paper={https://apps.dtic.mil/sti/pdfs/AD1072839.pdf},\n  support = {D3M},\n  abstract = {This work describes our contribution of proof-of-concept primitives to the D3M program and research progress made towards an initial version of Hasty. Although we were unable to complete the initial version of Hasty or to contribute to the D3M primitive library the types of primitives that Hasty will enable, we did train a number of Highly Qualified Personnel (HQP) and have interacted with the AutoML, probabilistic programming languages, neural networks, and other communities which our work is expected to impact.}\n}\n\n
\n
\n\n\n
\n This work describes our contribution of proof-of-concept primitives to the D3M program and research progress made towards an initial version of Hasty. Although we were unable to complete the initial version of Hasty or to contribute to the D3M primitive library the types of primitives that Hasty will enable, we did train a number of Highly Qualified Personnel (HQP) and have interacted with the AutoML, probabilistic programming languages, neural networks, and other communities which our work is expected to impact.\n
\n\n\n
\n\n\n\n\n\n
\n
\n\n
\n
\n  \n unpublished\n \n \n (1)\n \n \n
\n
\n \n \n
\n \n\n \n \n \n \n \n \n Imitation Learning of Factored Multi-agent Reactive Models.\n \n \n \n \n\n\n \n Teng, M.; Le, T. A.; Scibior, A.; and Wood, F.\n\n\n \n\n\n\n 2019.\n \n\n\n\n
\n\n\n\n \n \n \"Imitation paper\n  \n \n \n \"Imitation arxiv\n  \n \n\n \n\n \n link\n  \n \n\n bibtex\n \n\n \n  \n \n abstract \n \n\n \n\n \n \n \n \n \n \n \n\n  \n \n \n\n\n\n
\n
@unpublished{TEN-19,\n  title={Imitation Learning of Factored Multi-agent Reactive Models},\n  author={Teng, Michael and Le, Tuan Anh and Scibior, Adam and Wood, Frank},\n  archiveprefix = {arXiv},\n  eprint = {1903.04714},\n  year={2019},\n  url_Paper={https://arxiv.org/pdf/1903.04714.pdf},\n  url_ArXiv={https://arxiv.org/abs/1903.04714},\n  support = {D3M},\n  abstract={We apply recent advances in deep generative modeling to the task of imitation learning from biological agents. Specifically, we apply variations of the variational recurrent neural network model to a multi-agent setting where we learn policies of individual uncoordinated agents acting based on their perceptual inputs and their hidden belief state. We learn stochastic policies for these agents directly from observational data, without constructing a reward function. An inference network learned jointly with the policy allows for efficient inference over the agent's belief state given a sequence of its current perceptual inputs and the prior actions it performed, which lets us extrapolate observed sequences of behavior into the future while maintaining uncertainty estimates over future trajectories. We test our approach on a dataset of flies interacting in a 2D environment, where we demonstrate better predictive performance than existing approaches which learn deterministic policies with recurrent neural networks. We further show that the uncertainty estimates over future trajectories we obtain are well calibrated, which makes them useful for a variety of downstream processing tasks.},\n}\n\n
\n
\n\n\n
\n We apply recent advances in deep generative modeling to the task of imitation learning from biological agents. Specifically, we apply variations of the variational recurrent neural network model to a multi-agent setting where we learn policies of individual uncoordinated agents acting based on their perceptual inputs and their hidden belief state. We learn stochastic policies for these agents directly from observational data, without constructing a reward function. An inference network learned jointly with the policy allows for efficient inference over the agent's belief state given a sequence of its current perceptual inputs and the prior actions it performed, which lets us extrapolate observed sequences of behavior into the future while maintaining uncertainty estimates over future trajectories. We test our approach on a dataset of flies interacting in a 2D environment, where we demonstrate better predictive performance than existing approaches which learn deterministic policies with recurrent neural networks. We further show that the uncertainty estimates over future trajectories we obtain are well calibrated, which makes them useful for a variety of downstream processing tasks.\n
\n\n\n
\n\n\n\n\n\n
\n
\n\n\n\n\n
\n
\n\n
\n
\n  \n 2018\n \n \n (2)\n \n \n
\n
\n \n \n
\n
\n  \n inproceedings\n \n \n (7)\n \n \n
\n
\n \n \n
\n \n\n \n \n \n \n \n \n Bayesian Distributed Stochastic Gradient Descent.\n \n \n \n \n\n\n \n Teng, M.; and Wood, F.\n\n\n \n\n\n\n In Advances in Neural Information Processing Systems 31, pages 6378–6388, 2018. \n \n\n\n\n
\n\n\n\n \n \n \"Bayesian link\n  \n \n \n \"Bayesian paper\n  \n \n\n \n\n \n link\n  \n \n\n bibtex\n \n\n \n  \n \n abstract \n \n\n \n  \n \n 1 download\n \n \n\n \n \n \n \n \n \n \n\n  \n \n \n\n\n\n
\n
@inproceedings{TEN-18,\n  title={Bayesian Distributed Stochastic Gradient Descent},\n  author={Teng, Michael and Wood, Frank},\n  booktitle={Advances in Neural Information Processing Systems 31},\n  pages={6378--6388},\n  year={2018},\n  url_Link={https://papers.nips.cc/paper/7874-bayesian-distributed-stochastic-gradient-descent},\n  url_Paper={https://papers.nips.cc/paper/7874-bayesian-distributed-stochastic-gradient-descent.pdf},\n  abstract={We introduce Bayesian distributed stochastic gradient descent (BDSGD), a high-throughput algorithm for training deep neural networks on parallel clusters. This algorithm uses amortized inference in a deep generative model to perform joint posterior predictive inference of mini-batch gradient computation times in a compute cluster specific manner. Specifically, our algorithm mitigates the straggler effect in synchronous, gradient-based optimization by choosing an optimal cutoff beyond which mini-batch gradient messages from slow workers are ignored. In our experiments, we show that eagerly discarding the mini-batch gradient computations of stragglers not only increases throughput but actually increases the overall rate of convergence as a function of wall-clock time by virtue of eliminating idleness. The principal novel contribution and finding of this work goes beyond this by demonstrating that using the predicted run-times from a generative model of cluster worker performance improves substantially over the static-cutoff prior art, leading to reduced deep neural net training times on large computer clusters.}\n}\n\n
\n
\n\n\n
\n We introduce Bayesian distributed stochastic gradient descent (BDSGD), a high-throughput algorithm for training deep neural networks on parallel clusters. This algorithm uses amortized inference in a deep generative model to perform joint posterior predictive inference of mini-batch gradient computation times in a compute cluster specific manner. Specifically, our algorithm mitigates the straggler effect in synchronous, gradient-based optimization by choosing an optimal cutoff beyond which mini-batch gradient messages from slow workers are ignored. In our experiments, we show that eagerly discarding the mini-batch gradient computations of stragglers not only increases throughput but actually increases the overall rate of convergence as a function of wall-clock time by virtue of eliminating idleness. The principal novel contribution and finding of this work goes beyond this by demonstrating that using the predicted run-times from a generative model of cluster worker performance improves substantially over the static-cutoff prior art, leading to reduced deep neural net training times on large computer clusters.\n
\n\n\n
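A crude way to see the cutoff decision described above: given samples of per-worker gradient computation times from some predictive model, wait for the number of workers that maximizes expected gradients per unit wall-clock time. This toy ignores the effect of discarded gradients on convergence, which the paper does account for via amortized inference in a deep generative model of cluster performance; `best_cutoff` and the lognormal runtime model are assumptions made for the example only.

import numpy as np

def best_cutoff(runtime_samples):
    """Choose how many of W workers to wait for in each synchronous update.

    runtime_samples : (S, W) array; each row is one predictive sample of the
                      per-worker minibatch gradient computation times.
    Returns the cutoff k that maximizes expected gradients used per second.
    """
    sorted_times = np.sort(runtime_samples, axis=1)   # column k-1 = time until k workers finish
    expected_wait = sorted_times.mean(axis=0)         # E[k-th order statistic], k = 1..W
    ks = np.arange(1, runtime_samples.shape[1] + 1)
    throughput = ks / expected_wait                   # gradients per unit wall-clock time
    return int(ks[np.argmax(throughput)])

# Example with a heavy-tailed straggler model for 64 workers:
rng = np.random.default_rng(0)
samples = rng.lognormal(mean=0.0, sigma=0.6, size=(10_000, 64))
print(best_cutoff(samples))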
\n\n\n
\n \n\n \n \n \n \n \n \n Faithful inversion of generative models for effective amortized inference.\n \n \n \n \n\n\n \n Webb, S.; Golinski, A.; Zinkov, R.; Narayanaswamy, S.; Rainforth, T.; Teh, Y. W.; and Wood, F.\n\n\n \n\n\n\n In Advances in Neural Information Processing Systems 31, pages 3070–3080, 2018. \n \n\n\n\n
\n\n\n\n \n \n \"Faithful paper\n  \n \n \n \"Faithful arxiv\n  \n \n\n \n\n \n link\n  \n \n\n bibtex\n \n\n \n  \n \n abstract \n \n\n \n\n \n \n \n \n \n \n \n\n  \n \n \n\n\n\n
\n
@inproceedings{WEB-18,\n  title={Faithful inversion of generative models for effective amortized inference},\n  author={Webb, Stefan and Golinski, Adam and Zinkov, Rob and Narayanaswamy, Siddharth and Rainforth, Tom and Teh, Yee Whye and Wood, Frank},\n  booktitle={Advances in Neural Information Processing Systems 31},\n  pages={3070--3080},\n  year={2018},\n  archiveprefix = {arXiv},\n  eprint = {1712.00287},\n  url_Paper={https://arxiv.org/pdf/1712.00287.pdf},\n  url_ArXiv={https://arxiv.org/abs/1712.00287},\n  abstract={Inference amortization methods share information across multiple posterior-inference problems, allowing each to be carried out more efficiently. Generally, they require the inversion of the dependency structure in the generative model, as the modeller must learn a mapping from observations to distributions approximating the posterior. Previous approaches have involved inverting the dependency structure in a heuristic way that fails to capture these dependencies correctly, thereby limiting the achievable accuracy of the resulting approximations. We introduce an algorithm for faithfully, and minimally, inverting the graphical model structure of any generative model. Such inverses have two crucial properties: (a) they do not encode any independence assertions that are absent from the model and; (b) they are local maxima for the number of true independencies encoded. We prove the correctness of our approach and empirically show that the resulting minimally faithful inverses lead to better inference amortization than existing heuristic approaches.}\n}\n\n\n
\n
\n\n\n
\n Inference amortization methods share information across multiple posterior-inference problems, allowing each to be carried out more efficiently. Generally, they require the inversion of the dependency structure in the generative model, as the modeller must learn a mapping from observations to distributions approximating the posterior. Previous approaches have involved inverting the dependency structure in a heuristic way that fails to capture these dependencies correctly, thereby limiting the achievable accuracy of the resulting approximations. We introduce an algorithm for faithfully, and minimally, inverting the graphical model structure of any generative model. Such inverses have two crucial properties: (a) they do not encode any independence assertions that are absent from the model and; (b) they are local maxima for the number of true independencies encoded. We prove the correctness of our approach and empirically show that the resulting minimally faithful inverses lead to better inference amortization than existing heuristic approaches.\n
\n\n\n
\n\n\n
\n \n\n \n \n \n \n \n \n On Nesting Monte Carlo Estimators.\n \n \n \n \n\n\n \n Rainforth, T.; Cornish, R.; Yang, H.; Warrington, A.; and Wood, F.\n\n\n \n\n\n\n In Thirty-fifth International Conference on Machine Learning (ICML), 2018. \n \n\n\n\n
\n\n\n\n \n \n \"On paper\n  \n \n \n \"On arxiv\n  \n \n\n \n\n \n link\n  \n \n\n bibtex\n \n\n \n  \n \n abstract \n \n\n \n  \n \n 1 download\n \n \n\n \n \n \n \n \n \n \n\n  \n \n \n\n\n\n
\n
@inproceedings{RAI-18a,\n  title={On Nesting Monte Carlo Estimators},\n  author={Rainforth, Tom and Cornish, Robert and Yang, Hongseok and Warrington, Andrew and Wood, Frank},\n  booktitle={Thirty-fifth International Conference on Machine Learning (ICML)},\n  year={2018},\n  archiveprefix = {arXiv},\n  eprint = {1709.06181},\n  url_Paper={https://arxiv.org/pdf/1709.06181.pdf},\n  url_ArXiv={https://arxiv.org/abs/1709.06181},\n  abstract={Many problems in machine learning and statistics involve nested expectations and thus do not permit conventional Monte Carlo (MC) estimation. For such problems, one must nest estimators, such that terms in an outer estimator themselves involve calculation of a separate, nested, estimation. We investigate the statistical implications of nesting MC estimators, including cases of multiple levels of nesting, and establish the conditions under which they converge. We derive corresponding rates of convergence and provide empirical evidence that these rates are observed in practice. We further establish a number of pitfalls that can arise from naive nesting of MC estimators, provide guidelines about how these can be avoided, and lay out novel methods for reformulating certain classes of nested expectation problems into single expectations, leading to improved convergence rates. We demonstrate the applicability of our work by using our results to develop a new estimator for discrete Bayesian experimental design problems and derive error bounds for a class of variational objectives.}\n}\n\n
\n
\n\n\n
\n Many problems in machine learning and statistics involve nested expectations and thus do not permit conventional Monte Carlo (MC) estimation. For such problems, one must nest estimators, such that terms in an outer estimator themselves involve calculation of a separate, nested, estimation. We investigate the statistical implications of nesting MC estimators, including cases of multiple levels of nesting, and establish the conditions under which they converge. We derive corresponding rates of convergence and provide empirical evidence that these rates are observed in practice. We further establish a number of pitfalls that can arise from naive nesting of MC estimators, provide guidelines about how these can be avoided, and lay out novel methods for reformulating certain classes of nested expectation problems into single expectations, leading to improved convergence rates. We demonstrate the applicability of our work by using our results to develop a new estimator for discrete Bayesian experimental design problems and derive error bounds for a class of variational objectives.\n
\n\n\n
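To fix notation, a nested Monte Carlo estimator of gamma = E_y[ f( E_x[ phi(x, y) | y ] ) ] simply runs an inner estimator inside an outer one, as below; for nonlinear f the inner average introduces a bias that only vanishes as the inner sample count M grows, which is the regime the paper analyzes. All distributions and the choices of f, phi, N, and M here are placeholders.

import numpy as np

def nested_mc(sample_y, sample_x_given_y, phi, f, N, M, rng=None):
    """Nested Monte Carlo estimate of E_y[ f( E_x[ phi(x, y) | y ] ) ]."""
    rng = rng or np.random.default_rng(0)
    outer = []
    for _ in range(N):
        y = sample_y(rng)
        inner = np.mean([phi(sample_x_given_y(y, rng), y) for _ in range(M)])  # inner estimator
        outer.append(f(inner))                                                 # outer term
    return float(np.mean(outer))

# Toy check: with x, y ~ N(0, 1), phi(x, y) = x + y and f = exp, the inner mean is y,
# so the true value is E[exp(y)] = exp(0.5) ~= 1.649 (plus an O(1/M) bias from the inner estimate).
est = nested_mc(lambda r: r.normal(), lambda y, r: r.normal(), lambda x, y: x + y, np.exp, 2000, 200)
print(est)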
\n\n\n
\n \n\n \n \n \n \n \n \n Tighter variational bounds are not necessarily better.\n \n \n \n \n\n\n \n Rainforth, T.; Kosiorek, A. R; Le, T. A.; Maddison, C. J; Igl, M.; Wood, F.; and Teh, Y. W.\n\n\n \n\n\n\n In Thirty-fifth International Conference on Machine Learning (ICML), 2018. \n \n\n\n\n
\n\n\n\n \n \n \"Tighter link\n  \n \n \n \"Tighter paper\n  \n \n\n \n\n \n link\n  \n \n\n bibtex\n \n\n \n  \n \n abstract \n \n\n \n  \n \n 1 download\n \n \n\n \n \n \n \n \n \n \n\n  \n \n \n\n\n\n
\n
@inproceedings{RAI-18b,\n  title={Tighter variational bounds are not necessarily better},\n  author={Rainforth, Tom and Kosiorek, Adam R and Le, Tuan Anh and Maddison, Chris J and Igl, Maximilian and Wood, Frank and Teh, Yee Whye},\n  booktitle={Thirty-fifth International Conference on Machine Learning (ICML)},\n  year={2018},\n  archiveprefix = {arXiv},\n  eprint = {1802.04537},\n  url_Link={https://arxiv.org/abs/1802.04537},\n  url_Paper={https://arxiv.org/pdf/1802.04537.pdf},\n  abstract={We provide theoretical and empirical evidence that using tighter evidence lower bounds (ELBOs) can be detrimental to the process of learning an inference network by reducing the signal-to-noise ratio of the gradient estimator. Our results call into question common implicit assumptions that tighter ELBOs are better variational objectives for simultaneous model learning and inference amortization schemes. Based on our insights, we introduce three new algorithms: the partially importance weighted auto-encoder (PIWAE), the multiply importance weighted auto-encoder (MIWAE), and the combination importance weighted auto-encoder (CIWAE), each of which includes the standard importance weighted auto-encoder (IWAE) as a special case. We show that each can deliver improvements over IWAE, even when performance is measured by the IWAE target itself. Furthermore, our results suggest that PIWAE may be able to deliver simultaneous improvements in the training of both the inference and generative networks.}\n}\n\n
\n
\n\n\n
\n We provide theoretical and empirical evidence that using tighter evidence lower bounds (ELBOs) can be detrimental to the process of learning an inference network by reducing the signal-to-noise ratio of the gradient estimator. Our results call into question common implicit assumptions that tighter ELBOs are better variational objectives for simultaneous model learning and inference amortization schemes. Based on our insights, we introduce three new algorithms: the partially importance weighted auto-encoder (PIWAE), the multiply importance weighted auto-encoder (MIWAE), and the combination importance weighted auto-encoder (CIWAE), each of which includes the standard importance weighted auto-encoder (IWAE) as a special case. We show that each can deliver improvements over IWAE, even when performance is measured by the IWAE target itself. Furthermore, our results suggest that PIWAE may be able to deliver simultaneous improvements in the training of both the inference and generative networks.\n
\n\n\n
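The bound in question is the importance-weighted (IWAE) objective L_K = E[ log (1/K) sum_k p(x, z_k) / q(z_k | x) ], which tightens monotonically in K while, per the paper, degrading the signal-to-noise ratio of the inference-network gradient. Below is a small numerical sketch of the bound itself with closed-form Gaussian densities and no neural networks; `iwae_bound` and the toy model are illustrative, and the PIWAE/MIWAE/CIWAE variants (which differ in how such terms are averaged when forming gradient estimators) are not implemented here.

import numpy as np

def logsumexp(a):
    m = np.max(a)
    return m + np.log(np.sum(np.exp(a - m)))

def iwae_bound(log_p_joint, log_q, z):
    """Monte Carlo estimate of the K-sample importance-weighted bound for one datapoint x.

    z           : (K,) latent samples drawn from q(z | x)
    log_p_joint : function(z) -> (K,) values of log p(x, z)
    log_q       : function(z) -> (K,) values of log q(z | x)
    """
    log_w = log_p_joint(z) - log_q(z)
    return logsumexp(log_w) - np.log(len(log_w))

# Toy check: p(z) = N(0, 1), p(x|z) = N(z, 1). With the exact posterior q(z|x) = N(x/2, sqrt(1/2))
# the weights are constant, so the bound equals log p(x) = log N(x; 0, sqrt(2)) for every K.
rng = np.random.default_rng(0)
x, K = 1.3, 64
log_normal = lambda v, mu, sig: -0.5 * ((v - mu) / sig) ** 2 - np.log(sig * np.sqrt(2 * np.pi))
z = rng.normal(x / 2, np.sqrt(0.5), size=K)
bound = iwae_bound(lambda z: log_normal(z, 0.0, 1.0) + log_normal(x, z, 1.0),
                   lambda z: log_normal(z, x / 2, np.sqrt(0.5)), z)
print(bound, log_normal(x, 0.0, np.sqrt(2.0)))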
\n\n\n
\n \n\n \n \n \n \n \n \n Deep Variational Reinforcement Learning for POMDPs.\n \n \n \n \n\n\n \n Igl, M.; Zintgraf, L.; Le, T. A.; Wood, F.; and Whiteson, S.\n\n\n \n\n\n\n In Thirty-fifth International Conference on Machine Learning (ICML), 2018. \n \n\n\n\n
\n\n\n\n \n \n \"Deep link\n  \n \n \n \"Deep paper\n  \n \n\n \n\n \n link\n  \n \n\n bibtex\n \n\n \n  \n \n abstract \n \n\n \n  \n \n 1 download\n \n \n\n \n \n \n \n \n \n \n\n  \n \n \n\n\n\n
\n
@inproceedings{IGL-18,\n  title={Deep Variational Reinforcement Learning for POMDPs},\n  author={Igl, Maximilian and Zintgraf, Luisa and Le, Tuan Anh and Wood, Frank and Whiteson, Shimon},\n  booktitle={Thirty-fifth International Conference on Machine Learning (ICML)},\n  year={2018},\n  archiveprefix = {arXiv},\n  eprint = {1806.02426},\n  url_Link={https://arxiv.org/abs/1806.02426},\n  url_Paper={https://arxiv.org/pdf/1806.02426.pdf},\n  abstract={Many real-world sequential decision making problems are partially observable by nature, and the environment model is typically unknown. Consequently, there is great need for reinforcement learning methods that can tackle such problems given only a stream of incomplete and noisy observations. In this paper, we propose deep variational reinforcement learning (DVRL), which introduces an inductive bias that allows an agent to learn a generative model of the environment and perform inference in that model to effectively aggregate the available information. We develop an n-step approximation to the evidence lower bound (ELBO), allowing the model to be trained jointly with the policy. This ensures that the latent state representation is suitable for the control task. In experiments on Mountain Hike and flickering Atari we show that our method outperforms previous approaches relying on recurrent neural networks to encode the past.}\n}\n\n
\n
\n\n\n
\n Many real-world sequential decision making problems are partially observable by nature, and the environment model is typically unknown. Consequently, there is great need for reinforcement learning methods that can tackle such problems given only a stream of incomplete and noisy observations. In this paper, we propose deep variational reinforcement learning (DVRL), which introduces an inductive bias that allows an agent to learn a generative model of the environment and perform inference in that model to effectively aggregate the available information. We develop an n-step approximation to the evidence lower bound (ELBO), allowing the model to be trained jointly with the policy. This ensures that the latent state representation is suitable for the control task. In experiments on Mountain Hike and flickering Atari we show that our method outperforms previous approaches relying on recurrent neural networks to encode the past.\n
\n\n\n
\n\n\n
\n \n\n \n \n \n \n \n \n Inference trees: Adaptive inference with exploration.\n \n \n \n \n\n\n \n Rainforth, T.; Zhou, Y.; Lu, X.; Teh, Y. W.; Wood, F.; Yang, H.; and van de Meent, J.\n\n\n \n\n\n\n In 1st Symposium on Advances in Approximate Bayesian Inference, 2018. \n \n\n\n\n
\n\n\n\n \n \n \"Inference link\n  \n \n \n \"Inference paper\n  \n \n \n \"Inference presentation\n  \n \n\n \n\n \n link\n  \n \n\n bibtex\n \n\n \n  \n \n abstract \n \n\n \n\n \n \n \n \n \n \n \n\n  \n \n \n\n\n\n
\n
@inproceedings{RAI-18c,\n  title={Inference trees: Adaptive inference with exploration},\n  author={Rainforth, Tom and Zhou, Yuan and Lu, Xiaoyu and Teh, Yee Whye and Wood, Frank and Yang, Hongseok and van de Meent, Jan-Willem},\n  booktitle={1st Symposium on Advances in Approximate Bayesian Inference},\n  archiveprefix = {arXiv},\n  eprint = {1806.09550},\n  year={2018},\n  url_Link={https://arxiv.org/abs/1806.09550},\n  url_Paper={https://arxiv.org/pdf/1806.09550.pdf},\n  url_Presentation={http://www.approximateinference.org/2018/schedule/Rainforth2018.pdf},\n  abstract={We introduce inference trees (ITs), a new class of inference methods that build on ideas from Monte Carlo tree search to perform adaptive sampling in a manner that balances exploration with exploitation, ensures consistency, and alleviates pathologies in existing adaptive methods. ITs adaptively sample from hierarchical partitions of the parameter space, while simultaneously learning these partitions in an online manner. This enables ITs to not only identify regions of high posterior mass, but also maintain uncertainty estimates to track regions where significant posterior mass may have been missed. ITs can be based on any inference method that provides a consistent estimate of the marginal likelihood. They are particularly effective when combined with sequential Monte Carlo, where they capture long-range dependencies and yield improvements beyond proposal adaptation alone.}\n}\n\n
\n
\n\n\n
\n We introduce inference trees (ITs), a new class of inference methods that build on ideas from Monte Carlo tree search to perform adaptive sampling in a manner that balances exploration with exploitation, ensures consistency, and alleviates pathologies in existing adaptive methods. ITs adaptively sample from hierarchical partitions of the parameter space, while simultaneously learning these partitions in an online manner. This enables ITs to not only identify regions of high posterior mass, but also maintain uncertainty estimates to track regions where significant posterior mass may have been missed. ITs can be based on any inference method that provides a consistent estimate of the marginal likelihood. They are particularly effective when combined with sequential Monte Carlo, where they capture long-range dependencies and yield improvements beyond proposal adaptation alone.\n
\n\n\n
\n\n\n
\n \n\n \n \n \n \n \n \n Online learning rate adaptation with hypergradient descent.\n \n \n \n \n\n\n \n Baydin, A. G.; Cornish, R.; Rubio, D. M.; Schmidt, M.; and Wood, F.\n\n\n \n\n\n\n In Sixth International Conference on Learning Representations (ICLR 2018), 2018. \n \n\n\n\n
\n\n\n\n \n \n \"Online link\n  \n \n \n \"Online paper\n  \n \n\n \n\n \n link\n  \n \n\n bibtex\n \n\n \n  \n \n abstract \n \n\n \n\n \n \n \n \n \n \n \n\n  \n \n \n\n\n\n
\n
@inproceedings{BAY-18a,\n  title={Online learning rate adaptation with hypergradient descent},\n  author={Baydin, Atilim Gunes and Cornish, Robert and Rubio, David Martinez and Schmidt, Mark and Wood, Frank},\n  booktitle={Sixth International Conference on Learning Representations (ICLR 2018)},\n  archiveprefix = {arXiv},\n  eprint = {1703.04782},\n  year={2018},\n  url_Link={https://arxiv.org/abs/1703.04782},\n  url_Paper={https://arxiv.org/pdf/1703.04782.pdf},\n  url_Link={https://iclr.cc/Conferences/2018/Schedule?showEvent=14},\n  url_Paper={https://openreview.net/pdf?id=BkrsAzWAb},\n  abstract={We introduce a general method for improving the convergence rate of gradient-based optimizers that is easy to implement and works well in practice.  We demonstrate the effectiveness of the method in a range of optimization problems by applying it to stochastic gradient descent, stochastic gradient descent with Nesterov momentum, and Adam, showing that it significantly reduces the need for the manual tuning of the initial learning rate for these commonly used algorithms.  Our method works by dynamically updating the learning rate during optimization using the gradient with respect to the learning rate of the update rule itself.  Computing this "hypergradient" needs little additional computation, requires only one extra copy of the original gradient to be stored in memory, and relies upon nothing more than what is provided by reverse-mode automatic differentiation.}\n}\n\n
\n
\n\n\n
\n We introduce a general method for improving the convergence rate of gradient-based optimizers that is easy to implement and works well in practice. We demonstrate the effectiveness of the method in a range of optimization problems by applying it to stochastic gradient descent, stochastic gradient descent with Nesterov momentum, and Adam, showing that it significantly reduces the need for the manual tuning of the initial learning rate for these commonly used algorithms. Our method works by dynamically updating the learning rate during optimization using the gradient with respect to the learning rate of the update rule itself. Computing this \"hypergradient\" needs little additional computation, requires only one extra copy of the original gradient to be stored in memory, and relies upon nothing more than what is provided by reverse-mode automatic differentiation.\n
\n\n\n
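For plain gradient descent the "hypergradient" mentioned above is just the dot product of the current and previous gradients, so the method fits in a few lines. The sketch below is the SGD variant on a deterministic objective; the paper also derives versions for SGD with Nesterov momentum and Adam and treats the stochastic minibatch setting, none of which is shown here. `sgd_hd` and the toy quadratic are illustrative.

import numpy as np

def sgd_hd(grad, theta0, alpha0=0.001, beta=1e-4, steps=200):
    """Gradient descent whose learning rate is itself adapted by gradient descent.

    grad   : function(theta) -> gradient of the objective at theta
    alpha0 : initial learning rate; beta : hypergradient step size
    """
    theta = np.asarray(theta0, dtype=float)
    alpha = alpha0
    g_prev = np.zeros_like(theta)
    for _ in range(steps):
        g = grad(theta)
        # df/dalpha = -(g . g_prev), so a descent step on the learning rate adds beta * (g . g_prev)
        alpha += beta * float(g @ g_prev)
        theta -= alpha * g
        g_prev = g
    return theta, alpha

# Toy usage on f(theta) = 0.5 * ||theta||^2, whose gradient is theta:
theta, alpha = sgd_hd(lambda th: th, theta0=np.ones(5))
print(theta, alpha)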
\n\n\n\n\n\n
\n
\n\n
\n
\n  \n unpublished\n \n \n (2)\n \n \n
\n
\n \n \n
\n \n\n \n \n \n \n \n \n An introduction to probabilistic programming.\n \n \n \n \n\n\n \n van de Meent, J.; Paige, B.; Yang, H.; and Wood, F.\n\n\n \n\n\n\n 2018.\n \n\n\n\n
\n\n\n\n \n \n \"An link\n  \n \n \n \"An paper\n  \n \n\n \n\n \n link\n  \n \n\n bibtex\n \n\n \n  \n \n abstract \n \n\n \n  \n \n 1 download\n \n \n\n \n \n \n \n \n \n \n\n  \n \n \n\n\n\n
\n
@unpublished{MEE-18,\n  title={An introduction to probabilistic programming},\n  author={van de Meent, Jan-Willem and Paige, Brooks and Yang, Hongseok and Wood, Frank},\n  journal={arXiv preprint},\n  archiveprefix = {arXiv},\n  eprint = {1809.10756},\n  year={2018},\n  url_Link={https://arxiv.org/abs/1809.10756},\n  url_Paper={https://arxiv.org/pdf/1809.10756.pdf},\n  abstract={This document is designed to be a first-year graduate-level introduction to probabilistic programming. It not only provides a thorough background for anyone wishing to use a probabilistic programming system, but also introduces the techniques needed to design and build these systems. It is aimed at people who have an undergraduate-level understanding of either or, ideally, both probabilistic machine learning and programming languages.\\\\\n  We start with a discussion of model-based reasoning and explain why conditioning as a foundational computation is central to the fields of probabilistic machine learning and artificial intelligence. We then introduce a simple first-order probabilistic programming language (PPL) whose programs define static-computation-graph, finite-variable-cardinality models. In the context of this restricted PPL we introduce fundamental inference algorithms and describe how they can be implemented in the context of models denoted by probabilistic programs.\\\\\n  In the second part of this document, we introduce a higher-order probabilistic programming language, with a functionality analogous to that of established programming languages. This affords the opportunity to define models with dynamic computation graphs, at the cost of requiring inference methods that generate samples by repeatedly executing the program. Foundational inference algorithms for this kind of probabilistic programming language are explained in the context of an interface between program executions and an inference controller.\\\\\n  This document closes with a chapter on advanced topics which we believe to be, at the time of writing, interesting directions for probabilistic programming research; directions that point towards a tight integration with deep neural network research and the development of systems for next-generation artificial intelligence applications.}\n}\n\n
\n
\n\n\n
\n This document is designed to be a first-year graduate-level introduction to probabilistic programming. It not only provides a thorough background for anyone wishing to use a probabilistic programming system, but also introduces the techniques needed to design and build these systems. It is aimed at people who have an undergraduate-level understanding of either or, ideally, both probabilistic machine learning and programming languages.\\\\ We start with a discussion of model-based reasoning and explain why conditioning as a foundational computation is central to the fields of probabilistic machine learning and artificial intelligence. We then introduce a simple first-order probabilistic programming language (PPL) whose programs define static-computation-graph, finite-variable-cardinality models. In the context of this restricted PPL we introduce fundamental inference algorithms and describe how they can be implemented in the context of models denoted by probabilistic programs.\\\\ In the second part of this document, we introduce a higher-order probabilistic programming language, with a functionality analogous to that of established programming languages. This affords the opportunity to define models with dynamic computation graphs, at the cost of requiring inference methods that generate samples by repeatedly executing the program. Foundational inference algorithms for this kind of probabilistic programming language are explained in the context of an interface between program executions and an inference controller.\\\\ This document closes with a chapter on advanced topics which we believe to be, at the time of writing, interesting directions for probabilistic programming research; directions that point towards a tight integration with deep neural network research and the development of systems for next-generation artificial intelligence applications.\n
\n\n\n
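A toy example of the kind of construction the tutorial builds up may help orient readers: a model written as an ordinary program making `sample` and `observe` calls, conditioned by likelihood weighting (importance sampling from the prior, with observe statements contributing log-likelihood weights). This is a generic illustration of conditioning-as-computation under assumed Gaussian primitives, not code from the document.

import numpy as np

def run_weighted(program, data, n_runs, rng=None):
    """Likelihood weighting: execute the program repeatedly, sampling latents from
    the prior and accumulating the log-likelihood of each observe as a weight."""
    rng = rng or np.random.default_rng(0)
    samples, log_weights = [], []
    for _ in range(n_runs):
        logw = [0.0]
        def sample(mu, sigma):                 # latent draw from its Gaussian prior
            return rng.normal(mu, sigma)
        def observe(mu, sigma, value):         # condition: add log N(value; mu, sigma)
            logw[0] += -0.5 * ((value - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))
        samples.append(program(sample, observe, data))
        log_weights.append(logw[0])
    w = np.exp(np.array(log_weights) - max(log_weights))
    return np.array(samples), w / w.sum()

def program(sample, observe, data):
    """Infer the mean of Gaussian data under a N(0, 5) prior."""
    mu = sample(0.0, 5.0)
    for y in data:
        observe(mu, 1.0, y)
    return mu

xs, w = run_weighted(program, data=[2.1, 1.9, 2.4], n_runs=20_000)
print("posterior mean of mu ~", float(w @ xs))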
\n\n\n
\n \n\n \n \n \n \n \n Towards a Testable Notion of Generalization for Generative Adversarial Networks.\n \n \n \n\n\n \n Cornish, R.; Yang, H.; and Wood, F.\n\n\n \n\n\n\n 2018.\n \n\n\n\n
\n\n\n\n \n\n \n\n \n link\n  \n \n\n bibtex\n \n\n \n\n \n\n \n \n \n \n \n \n \n\n  \n \n \n\n\n\n
\n
@unpublished{COR-18,\n  title={Towards a Testable Notion of Generalization for Generative Adversarial Networks},\n  author={Cornish, Robert and Yang, Hongseok and Wood, Frank},\n  year={2018},\n  }\n\n
\n
\n\n\n\n
\n\n\n\n\n\n
\n
\n\n\n\n\n
\n
\n\n\n\n\n
\n\n\n \n\n \n \n \n \n\n
\n"}; document.write(bibbase_data.data);