generated by bibbase.org
Excellent! Next you can create a new website with this list, or embed it in an existing web page by copying and pasting any of the following snippets.

JavaScript (easiest):

<script src="https://bibbase.org/show?bib=kevinsbello.github.io/bello.bib&css=kevinsbello.github.io/assets/css/bibBaseAll.css&jsonp=1"></script>

PHP:

<?php
$contents = file_get_contents("https://bibbase.org/show?bib=kevinsbello.github.io/bello.bib&css=kevinsbello.github.io/assets/css/bibBaseAll.css&jsonp=1");
print_r($contents);
?>

iFrame (not recommended):

<iframe src="https://bibbase.org/show?bib=kevinsbello.github.io/bello.bib&css=kevinsbello.github.io/assets/css/bibBaseAll.css&jsonp=1"></iframe>
For more details see the documentation.
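For other server-side languages, the same endpoint can be fetched directly. A hypothetical Python helper is sketched below; the function name is illustrative, and the URL format simply mirrors the JavaScript/PHP snippets.

```python
def build_embed_url(bib, css=None, jsonp=False):
    """Build the bibbase.org rendering URL used by the embed snippets.

    `bib` and `css` are paths/URLs understood by BibBase; this helper is
    illustrative, not part of any BibBase client library.
    """
    url = "https://bibbase.org/show?bib=" + bib
    if css:
        url += "&css=" + css
    if jsonp:
        url += "&jsonp=1"
    return url

# Server-side fetch, analogous to the PHP snippet (requires network access):
# import urllib.request
# html = urllib.request.urlopen(
#     build_embed_url("kevinsbello.github.io/bello.bib")
# ).read().decode()
```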

2024 (2)
Identifying General Mechanism Shifts in Linear Causal Representations. Chen, T.; Bello, K.; Locatello, F.; Aragam, B.; and Ravikumar, P. Under Review. 2024.
@article{chen2024,
  title={Identifying General Mechanism Shifts in Linear Causal Representations},
  author={Tianyu Chen and Kevin Bello and Francesco Locatello and Bryon Aragam and Pradeep Ravikumar},
  journal={Under Review},
  year={2024},
  abstract={Recent theoretical advances in unsupervised learning of causal representations guarantee the identifiability of latent causal graphs and mixing functions, given access to interventional data. However, current strong identifiability results hinge on hard single-node interventions, necessitating each latent variable's intervention in at least one context/environment, limiting practical applicability. This paper addresses scenarios with multiple datasets with the aim of uncovering the latent sources driving observed distribution changes, if any. In the setting of latent linear SEMs and linear mixing functions, we establish the identifiability of mechanism shifts in latent causal representations and provide an efficient algorithm based on independent component analysis to identify them. Crucially, our approach eliminates the need to estimate underlying causal graphs or shared mixing functions, significantly reducing both identifiability ambiguities and computational demands. Importantly, our method does not assume a consistent topological order of the latent causal graphs across different environments, allowing for varied types of interventions, including edge additions and reversals. Synthetic data experiments and a real-world application in psychometrics validate our contributions.},
  keyword={Causality and Graphical Models,Distribution Shifts,Latent Variable Models and Generative Models,Interpretability and Fairness},
}
Identifying Causal Changes Between Linear Structural Equation Models. Malik, V.; Bello, K.; Ghoshal, A.; and Honorio, J. Under Review. 2024.
@article{malik2024,
  title={Identifying Causal Changes Between Linear Structural Equation Models},
  author={Vineet Malik and Kevin Bello and Asish Ghoshal and Jean Honorio},
  journal={Under Review},
  year={2024},
  abstract={In numerous scientific domains, it is crucial to learn the structures---commonly assumed to be directed acyclic graphs (DAGs)---of structural equation models (SEMs) from data as they are effective models for representing causal relationships among variables in a system. Nevertheless, in several scenarios, it is preferable to directly estimate the changes in causal relations between two distinct conditions. For example, estimating causal changes in genetic expression between healthy and diseased subjects can assist in isolating the genetic factors that contribute to the disease. This work studies the problem of directly estimating the structural difference between two linear SEMs, i.e. without estimating the individual DAG structures, given two sets of samples drawn from the individual SEMs. We consider general classes of linear SEMs where the noise distributions are allowed to be Gaussian or non-Gaussian and have different noise variances across the variables in the individual SEMs. We rigorously characterize novel conditions related to the topological layering of the structural difference that lead to the identifiability of the difference DAG (DDAG). Moreover, we propose an efficient algorithm to identify the DDAG via sequential re-estimation of the difference of precision matrices. A surprising implication of our results is that causal changes can be identifiable even between non-identifiable models such as Gaussian SEMs with unequal noise variances. Synthetic experiments are presented to validate our theoretical results and to show the scalability of our method.},
  keyword={Causality and Graphical Models,Distribution Shifts,Latent Variable Models and Generative Models,Interpretability and Fairness},
}
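As a toy illustration of the quantity this approach re-estimates, the difference of precision matrices, the following numpy sketch computes the exact (population) precision of two linear SEMs that differ in a single edge weight; the difference is supported only on the rows/columns of the variables touched by the change. The function name and SEM parameterization are illustrative, not taken from the paper.

```python
import numpy as np

def sem_precision(W, noise_var):
    """Exact precision matrix of the linear SEM x = W^T x + z:
    Theta = (I - W) diag(1/noise_var) (I - W)^T,
    where W[i, j] is the weight of edge i -> j."""
    I = np.eye(W.shape[0])
    return (I - W) @ np.diag(1.0 / np.asarray(noise_var)) @ (I - W).T

# Two 3-node chains 0 -> 1 -> 2 that differ only in the weight of 0 -> 1.
W1 = np.array([[0.0, 1.0, 0.0],
               [0.0, 0.0, 1.0],
               [0.0, 0.0, 0.0]])
W2 = W1.copy()
W2[0, 1] = 2.0  # the single causal change

delta = sem_precision(W1, np.ones(3)) - sem_precision(W2, np.ones(3))
# delta is nonzero only on rows/columns {0, 1}; row and column 2 are zero,
# so the untouched variable is correctly excluded from the difference.
```

In practice the precision matrices would be estimated from the two sample sets rather than computed from the (unknown) SEM parameters; the paper's algorithm then peels off the structural difference layer by layer.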
2023 (4)
iSCAN: Identifying Causal Mechanism Shifts among Nonlinear Additive Noise Models. Chen, T.; Bello, K.; Aragam, B.; and Ravikumar, P. NeurIPS-23. Advances in Neural Information Processing Systems. 2023.
@article{chen2023iscan,
  title={iSCAN: Identifying Causal Mechanism Shifts among Nonlinear Additive Noise Models},
  author={Tianyu Chen and Kevin Bello and Bryon Aragam and Pradeep Ravikumar},
  journal={NeurIPS-23. Advances in Neural Information Processing Systems},
  year={2023},
  abstract={Structural causal models (SCMs) are widely used in various disciplines to represent causal relationships among variables in complex systems. Unfortunately, the underlying causal structure is often unknown, and estimating it from data remains a challenging task. In many situations, however, the end goal is to localize the changes (shifts) in the causal mechanisms between related datasets instead of learning the full causal structure of the individual datasets. Some applications include root cause analysis, analyzing gene regulatory network structure changes between healthy and cancerous individuals, or explaining distribution shifts. This paper focuses on identifying the causal mechanism shifts in two or more related datasets over the same set of variables—without estimating the entire DAG structure of each SCM. Prior work under this setting assumed linear models with Gaussian noises; instead, in this work we assume that each SCM belongs to the more general class of nonlinear additive noise models (ANMs). A key technical contribution of this work is to show that the Jacobian of the score function for the mixture distribution allows for the identification of shifts under general non-parametric functional mechanisms. Once the shifted variables are identified, we leverage recent work to estimate the structural differences, if any, for the shifted variables. Experiments on synthetic and real-world data are provided to showcase the applicability of this approach. Code implementing the proposed method is open-source and publicly available at https://github.com/kevinsbello/iSCAN.},
  url_Proceedings={https://openreview.net/pdf?id=GEtXhqKW6X},
  url_Preprint={https://arxiv.org/pdf/2306.17361.pdf},
  url_Code={https://github.com/kevinsbello/iscan/},
  keyword={Causality and Graphical Models,Distribution Shifts,Latent Variable Models and Generative Models,Statistical Machine Learning,Interpretability and Fairness},
}
Global Optimality in Bivariate Gradient-based DAG Learning. Deng, C.; Bello, K.; Aragam, B.; and Ravikumar, P. NeurIPS-23. Advances in Neural Information Processing Systems. 2023.
@article{deng2023global,
  title={Global Optimality in Bivariate Gradient-based DAG Learning},
  author={Deng, Chang and Bello, Kevin and Aragam, Bryon and Ravikumar, Pradeep},
  journal={NeurIPS-23. Advances in Neural Information Processing Systems},
  year={2023},
  abstract={Recently, a new class of non-convex optimization problems motivated by the statistical problem of learning an acyclic directed graphical model from data has attracted significant interest. While existing work uses standard first-order optimization schemes to solve this problem, proving the global optimality of such approaches has proven elusive. The difficulty lies in the fact that unlike other non-convex problems in the literature, this problem is not "benign", and possesses multiple spurious solutions that standard approaches can easily get trapped in. In this paper, we prove that a simple path-following optimization scheme globally converges to the global minimum of the population loss in the bivariate setting.},
  url_Proceedings={https://openreview.net/pdf?id=5MG5C5aS6m},
  url_Preprint={https://arxiv.org/pdf/2306.17378.pdf},
  keyword={Causality and Graphical Models,Optimization},
}
Optimizing NOTEARS objectives via topological swaps. Deng, C.; Bello, K.; Aragam, B.; and Ravikumar, P. K. ICML-23. Proceedings of the 40th International Conference on Machine Learning, 202: 7563–7595. 2023.
@article{deng2023optimizing,
  title={Optimizing NOTEARS objectives via topological swaps},
  author={Deng, Chang and Bello, Kevin and Aragam, Bryon and Ravikumar, Pradeep Kumar},
  journal={ICML-23. Proceedings of the 40th International Conference on Machine Learning},
  pages = {7563--7595},
  volume = {202},
  year={2023},
  abstract = {Recently, an intriguing class of non-convex optimization problems has emerged in the context of learning directed acyclic graphs (DAGs). These problems involve minimizing a given loss or score function, subject to a non-convex continuous constraint that penalizes the presence of cycles in a graph. In this work, we delve into the optimality challenges associated with this class of non-convex programs. To address these challenges, we propose a bi-level algorithm that leverages the non-convex constraint in a novel way. The outer level of the algorithm optimizes over topological orders by iteratively swapping pairs of nodes within the topological order of a DAG. A key innovation of our approach is the development of an effective method for generating a set of candidate swapping pairs for each iteration. At the inner level, given a topological order, we utilize off-the-shelf solvers that can handle linear constraints. The key advantage of our proposed algorithm is that it is guaranteed to find a local minimum or a KKT point under weaker conditions compared to previous work and finds solutions with lower scores. Extensive experiments demonstrate that our method outperforms state-of-the-art approaches in terms of achieving a better score. Additionally, our method can also be used as a post-processing algorithm to significantly improve the score of other algorithms. Code implementing the proposed method is available at https://github.com/duntrain/topo.},
  url_Proceedings={https://proceedings.mlr.press/v202/deng23a.html},
  url_Preprint={https://arxiv.org/pdf/2305.17277.pdf},
  url_Code={https://github.com/duntrain/topo/},
  keyword={Causality and Graphical Models,Optimization},
}
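The bi-level idea in the abstract can be illustrated with a deliberately simplified numpy sketch: score a topological order by least-squares regressions of each variable on its predecessors, then greedily accept score-improving swaps of adjacent pairs. This is a toy stand-in, assuming linear SEMs and adjacent swaps only; the paper's method generates candidate (not necessarily adjacent) swapping pairs and supports general inner solvers.

```python
import numpy as np

def order_score(X, order):
    """Least-squares score of a topological order: regress each variable
    on all of its predecessors in the order and sum the residual MSEs."""
    score = 0.0
    for i, j in enumerate(order):
        preds = list(order[:i])
        resid = X[:, j]
        if preds:
            beta, *_ = np.linalg.lstsq(X[:, preds], X[:, j], rcond=None)
            resid = X[:, j] - X[:, preds] @ beta
        score += float(np.mean(resid ** 2))
    return score

def swap_search(X, order):
    """Greedy local search over topological orders via adjacent swaps."""
    order, best = list(order), order_score(X, list(order))
    improved = True
    while improved:
        improved = False
        for i in range(len(order) - 1):
            cand = order.copy()
            cand[i], cand[i + 1] = cand[i + 1], cand[i]
            s = order_score(X, cand)
            if s < best - 1e-9:  # accept only strict improvements
                order, best, improved = cand, s, True
    return order, best

# Data from a chain SEM 0 -> 1 -> 2; start from the fully reversed order.
rng = np.random.default_rng(0)
z = rng.standard_normal((500, 3))
X = np.empty((500, 3))
X[:, 0] = z[:, 0]
X[:, 1] = 2.0 * X[:, 0] + z[:, 1]
X[:, 2] = 2.0 * X[:, 1] + z[:, 2]
order, best = swap_search(X, [2, 1, 0])
```

Adjacent swaps alone can stall in a local optimum, which is one motivation for the candidate-pair generation scheme developed in the paper.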
Bayesian Dynamic DAG Learning: Application in Discovering Dynamic Effective Connectome of Brain. Bagheri, A.; Pasande, M.; Bello, K.; Akhondi-Asl, A.; and Araabi, B. N. ArXiv:2309.07080, 2023.
@misc{bagheri2023bayesian,
  title={Bayesian Dynamic DAG Learning: Application in Discovering Dynamic Effective Connectome of Brain},
  author={Bagheri, Abdolmahdi and Pasande, Mohammad and Bello, Kevin and Akhondi-Asl, Alireza and Araabi, Babak Nadjar},
  howpublished = {ArXiv:2309.07080},
  abstract={Understanding the complex mechanisms of the brain can be unraveled by extracting the Dynamic Effective Connectome (DEC). Recently, score-based Directed Acyclic Graph (DAG) discovery methods have shown significant improvements in extracting the causal structure and inferring effective connectivity. However, learning DEC through these methods still faces two main challenges: one with the fundamental impotence of high-dimensional dynamic DAG discovery methods and the other with the low quality of fMRI data. In this paper, we introduce Bayesian Dynamic DAG learning with M-matrices Acyclicity characterization (BDyMA) method to address the challenges in discovering DEC. The presented dynamic causal model enables us to discover bidirected edges as well. Leveraging an unconstrained framework in the BDyMA method leads to more accurate results in detecting high-dimensional networks, achieving sparser outcomes, making it particularly suitable for extracting DEC. Additionally, the score function of the BDyMA method allows the incorporation of prior knowledge into the process of dynamic causal discovery which further enhances the accuracy of results. Comprehensive simulations on synthetic data and experiments on Human Connectome Project (HCP) data demonstrate that our method can handle both of the two main challenges, yielding more accurate and reliable DEC compared to state-of-the-art and baseline methods. Additionally, we investigate the trustworthiness of DTI data as prior knowledge for DEC discovery and show the improvements in DEC discovery when the DTI data is incorporated into the process.},
  url_Preprint={https://arxiv.org/abs/2309.07080},
  year={2023},
  keyword={Causality and Graphical Models,Applications},
}
2022 (3)
DAGMA: Learning DAGs via M-matrices and a Log-Determinant Acyclicity Characterization. Bello, K.; Aragam, B.; and Ravikumar, P. NeurIPS-22. Advances in Neural Information Processing Systems, 35: 8226–8239. 2022.
@article{bello2022dagma,
  title={DAGMA: Learning DAGs via M-matrices and a Log-Determinant Acyclicity Characterization},
  author={Bello, Kevin and Aragam, Bryon and Ravikumar, Pradeep},
  journal={NeurIPS-22. Advances in Neural Information Processing Systems},
  volume={35},
  pages={8226--8239},
  year={2022},
  abstract={The combinatorial problem of learning directed acyclic graphs (DAGs) from data was recently framed as a purely continuous optimization problem by leveraging a differentiable acyclicity characterization of DAGs based on the trace of a matrix exponential function. Existing acyclicity characterizations are based on the idea that powers of an adjacency matrix contain information about walks and cycles. In this work, we propose a new acyclicity characterization based on the log-determinant (log-det) function, which leverages the nilpotency property of DAGs. To deal with the inherent asymmetries of a DAG, we relate the domain of our log-det characterization to the set of M-matrices, which is a key difference to the classical log-det function defined over the cone of positive definite matrices. Similar to acyclicity functions previously proposed, our characterization is also exact and differentiable. However, when compared to existing characterizations, our log-det function: (1) Is better at detecting large cycles; (2) Has better-behaved gradients; and (3) Its runtime is in practice about an order of magnitude faster. From the optimization side, we drop the typically used augmented Lagrangian scheme and propose DAGMA (Directed Acyclic Graphs via M-matrices for Acyclicity), a method that resembles the central path for barrier methods. Each point in the central path of DAGMA is a solution to an unconstrained problem regularized by our log-det function, then we show that at the limit of the central path the solution is guaranteed to be a DAG. Finally, we provide extensive experiments for linear and nonlinear SEMs and show that our approach can reach large speedups and smaller structural Hamming distances against state-of-the-art methods. Code implementing the proposed method is open-source and publicly available at https://github.com/kevinsbello/dagma.},
  url_Proceedings={https://proceedings.neurips.cc/paper_files/paper/2022/hash/36e2967f87c3362e37cf988781a887ad-Abstract-Conference.html},
  url_Preprint={https://arxiv.org/pdf/2209.08037.pdf},
  url_Code={https://github.com/kevinsbello/dagma/},
  keyword={Causality and Graphical Models,Optimization},
}
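The log-det acyclicity characterization described in the abstract, h(W) = -log det(sI - W∘W) + d log s, is simple to write down; a minimal numpy sketch follows (the function name and the default s = 1 are illustrative, not from the released package).

```python
import numpy as np

def h_logdet(W, s=1.0):
    """Log-det acyclicity function: h(W) = -log det(sI - W*W) + d*log(s).

    For W the weighted adjacency matrix of a DAG, W*W (Hadamard square) is
    nilpotent, so det(sI - W*W) = s^d and h(W) = 0; any cycle makes h > 0
    whenever sI - W*W stays an M-matrix (s large enough).
    """
    d = W.shape[0]
    M = s * np.eye(d) - W * W
    sign, logdet = np.linalg.slogdet(M)  # stable log|det|
    return -logdet + d * np.log(s)
```

On a 2-node DAG (one directed edge) h is exactly zero, while a 2-cycle yields a strictly positive value, which is what makes h usable as a differentiable acyclicity penalty.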
A View of Exact Inference in Graphs from the Degree-4 Sum-of-Squares Hierarchy. Bello, K.; Ke, C.; and Honorio, J. AISTATS-22. Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, 151: 640–654. 2022.
@article{bello2022view,
  title={A View of Exact Inference in Graphs from the Degree-4 Sum-of-Squares Hierarchy},
  author={Bello, Kevin and Ke, Chuyang and Honorio, Jean},
  journal={AISTATS-22. Proceedings of The 25th International Conference on Artificial Intelligence and Statistics},
  abstract={Performing inference in graphs is a common task within several machine learning problems, e.g., image segmentation, community detection, among others. For a given undirected connected graph, we tackle the statistical problem of exactly recovering an unknown ground-truth binary labeling of the nodes from a single corrupted observation of each edge. Such problem can be formulated as a quadratic combinatorial optimization problem over the boolean hypercube, where it has been shown before that one can (with high probability and in polynomial time) exactly recover the ground-truth labeling of graphs that have an isoperimetric number that grows with respect to the number of nodes (e.g., complete graphs, regular expanders). In this work, we apply a powerful hierarchy of relaxations, known as the sum-of-squares (SoS) hierarchy, to the combinatorial problem. Motivated by empirical evidence on the improvement in exact recoverability, we center our attention on the degree-4 SoS relaxation and set out to understand the origin of such improvement from a graph theoretical perspective. We show that the solution of the dual of the relaxed problem is related to finding edge weights of the Johnson and Kneser graphs, where the weights fulfill the SoS constraints and intuitively allow the input graph to increase its algebraic connectivity. Finally, as byproduct of our analysis, we derive a novel Cheeger-type lower bound for the algebraic connectivity of graphs with signed edge weights.},
  volume={151},
  pages={640--654},
  year={2022},
  publisher={PMLR},
  url_proceedings={https://proceedings.mlr.press/v151/bello22a.html},
  url_preprint={https://arxiv.org/pdf/2102.08019.pdf},
  keyword={Statistical Machine Learning,Optimization,Causality and Graphical Models},
}
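The algebraic connectivity mentioned at the end of the abstract is the second-smallest eigenvalue of the graph Laplacian (the Fiedler value). A small numpy sketch for nonnegatively weighted graphs follows; signed edge weights, as in the paper's Cheeger-type bound, would need extra care since the Laplacian can then have negative eigenvalues.

```python
import numpy as np

def algebraic_connectivity(A):
    """Second-smallest eigenvalue of the Laplacian L = D - A for a
    symmetric adjacency matrix A; positive iff the graph is connected,
    and larger for better-connected graphs."""
    L = np.diag(A.sum(axis=1)) - A
    return float(np.sort(np.linalg.eigvalsh(L))[1])
```

For example, the complete graph K_n attains the maximum value n, while a path graph on the same nodes has a much smaller Fiedler value, matching the intuition that SoS edge reweighting "improves" a graph by raising this quantity.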
\n
\n\n\n
\n Performing inference in graphs is a common task within several machine learning problems, e.g., image segmentation, community detection, among others. For a given undirected connected graph, we tackle the statistical problem of exactly recovering an unknown ground-truth binary labeling of the nodes from a single corrupted observation of each edge. Such a problem can be formulated as a quadratic combinatorial optimization problem over the boolean hypercube, where it has been shown before that one can (with high probability and in polynomial time) exactly recover the ground-truth labeling of graphs that have an isoperimetric number that grows with respect to the number of nodes (e.g., complete graphs, regular expanders). In this work, we apply a powerful hierarchy of relaxations, known as the sum-of-squares (SoS) hierarchy, to the combinatorial problem. Motivated by empirical evidence on the improvement in exact recoverability, we center our attention on the degree-4 SoS relaxation and set out to understand the origin of such improvement from a graph theoretical perspective. We show that the solution of the dual of the relaxed problem is related to finding edge weights of the Johnson and Kneser graphs, where the weights fulfill the SoS constraints and intuitively allow the input graph to increase its algebraic connectivity. Finally, as a byproduct of our analysis, we derive a novel Cheeger-type lower bound for the algebraic connectivity of graphs with signed edge weights.\n
\n\n\n
\n\n\n
\n \n\n \n \n \n \n \n \n On the Fundamental Limits of Exact Inference in Structured Prediction.\n \n \n \n \n\n\n \n Lee, H.; Bello, K.; and Honorio, J.\n\n\n \n\n\n\n ISIT-22. IEEE International Symposium on Information Theory, pp: 3174–3179. 2022.\n \n\n\n\n
\n\n\n\n \n \n \"On proceedings\n  \n \n \n \"On preprint\n  \n \n\n \n\n \n link\n  \n \n\n bibtex\n \n\n \n  \n \n abstract \n \n\n \n  \n \n 9 downloads\n \n \n\n \n \n \n \n \n \n \n\n  \n \n \n \n \n \n \n\n\n\n
\n
@article{lee2022fundamental,\n  title={On the Fundamental Limits of Exact Inference in Structured Prediction},\n  author={Lee, Hanbyul and Bello, Kevin and Honorio, Jean},\n  journal={<span style="color: #0088cc; font-style: normal">ISIT-22.</span> IEEE International Symposium on Information Theory},\n  abstract={Inference is a main task in structured prediction and it is naturally modeled with a graph. In the context of Markov random fields, noisy observations corresponding to nodes and edges are usually involved, and the goal of exact inference is to recover the unknown true label for each node precisely. The focus of this paper is on the fundamental limits of exact recovery irrespective of computational efficiency, assuming the generative process proposed by Globerson et al. (2015). We derive the necessary condition for any algorithm and the sufficient condition for maximum likelihood estimation to achieve exact recovery with high probability, and reveal that the sufficient and necessary conditions are tight up to a logarithmic factor for a wide range of graphs. Finally, we show that there exists a gap between the fundamental limits and the performance of the computationally tractable method of Bello and Honorio (2019), which implies the need for further development of algorithms for exact inference.},\n  pages={3174--3179},\n  volume={pp},\n  year={2022},\n  publisher={IEEE},\n  url_proceedings={https://ieeexplore.ieee.org/document/9834614},\n  url_preprint={https://arxiv.org/pdf/2102.08895.pdf},\n  keyword={Statistical Machine Learning,Causality and Graphical Models},\n}\n\n
\n
\n\n\n
\n Inference is a main task in structured prediction and it is naturally modeled with a graph. In the context of Markov random fields, noisy observations corresponding to nodes and edges are usually involved, and the goal of exact inference is to recover the unknown true label for each node precisely. The focus of this paper is on the fundamental limits of exact recovery irrespective of computational efficiency, assuming the generative process proposed by Globerson et al. (2015). We derive the necessary condition for any algorithm and the sufficient condition for maximum likelihood estimation to achieve exact recovery with high probability, and reveal that the sufficient and necessary conditions are tight up to a logarithmic factor for a wide range of graphs. Finally, we show that there exists a gap between the fundamental limits and the performance of the computationally tractable method of Bello and Honorio (2019), which implies the need for further development of algorithms for exact inference.\n
\n\n\n
\n\n\n\n\n\n
\n
\n\n
\n
\n  \n 2021\n \n \n (4)\n \n \n
\n
\n \n \n
\n \n\n \n \n \n \n \n \n Inverse Reinforcement Learning in a Continuous State Space with Formal Guarantees.\n \n \n \n \n\n\n \n Dexter, G.; Bello, K.; and Honorio, J.\n\n\n \n\n\n\n NeurIPS-21. Advances in Neural Information Processing Systems, 34: 6972–6982. 2021.\n \n\n\n\n
\n\n\n\n \n \n \"Inverse proceedings\n  \n \n \n \"Inverse preprint\n  \n \n\n \n\n \n link\n  \n \n\n bibtex\n \n\n \n  \n \n abstract \n \n\n \n  \n \n 2 downloads\n \n \n\n \n \n \n \n \n \n \n\n  \n \n \n \n \n \n \n\n\n\n
\n
@article{dexter2021inverse,\n  title={Inverse Reinforcement Learning in a Continuous State Space with Formal Guarantees},\n  author={Dexter, Gregory and Bello, Kevin and Honorio, Jean},\n  journal={<span style="color: #0088cc; font-style: normal">NeurIPS-21.</span> Advances in Neural Information Processing Systems},\n  abstract={Inverse Reinforcement Learning (IRL) is the problem of finding a reward function which describes observed/known expert behavior. The IRL setting is remarkably useful for automated control, in situations where the reward function is difficult to specify manually or as a means to extract agent preference. In this work, we provide a new IRL algorithm for the continuous state space setting with unknown transition dynamics by modeling the system using a basis of orthonormal functions. Moreover, we provide a proof of correctness and formal guarantees on the sample and time complexity of our algorithm. Finally, we present synthetic experiments to corroborate our theoretical guarantees.},\n  volume={34},\n  pages={6972--6982},\n  year={2021},\n  url_proceedings={https://proceedings.neurips.cc/paper/2021/hash/384babc3e7faa44cf1ca671b74499c3b-Abstract.html},\n  url_preprint={https://arxiv.org/pdf/2102.07937.pdf},\n  keyword={Statistical Machine Learning,Reinforcement Learning},\n}\n\n
\n
\n\n\n
\n Inverse Reinforcement Learning (IRL) is the problem of finding a reward function which describes observed/known expert behavior. The IRL setting is remarkably useful for automated control, in situations where the reward function is difficult to specify manually or as a means to extract agent preference. In this work, we provide a new IRL algorithm for the continuous state space setting with unknown transition dynamics by modeling the system using a basis of orthonormal functions. Moreover, we provide a proof of correctness and formal guarantees on the sample and time complexity of our algorithm. Finally, we present synthetic experiments to corroborate our theoretical guarantees.\n
\n\n\n
\n\n\n
\n \n\n \n \n \n \n \n \n A Le Cam Type Bound for Adversarial Learning and Applications.\n \n \n \n \n\n\n \n Xu, Q. *; Bello, K. *; and Honorio, J.\n\n\n \n\n\n\n ISIT-21. IEEE International Symposium on Information Theory, pp: 1164–1169. 2021.\n \n\n(* Equal contribution)\n\n
\n\n\n\n \n \n \"A proceedings\n  \n \n \n \"A preprint\n  \n \n\n \n\n \n link\n  \n \n\n bibtex\n \n\n \n  \n \n abstract \n \n\n \n  \n \n 5 downloads\n \n \n\n \n \n \n \n \n \n \n\n  \n \n \n \n \n \n \n\n\n\n
\n
@article{xu2021cam,\n  title={A Le Cam Type Bound for Adversarial Learning and Applications},\n  author={Xu, Qiuling * and Bello, Kevin * and Honorio, Jean},\n  journal={<span style="color: #0088cc; font-style: normal">ISIT-21.</span> IEEE International Symposium on Information Theory},\n  abstract={Robustness of machine learning methods is essential for modern practical applications. Given the arms race between attack and defense methods, one may be curious regarding the fundamental limits of any defense mechanism. In this work, we focus on the problem of learning from noise-injected data, where the existing literature falls short by either assuming a specific attack method or by over-specifying the learning problem. We shed light on the information-theoretic limits of adversarial learning without assuming a particular learning process or attacker. Finally, we apply our general bounds to a canonical set of non-trivial learning problems and provide examples of common types of attacks.},\n  volume={pp},\n  pages={1164--1169},\n  year={2021},\n  publisher={IEEE},\n  bibbase_note={(* Equal contribution)},\n  url_proceedings={https://ieeexplore.ieee.org/document/9518178},\n  url_preprint={https://arxiv.org/pdf/2007.00289.pdf},\n  keyword={Statistical Machine Learning,Applications},\n}\n\n
\n
\n\n\n
\n Robustness of machine learning methods is essential for modern practical applications. Given the arms race between attack and defense methods, one may be curious regarding the fundamental limits of any defense mechanism. In this work, we focus on the problem of learning from noise-injected data, where the existing literature falls short by either assuming a specific attack method or by over-specifying the learning problem. We shed light on the information-theoretic limits of adversarial learning without assuming a particular learning process or attacker. Finally, we apply our general bounds to a canonical set of non-trivial learning problems and provide examples of common types of attacks.\n
\n\n\n
\n\n\n
\n \n\n \n \n \n \n \n \n Structured Prediction: Statistical and Computational Guarantees in Learning and Inference.\n \n \n \n \n\n\n \n Bello, K.\n\n\n \n\n\n\n Ph.D. Thesis, Purdue University, 2021.\n \n\n\n\n
\n\n\n\n \n \n \"Structured thesis\n  \n \n\n \n \n doi\n  \n \n\n \n link\n  \n \n\n bibtex\n \n\n \n\n \n  \n \n 1 download\n \n \n\n \n \n \n \n \n \n \n\n  \n \n \n \n \n \n \n\n\n\n
\n
@phdthesis{bello2021structured,\n  title={Structured Prediction: Statistical and Computational Guarantees in Learning and Inference},\n  author={Bello, Kevin},\n  year={2021},\n  school={Purdue University},\n  doi={https://doi.org/10.25394/PGS.15066291.v1},\n  url_Thesis={https://hammer.purdue.edu/ndownloader/files/28970307},\n  keyword={Statistical Machine Learning,Causality and Graphical Models},\n}\n\n
\n
\n\n\n\n
\n\n\n
\n \n\n \n \n \n \n \n \n Direct Learning with Guarantees of the Difference DAG between Structural Equation Models.\n \n \n \n \n\n\n \n Ghoshal, A.; Bello, K.; and Honorio, J.\n\n\n \n\n\n\n ArXiv 1906.12024, 2021.\n \n\n\n\n
\n\n\n\n \n \n \"Direct preprint\n  \n \n\n \n\n \n link\n  \n \n\n bibtex\n \n\n \n  \n \n abstract \n \n\n \n  \n \n 5 downloads\n \n \n\n \n \n \n \n \n \n \n\n  \n \n \n \n \n \n \n \n \n \n \n \n \n\n\n\n
\n
@misc{ghoshal2019direct,\n  title={Direct Learning with Guarantees of the Difference DAG between Structural Equation Models},\n  author={Ghoshal, Asish and Bello, Kevin and Honorio, Jean},\n  howpublished={ArXiv 1906.12024},\n  abstract={Discovering cause-effect relationships between variables from observational data is a fundamental challenge in many scientific disciplines. However, in many situations it is desirable to directly estimate the change in causal relationships across two different conditions, e.g., estimating the change in genetic expression across healthy and diseased subjects can help isolate genetic factors behind the disease. This paper focuses on the problem of directly estimating the structural difference between two structural equation models (SEMs), having the same topological ordering, given two sets of samples drawn from the individual SEMs. We present a principled algorithm that can recover the difference SEM in $O(d^2 \\log(p))$ samples, where $d$ is related to the number of edges in the difference SEM of $p$ nodes. We also study the fundamental limits and show that any method requires at least $\\Omega(d' \\log(\\frac{p}{d'}))$ samples to learn difference SEMs with at most $d'$ parents per node. Finally, we validate our theoretical results with synthetic experiments and show that our method outperforms the state-of-the-art. Moreover, we show the usefulness of our method by using data from the medical domain.},\n  year={2021},\n  url_preprint={https://arxiv.org/pdf/1906.12024.pdf},\n  keyword={Causality and Graphical Models,Distribution Shifts,Latent Variable Models and Generative Models,Statistical Machine Learning,Interpretability and Fairness},\n}\n\n
\n
\n\n\n
\n Discovering cause-effect relationships between variables from observational data is a fundamental challenge in many scientific disciplines. However, in many situations it is desirable to directly estimate the change in causal relationships across two different conditions, e.g., estimating the change in genetic expression across healthy and diseased subjects can help isolate genetic factors behind the disease. This paper focuses on the problem of directly estimating the structural difference between two structural equation models (SEMs), having the same topological ordering, given two sets of samples drawn from the individual SEMs. We present a principled algorithm that can recover the difference SEM in $O(d^2 \\log(p))$ samples, where $d$ is related to the number of edges in the difference SEM of $p$ nodes. We also study the fundamental limits and show that any method requires at least $Ω(d' \\log(\\frac{p}{d'}))$ samples to learn difference SEMs with at most $d'$ parents per node. Finally, we validate our theoretical results with synthetic experiments and show that our method outperforms the state-of-the-art. Moreover, we show the usefulness of our method by using data from the medical domain.\n
\n\n\n
\n\n\n\n\n\n
\n
\n\n
\n
\n  \n 2020\n \n \n (2)\n \n \n
\n
\n \n \n
\n \n\n \n \n \n \n \n \n Fairness constraints can help exact inference in structured prediction.\n \n \n \n \n\n\n \n Bello, K.; and Honorio, J.\n\n\n \n\n\n\n NeurIPS-20. Advances in Neural Information Processing Systems, 33: 11322–11332. 2020.\n \n\n\n\n
\n\n\n\n \n \n \"Fairness proceedings\n  \n \n \n \"Fairness preprint\n  \n \n\n \n\n \n link\n  \n \n\n bibtex\n \n\n \n  \n \n abstract \n \n\n \n  \n \n 1 download\n \n \n\n \n \n \n \n \n \n \n\n  \n \n \n \n \n \n \n \n \n\n\n\n
\n
@article{bello2020fairness,\n  title={Fairness constraints can help exact inference in structured prediction},\n  author={Bello, Kevin and Honorio, Jean},\n  journal={<span style="color: #0088cc; font-style: normal">NeurIPS-20.</span> Advances in Neural Information Processing Systems},\n  abstract={Many inference problems in structured prediction can be modeled as maximizing a score function on a space of labels, where graphs are a natural representation to decompose the total score into a sum of unary (nodes) and pairwise (edges) scores. Given a generative model with an undirected connected graph G and true vector of binary labels $\\bar{y}$, it has been previously shown that when G has good expansion properties, such as complete graphs or d-regular expanders, one can exactly recover $\\bar{y}$ (with high probability and in polynomial time) from a single noisy observation of each edge and node. We analyze the previously studied generative model by Globerson et al. (2015) under a notion of statistical parity. That is, given a fair binary node labeling, we ask the question whether it is possible to recover the fair assignment, with high probability and in polynomial time, from single edge and node observations. We find that, in contrast to the known trade-offs between fairness and model performance, the addition of the fairness constraint improves the probability of exact recovery. We effectively explain this phenomenon and empirically show how graphs with poor expansion properties, such as grids, are now capable of achieving exact recovery. 
Finally, as a byproduct of our analysis, we provide a tighter minimum-eigenvalue bound than that which can be derived from Weyl's inequality.},\n  volume={33},\n  pages={11322--11332},\n  year={2020},\n  url_proceedings={https://proceedings.neurips.cc/paper/2020/hash/8248a99e81e752cb9b41da3fc43fbe7f-Abstract.html},\n  url_preprint={https://arxiv.org/pdf/2007.00218.pdf},\n  keyword={Statistical Machine Learning,Interpretability and Fairness,Causality and Graphical Models},\n}\n\n
\n
\n\n\n
\n Many inference problems in structured prediction can be modeled as maximizing a score function on a space of labels, where graphs are a natural representation to decompose the total score into a sum of unary (nodes) and pairwise (edges) scores. Given a generative model with an undirected connected graph G and true vector of binary labels $ȳ$, it has been previously shown that when G has good expansion properties, such as complete graphs or d-regular expanders, one can exactly recover $ȳ$ (with high probability and in polynomial time) from a single noisy observation of each edge and node. We analyze the previously studied generative model by Globerson et al. (2015) under a notion of statistical parity. That is, given a fair binary node labeling, we ask the question whether it is possible to recover the fair assignment, with high probability and in polynomial time, from single edge and node observations. We find that, in contrast to the known trade-offs between fairness and model performance, the addition of the fairness constraint improves the probability of exact recovery. We effectively explain this phenomenon and empirically show how graphs with poor expansion properties, such as grids, are now capable of achieving exact recovery. Finally, as a byproduct of our analysis, we provide a tighter minimum-eigenvalue bound than that which can be derived from Weyl's inequality.\n
\n\n\n
\n\n\n
\n \n\n \n \n \n \n \n \n Minimax Bounds for Structured Prediction Based on Factor Graphs.\n \n \n \n \n\n\n \n Bello, K.; Ghoshal, A.; and Honorio, J.\n\n\n \n\n\n\n AISTATS-20. Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics, 108: 213–222. 2020.\n \n\n\n\n
\n\n\n\n \n \n \"Minimax proceedings\n  \n \n \n \"Minimax preprint\n  \n \n\n \n\n \n link\n  \n \n\n bibtex\n \n\n \n  \n \n abstract \n \n\n \n  \n \n 1 download\n \n \n\n \n \n \n \n \n \n \n\n  \n \n \n \n \n \n \n\n\n\n
\n
@article{bello2020minimax,\n  title={Minimax Bounds for Structured Prediction Based on Factor Graphs},\n  author={Bello, Kevin and Ghoshal, Asish and Honorio, Jean},\n  journal={<span style="color: #0088cc; font-style: normal">AISTATS-20.</span> Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics},\n  abstract={Structured prediction can be considered as a generalization of many standard supervised learning tasks, and is usually thought of as a simultaneous prediction of multiple labels. One standard approach is to maximize a score function on the space of labels, which usually decomposes as a sum of unary and pairwise potentials, each depending on one or two specific labels, respectively. For this approach, several learning and inference algorithms have been proposed over the years, ranging from exact to approximate methods while balancing the computational complexity. However, in contrast to binary and multiclass classification, results on the necessary number of samples for achieving learning are still limited, even for a specific family of predictors such as factor graphs. In this work, we provide minimax lower bounds for a class of general factor-graph inference models in the context of structured prediction. That is, we characterize the necessary sample complexity for any conceivable algorithm to achieve learning of general factor-graph predictors.},\n  volume={108},\n  pages={213--222},\n  year={2020},\n  publisher={PMLR},\n  url_proceedings={https://proceedings.mlr.press/v108/bello20a.html},\n  url_preprint={https://arxiv.org/pdf/1906.00449.pdf},\n  keyword={Statistical Machine Learning,Causality and Graphical Models},\n}\n\n
\n
\n\n\n
\n Structured prediction can be considered as a generalization of many standard supervised learning tasks, and is usually thought of as a simultaneous prediction of multiple labels. One standard approach is to maximize a score function on the space of labels, which usually decomposes as a sum of unary and pairwise potentials, each depending on one or two specific labels, respectively. For this approach, several learning and inference algorithms have been proposed over the years, ranging from exact to approximate methods while balancing the computational complexity. However, in contrast to binary and multiclass classification, results on the necessary number of samples for achieving learning are still limited, even for a specific family of predictors such as factor graphs. In this work, we provide minimax lower bounds for a class of general factor-graph inference models in the context of structured prediction. That is, we characterize the necessary sample complexity for any conceivable algorithm to achieve learning of general factor-graph predictors.\n
\n\n\n
\n\n\n\n\n\n
\n
\n\n
\n
\n  \n 2019\n \n \n (1)\n \n \n
\n
\n \n \n
\n \n\n \n \n \n \n \n \n Exact inference in structured prediction.\n \n \n \n \n\n\n \n Bello, K.; and Honorio, J.\n\n\n \n\n\n\n NeurIPS-19. Advances in Neural Information Processing Systems, 32. 2019.\n \n\n\n\n
\n\n\n\n \n \n \"Exact proceedings\n  \n \n \n \"Exact preprint\n  \n \n\n \n\n \n link\n  \n \n\n bibtex\n \n\n \n  \n \n abstract \n \n\n \n  \n \n 1 download\n \n \n\n \n \n \n \n \n \n \n\n  \n \n \n \n \n \n \n \n \n\n\n\n
\n
@article{bello2019exact,\n  title={Exact inference in structured prediction},\n  author={Bello, Kevin and Honorio, Jean},\n  journal={<span style="color: #0088cc; font-style: normal">NeurIPS-19.</span> Advances in Neural Information Processing Systems},\n  abstract={Structured prediction can be thought of as a simultaneous prediction of multiple labels. This is often done by maximizing a score function on the space of labels, which decomposes as a sum of pairwise and unary potentials. The above is naturally modeled with a graph, where edges and vertices are related to pairwise and unary potentials, respectively. We consider the generative process proposed by Globerson et al. (2015) and apply it to general connected graphs. We analyze the structural conditions of the graph that allow for the exact recovery of the labels. Our results show that exact recovery is possible and achievable in polynomial time for a large class of graphs. In particular, we show that graphs that are bad expanders can be exactly recovered by adding small edge perturbations coming from the Erdos-Renyi model. Finally, as a byproduct of our analysis, we provide an extension of Cheeger's inequality.},\n  volume={32},\n  year={2019},\n  url_proceedings={https://papers.nips.cc/paper_files/paper/2019/hash/312351bff07989769097660a56395065-Abstract.html},\n  url_preprint={https://arxiv.org/pdf/1906.00451.pdf},\n  keyword={Statistical Machine Learning,Optimization,Causality and Graphical Models},\n}\n\n
\n
\n\n\n
\n Structured prediction can be thought of as a simultaneous prediction of multiple labels. This is often done by maximizing a score function on the space of labels, which decomposes as a sum of pairwise and unary potentials. The above is naturally modeled with a graph, where edges and vertices are related to pairwise and unary potentials, respectively. We consider the generative process proposed by Globerson et al. (2015) and apply it to general connected graphs. We analyze the structural conditions of the graph that allow for the exact recovery of the labels. Our results show that exact recovery is possible and achievable in polynomial time for a large class of graphs. In particular, we show that graphs that are bad expanders can be exactly recovered by adding small edge perturbations coming from the Erdos-Renyi model. Finally, as a byproduct of our analysis, we provide an extension of Cheeger's inequality.\n
\n\n\n
\n\n\n\n\n\n
\n
\n\n
\n
\n  \n 2018\n \n \n (3)\n \n \n
\n
\n \n \n
\n \n\n \n \n \n \n \n \n Learning Latent Variable Structured Prediction Models with Gaussian Perturbations.\n \n \n \n \n\n\n \n Bello, K.; and Honorio, J.\n\n\n \n\n\n\n NeurIPS-18. Advances in Neural Information Processing Systems, 31. 2018.\n \n\n\n\n
\n\n\n\n \n \n \"Learning proceedings\n  \n \n \n \"Learning preprint\n  \n \n\n \n\n \n link\n  \n \n\n bibtex\n \n\n \n  \n \n abstract \n \n\n \n  \n \n 1 download\n \n \n\n \n \n \n \n \n \n \n\n  \n \n \n \n \n \n \n\n\n\n
\n
@article{bello2018learning,\n  title={Learning Latent Variable Structured Prediction Models with Gaussian Perturbations},\n  author={Bello, Kevin and Honorio, Jean},\n  journal={<span style="color: #0088cc; font-style: normal">NeurIPS-18.</span> Advances in Neural Information Processing Systems},\n  abstract={The standard margin-based structured prediction commonly uses a maximum loss over all possible structured outputs. The large-margin formulation including latent variables not only results in a non-convex formulation but also increases the search space by a factor of the size of the latent space. Recent work has proposed the use of the maximum loss over random structured outputs sampled independently from some proposal distribution, with theoretical guarantees. We extend this work by including latent variables. We study a new family of loss functions under Gaussian perturbations and analyze the effect of the latent space on the generalization bounds. We show that the non-convexity of learning with latent variables originates naturally, as it relates to a tight upper bound of the Gibbs decoder distortion with respect to the latent space. Finally, we provide a formulation using random samples and relaxations that produces a tighter upper bound of the Gibbs decoder distortion up to a statistical accuracy, which enables a polynomial time evaluation of the objective function. We illustrate the method with synthetic experiments and a computer vision application.},\n  volume={31},\n  year={2018},\n  url_Proceedings={https://papers.nips.cc/paper_files/paper/2018/hash/f18a6d1cde4b205199de8729a6637b42-Abstract.html},\n  url_Preprint={https://arxiv.org/pdf/1805.09213.pdf},\n  keyword={Latent Variable Models and Generative Models,Statistical Machine Learning},\n}\n\n
\n
\n\n\n
\n The standard margin-based structured prediction commonly uses a maximum loss over all possible structured outputs. The large-margin formulation including latent variables not only results in a non-convex formulation but also increases the search space by a factor of the size of the latent space. Recent work has proposed the use of the maximum loss over random structured outputs sampled independently from some proposal distribution, with theoretical guarantees. We extend this work by including latent variables. We study a new family of loss functions under Gaussian perturbations and analyze the effect of the latent space on the generalization bounds. We show that the non-convexity of learning with latent variables originates naturally, as it relates to a tight upper bound of the Gibbs decoder distortion with respect to the latent space. Finally, we provide a formulation using random samples and relaxations that produces a tighter upper bound of the Gibbs decoder distortion up to a statistical accuracy, which enables a polynomial time evaluation of the objective function. We illustrate the method with synthetic experiments and a computer vision application.\n
\n\n\n
\n\n\n
\n \n\n \n \n \n \n \n \n Computationally and statistically efficient learning of causal Bayes nets using path queries.\n \n \n \n \n\n\n \n Bello, K.; and Honorio, J.\n\n\n \n\n\n\n NeurIPS-18. Advances in Neural Information Processing Systems, 31. 2018.\n \n\n\n\n
\n\n\n\n \n \n \"Computationally proceedings\n  \n \n \n \"Computationally preprint\n  \n \n\n \n\n \n link\n  \n \n\n bibtex\n \n\n \n  \n \n abstract \n \n\n \n  \n \n 3 downloads\n \n \n\n \n \n \n \n \n \n \n\n  \n \n \n \n \n \n \n\n\n\n
\n
@article{bello2018computationally,\n  title={Computationally and statistically efficient learning of causal Bayes nets using path queries},\n  author={Bello, Kevin and Honorio, Jean},\n  journal={<span style="color: #0088cc; font-style: normal">NeurIPS-18.</span> Advances in Neural Information Processing Systems},\n  abstract={Causal discovery from empirical data is a fundamental problem in many scientific domains. Observational data allows for identifiability only up to Markov equivalence class. In this paper we first propose a polynomial time algorithm for learning the exact correctly-oriented structure of the transitive reduction of any causal Bayesian network with high probability, by using interventional path queries. Each path query takes as input an origin node and a target node, and answers whether there is a directed path from the origin to the target. This is done by intervening on the origin node and observing samples from the target node. We theoretically show the logarithmic sample complexity for the size of interventional data per path query, for continuous and discrete networks. We then show how to learn the transitive edges using also logarithmic sample complexity (albeit in time exponential in the maximum number of parents for discrete networks), which allows us to learn the full network. We further extend our work by reducing the number of interventional path queries for learning rooted trees. We also provide an analysis of imperfect interventions.},\n  volume={31},\n  year={2018},\n  url_Proceedings={https://proceedings.neurips.cc/paper_files/paper/2018/hash/a0b45d1bb84fe1bedbb8449764c4d5d5-Abstract.html},\n  url_Preprint={https://arxiv.org/pdf/1706.00754v4.pdf},\n  keyword={Causality and Graphical Models,Statistical Machine Learning},\n}\n\n
\n
\n\n\n
\n Causal discovery from empirical data is a fundamental problem in many scientific domains. Observational data allows for identifiability only up to Markov equivalence class. In this paper we first propose a polynomial time algorithm for learning the exact correctly-oriented structure of the transitive reduction of any causal Bayesian network with high probability, by using interventional path queries. Each path query takes as input an origin node and a target node, and answers whether there is a directed path from the origin to the target. This is done by intervening on the origin node and observing samples from the target node. We theoretically show the logarithmic sample complexity for the size of interventional data per path query, for continuous and discrete networks. We then show how to learn the transitive edges using also logarithmic sample complexity (albeit in time exponential in the maximum number of parents for discrete networks), which allows us to learn the full network. We further extend our work by reducing the number of interventional path queries for learning rooted trees. We also provide an analysis of imperfect interventions.\n
\n\n\n
\n\n\n
\n \n\n \n \n \n \n \n \n Improving Topic Coherence Using Entity Extraction Denoising.\n \n \n \n \n\n\n \n Cardenas, R.; Bello, K.; Coronado, A.; and Villota, E.\n\n\n \n\n\n\n Prague Bulletin of Mathematical Linguistics, 110: 85–101. 2018.\n \n\n\n\n
\n\n\n\n \n \n \"Improving proceedings\n  \n \n\n \n\n \n link\n  \n \n\n bibtex\n \n\n \n  \n \n abstract \n \n\n \n  \n \n 1 download\n \n \n\n \n \n \n \n \n \n \n\n  \n \n \n \n \n \n \n\n\n\n
\n
@article{cardenas2018improving,\n  title={Improving Topic Coherence Using Entity Extraction Denoising.},\n  author={Cardenas, Ronald and Bello, Kevin and Coronado, Alberto and Villota, Elizabeth},\n  journal={Prague Bulletin of Mathematical Linguistics},\n  abstract={Managing large collections of documents is an important problem for many areas of science, industry, and culture. Probabilistic topic modeling offers a promising solution. Topic modeling is an unsupervised machine learning method and the evaluation of this model is an interesting problem on its own. Topic interpretability measures have been developed in recent years as a more natural option for topic quality evaluation, emulating human perception of coherence with word sets correlation scores. In this paper, we show experimental evidence of the improvement of topic coherence score by restricting the training corpus to that of relevant information in the document obtained by Entity Recognition. We experiment with job advertisement data and find that with this approach topic models improve interpretability by about 40 percentage points on average. Our analysis reveals as well that using the extracted text chunks, some redundant topics are joined while others are split into more skill-specific topics. Fine-grained topics observed in models using the whole text are preserved.},\n  volume={110},\n  pages={85--101},\n  year={2018},\n  url_Proceedings={https://ufal.mff.cuni.cz/pbml/110/art-cardenas-et-al.pdf},\n  keyword={Latent Variable Models and Generative Models,Applications},\n}\n
\n
\n\n\n
\n Managing large collections of documents is an important problem for many areas of science, industry, and culture. Probabilistic topic modeling offers a promising solution. Topic modeling is an unsupervised machine learning method and the evaluation of this model is an interesting problem on its own. Topic interpretability measures have been developed in recent years as a more natural option for topic quality evaluation, emulating human perception of coherence with word sets correlation scores. In this paper, we show experimental evidence of the improvement of topic coherence score by restricting the training corpus to that of relevant information in the document obtained by Entity Recognition. We experiment with job advertisement data and find that with this approach topic models improve interpretability by about 40 percentage points on average. Our analysis reveals as well that using the extracted text chunks, some redundant topics are joined while others are split into more skill-specific topics. Fine-grained topics observed in models using the whole text are preserved.\n
\n\n\n
\n\n\n\n\n\n
\n
\n\n\n\n\n
\n\n\n \n\n \n \n \n \n\n
\n"}; document.write(bibbase_data.data);