More data means less inference: A pseudo-max approach to structured learning.
Sontag, D.; Meshi, O.; Jaakkola, T.; and Globerson, A.
In Lafferty, J.; Williams, C.; Shawe-Taylor, J.; Zemel, R.; and Culotta, A., editors, Advances in Neural Information Processing Systems 23, pages 2181–2189. MIT Press, 2010.
Paper: http://people.csail.mit.edu/dsontag/papers/SonMesJaaGlo_nips10.pdf
@incollection{SonMesJaaGlo_nips10,
  author    = {David Sontag and Ofer Meshi and Tommi Jaakkola and Amir Globerson},
  title     = {More data means less inference: A pseudo-max approach to structured learning},
  booktitle = {Advances in Neural Information Processing Systems 23},
  editor    = {J. Lafferty and C.K.I. Williams and J. Shawe-Taylor and R.S. Zemel and A. Culotta},
  pages     = {2181--2189},
  publisher = {MIT Press},
  year      = {2010},
  keywords  = {Machine learning, Structured prediction},
  url_Paper = {http://people.csail.mit.edu/dsontag/papers/SonMesJaaGlo_nips10.pdf},
  abstract  = {The problem of learning to predict structured labels is of key importance in many applications. However, for general graph structure both learning and inference are intractable. Here we show that it is possible to circumvent this difficulty when the distribution of training examples is rich enough, via a method similar in spirit to pseudo-likelihood. We show that our new method achieves consistency, and illustrate empirically that it indeed approaches the performance of exact methods when sufficiently large training sets are used.}
}
Abstract: The problem of learning to predict structured labels is of key importance in many applications. However, for general graph structure both learning and inference are intractable. Here we show that it is possible to circumvent this difficulty when the distribution of training examples is rich enough, via a method similar in spirit to pseudo-likelihood. We show that our new method achieves consistency, and illustrate empirically that it indeed approaches the performance of exact methods when sufficiently large training sets are used.
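The core trick lends itself to a short sketch. Below is a minimal, hypothetical rendering of per-node ("pseudo-max") updates for a pairwise model with K labels, written as a perceptron-style subgradient pass; function and variable names are illustrative, not taken from the paper:

    import numpy as np

    # Pseudo-max sketch: instead of a global loss-augmented max over all labelings
    # (which requires MAP inference), consider only labelings that differ from the
    # gold labeling y at a single node i, with neighbors clamped to gold -- in the
    # spirit of pseudo-likelihood. x[i] is node i's feature vector, y[i] its label.

    def local_score(w_node, w_edge, x, y, i, k, edges):
        """All score terms touching node i when it takes label k (rest fixed to y)."""
        s = w_node[k] @ x[i]
        for a, b in edges:
            if a == i:
                s += w_edge[k, y[b]]
            elif b == i:
                s += w_edge[y[a], k]
        return s

    def pseudomax_step(w_node, w_edge, x, y, edges, lr=0.1):
        """One subgradient pass over the per-node hinge constraints of one example."""
        K = w_node.shape[0]
        for i in range(len(y)):
            gold = local_score(w_node, w_edge, x, y, i, y[i], edges)
            # Most violated single-node relabeling, with a margin of 1.
            margin, k_star = max((local_score(w_node, w_edge, x, y, i, k, edges)
                                  + 1.0 - gold, k) for k in range(K) if k != y[i])
            if margin > 0:  # hinge active: move toward gold, away from the flip
                w_node[y[i]] += lr * x[i]
                w_node[k_star] -= lr * x[i]
                for a, b in edges:
                    if a == i:
                        w_edge[y[i], y[b]] += lr
                        w_edge[k_star, y[b]] -= lr
                    elif b == i:
                        w_edge[y[a], y[i]] += lr
                        w_edge[y[a], k_star] -= lr

No inference is run during learning; the paper's consistency result is about when these purely local constraints suffice, given a rich enough training distribution.
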
On Dual Decomposition and Linear Programming Relaxations for Natural Language Processing.
Rush, A. M.; Sontag, D.; Collins, M.; and Jaakkola, T.
In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1–11, 2010.
Paper: http://people.csail.mit.edu/dsontag/papers/RusSonColJaa_emnlp10.pdf
@inproceedings{RusSonColJaa_emnlp10,
  author    = {Alexander M. Rush and David Sontag and Michael Collins and Tommi Jaakkola},
  title     = {On Dual Decomposition and Linear Programming Relaxations for Natural Language Processing},
  booktitle = {Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
  pages     = {1--11},
  year      = {2010},
  keywords  = {Machine learning, Natural language processing, Approximate inference in graphical models},
  url_Paper = {http://people.csail.mit.edu/dsontag/papers/RusSonColJaa_emnlp10.pdf},
  abstract  = {This paper introduces dual decomposition as a framework for deriving inference algorithms for NLP problems. The approach relies on standard dynamic-programming algorithms as oracle solvers for sub-problems, together with a simple method for forcing agreement between the different oracles. The approach provably solves a linear programming (LP) relaxation of the global inference problem. It leads to algorithms that are simple, in that they use existing decoding algorithms; efficient, in that they avoid exact algorithms for the full model; and often exact, in that empirically they often recover the correct solution in spite of using an LP relaxation. We give experimental results on two problems: 1) the combination of two lexicalized parsing models; and 2) the combination of a lexicalized parsing model and a trigram part-of-speech tagger.}
}
Abstract: This paper introduces dual decomposition as a framework for deriving inference algorithms for NLP problems. The approach relies on standard dynamic-programming algorithms as oracle solvers for sub-problems, together with a simple method for forcing agreement between the different oracles. The approach provably solves a linear programming (LP) relaxation of the global inference problem. It leads to algorithms that are simple, in that they use existing decoding algorithms; efficient, in that they avoid exact algorithms for the full model; and often exact, in that empirically they often recover the correct solution in spite of using an LP relaxation. We give experimental results on two problems: 1) the combination of two lexicalized parsing models; and 2) the combination of a lexicalized parsing model and a trigram part-of-speech tagger.
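The agreement machinery is compact enough to state in code. A toy sketch of the subgradient algorithm, with a trivial linear maximizer standing in for the paper's dynamic-programming decoders (CKY parser, Viterbi tagger); all names below are illustrative:

    import numpy as np

    # Two sub-models score the same 0/1 indicator vector z (e.g., "tag t appears
    # at position i"). Each oracle maximizes its own score plus a Lagrangian term;
    # the dual subgradient update nudges the two solutions toward agreement.

    def oracle(theta):
        """Exact maximizer of a linear objective over z in {0,1}^d.
        Stands in for a dynamic-programming decoder (CKY, Viterbi, ...)."""
        return (theta > 0).astype(float)

    def dual_decompose(theta1, theta2, iters=200):
        u = np.zeros_like(theta1)
        for t in range(1, iters + 1):
            z1 = oracle(theta1 + u)      # sub-problem 1 sees scores shifted by +u
            z2 = oracle(theta2 - u)      # sub-problem 2 sees scores shifted by -u
            if np.array_equal(z1, z2):   # agreement is a certificate: z1 solves
                return z1, True          # the joint problem (LP relaxation tight)
            u -= (1.0 / t) * (z1 - z2)   # dual subgradient step, 1/t step size
        return z1, False                 # no agreement: fall back to a heuristic

    z, exact = dual_decompose(np.array([1.0, -2.0, 0.5]),
                              np.array([0.5, 1.0, -0.2]))

The appeal the abstract describes is visible here: the oracles are reused black boxes, the coupling is a single vector of multipliers, and agreement doubles as a proof of optimality.
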
Dual Decomposition for Parsing with Non-Projective Head Automata.
Koo, T.; Rush, A. M.; Collins, M.; Jaakkola, T.; and Sontag, D.
In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1288–1298, 2010.
Paper: http://people.csail.mit.edu/dsontag/papers/KooEtAl_emnlp10.pdf
@inproceedings{KooEtAl_emnlp10,
  author    = {Terry Koo and Alexander M. Rush and Michael Collins and Tommi Jaakkola and David Sontag},
  title     = {Dual Decomposition for Parsing with Non-Projective Head Automata},
  booktitle = {Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
  pages     = {1288--1298},
  year      = {2010},
  keywords  = {Machine learning, Natural language processing, Approximate inference in graphical models},
  url_Paper = {http://people.csail.mit.edu/dsontag/papers/KooEtAl_emnlp10.pdf},
  abstract  = {This paper introduces algorithms for non-projective parsing based on dual decomposition. We focus on parsing algorithms for non-projective head automata, a generalization of head-automata models to non-projective structures. The dual decomposition algorithms are simple and efficient, relying on standard dynamic programming and minimum spanning tree algorithms. They provably solve an LP relaxation of the non-projective parsing problem. Empirically the LP relaxation is very often tight: for many languages, exact solutions are achieved on over 98\% of test sentences. The accuracy of our models is higher than previous work on a broad range of datasets.}
}
Abstract: This paper introduces algorithms for non-projective parsing based on dual decomposition. We focus on parsing algorithms for non-projective head automata, a generalization of head-automata models to non-projective structures. The dual decomposition algorithms are simple and efficient, relying on standard dynamic programming and minimum spanning tree algorithms. They provably solve an LP relaxation of the non-projective parsing problem. Empirically the LP relaxation is very often tight: for many languages, exact solutions are achieved on over 98% of test sentences. The accuracy of our models is higher than previous work on a broad range of datasets.
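In the same spirit, a toy version of the decomposition can be assembled from off-the-shelf pieces: Chu-Liu/Edmonds for the tree sub-problem (here via networkx, an assumed dependency) and, as a much-simplified stand-in for the paper's head automata, a first-order oracle in which each head independently keeps every arc with positive adjusted score (no sibling interactions). Scores and names are illustrative:

    import networkx as nx
    import numpy as np

    # Arc indicators z[h, m]: h in {0..n} (0 = root), m in {1..n}; scores are
    # (n+1) x (n+1) arrays. Sub-problem 1 enforces tree structure; sub-problem 2
    # scores arcs per head. Dual variables u couple the two on the shared arcs.

    def tree_oracle(scores, n):
        """Highest-scoring dependency tree via maximum spanning arborescence."""
        G = nx.DiGraph()
        for h in range(n + 1):
            for m in range(1, n + 1):
                if h != m:
                    G.add_edge(h, m, weight=scores[h, m])
        z = np.zeros_like(scores)
        for h, m in nx.maximum_spanning_arborescence(G, attr="weight").edges:
            z[h, m] = 1.0
        return z

    def head_oracle(scores, n):
        """Per-head selection: keep every arc with positive adjusted score."""
        z = np.zeros_like(scores)
        for h in range(n + 1):
            for m in range(1, n + 1):
                if h != m and scores[h, m] > 0:
                    z[h, m] = 1.0
        return z

    def parse(arc_scores, auto_scores, n, iters=100):
        u = np.zeros_like(arc_scores)
        for t in range(1, iters + 1):
            z1 = tree_oracle(arc_scores + u, n)
            z2 = head_oracle(auto_scores - u, n)
            if np.array_equal(z1, z2):
                return z1                   # certificate of exactness
            u -= (1.0 / t) * (z1 - z2)
        return z1                           # relaxation not tight on this input

The abstract's "often exact" claim corresponds to the agreement branch firing: on over 98% of test sentences the two oracles settle on the same tree.
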
Approximate Inference in Graphical Models using LP Relaxations.
Sontag, D.
Ph.D. Thesis, Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2010.
Paper: http://people.csail.mit.edu/dsontag/papers/sontag_phd_thesis.pdf
@phdthesis{Sontag_thesis10,
  title    = {Approximate Inference in Graphical Models using LP Relaxations},
  author   = {David Sontag},
  school   = {Massachusetts Institute of Technology},
  address  = {Department of Electrical Engineering and Computer Science},
  year     = {2010},
  keywords = {Machine learning, Approximate inference in graphical models},
  url_Paper = {http://people.csail.mit.edu/dsontag/papers/sontag_phd_thesis.pdf},
  abstract = {Graphical models such as Markov random fields have been successfully applied to a wide variety of fields, from computer vision and natural language processing, to computational biology. Exact probabilistic inference is generally intractable in complex models having many dependencies between the variables. We present new approaches to approximate inference based on linear programming (LP) relaxations. Our algorithms optimize over the cycle relaxation of the marginal polytope, which we show to be closely related to the first lifting of the Sherali-Adams hierarchy, and is significantly tighter than the pairwise LP relaxation. We show how to efficiently optimize over the cycle relaxation using a cutting-plane algorithm that iteratively introduces constraints into the relaxation. We provide a criterion to determine which constraints would be most helpful in tightening the relaxation, and give efficient algorithms for solving the search problem of finding the best cycle constraint to add according to this criterion. By solving the LP relaxations in the dual, we obtain efficient message-passing algorithms that, when the relaxations are tight, can provably find the most likely (MAP) configuration. Our algorithms succeed at finding the MAP configuration in protein side-chain placement, protein design, and stereo vision problems.}
}
Abstract: Graphical models such as Markov random fields have been successfully applied to a wide variety of fields, from computer vision and natural language processing, to computational biology. Exact probabilistic inference is generally intractable in complex models having many dependencies between the variables. We present new approaches to approximate inference based on linear programming (LP) relaxations. Our algorithms optimize over the cycle relaxation of the marginal polytope, which we show to be closely related to the first lifting of the Sherali-Adams hierarchy, and is significantly tighter than the pairwise LP relaxation. We show how to efficiently optimize over the cycle relaxation using a cutting-plane algorithm that iteratively introduces constraints into the relaxation. We provide a criterion to determine which constraints would be most helpful in tightening the relaxation, and give efficient algorithms for solving the search problem of finding the best cycle constraint to add according to this criterion. By solving the LP relaxations in the dual, we obtain efficient message-passing algorithms that, when the relaxations are tight, can provably find the most likely (MAP) configuration. Our algorithms succeed at finding the MAP configuration in protein side-chain placement, protein design, and stereo vision problems.
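The thesis's starting point, the pairwise LP relaxation, fits in a short script. A minimal sketch (assuming scipy is available; potentials are synthetic) on a frustrated binary 3-cycle, where the relaxation is loose and returns fractional node marginals of 0.5, exactly the failure mode the cycle constraints are designed to repair:

    import numpy as np
    from scipy.optimize import linprog

    # Variables: node marginals mu_i(x) and edge marginals mu_ij(x, x').
    # Constraints: normalization and node/edge marginalization consistency
    # (the local polytope). Every edge rewards disagreement, so the cycle
    # is frustrated and the LP optimum is fractional.
    nodes = [0, 1, 2]
    edges = [(0, 1), (1, 2), (0, 2)]
    theta_edge = {e: np.array([[0.0, 1.0], [1.0, 0.0]]) for e in edges}

    # Index layout: node i -> vars 2*i + x ; edge k -> off + 4*k + 2*x + x'
    off = 2 * len(nodes)
    nvar = off + 4 * len(edges)

    c = np.zeros(nvar)                      # node potentials are zero here
    for k, e in enumerate(edges):
        c[off + 4 * k : off + 4 * k + 4] = -theta_edge[e].ravel()  # minimize -score

    A_eq, b_eq = [], []
    for i in nodes:                         # sum_x mu_i(x) = 1
        row = np.zeros(nvar); row[2 * i : 2 * i + 2] = 1
        A_eq.append(row); b_eq.append(1.0)
    for k, (i, j) in enumerate(edges):      # edge marginals agree with node marginals
        for x in (0, 1):
            row = np.zeros(nvar)            # sum_{x'} mu_ij(x, x') = mu_i(x)
            row[off + 4 * k + 2 * x] = 1; row[off + 4 * k + 2 * x + 1] = 1
            row[2 * i + x] = -1
            A_eq.append(row); b_eq.append(0.0)
            row = np.zeros(nvar)            # sum_{x'} mu_ij(x', x) = mu_j(x)
            row[off + 4 * k + x] = 1; row[off + 4 * k + 2 + x] = 1
            row[2 * j + x] = -1
            A_eq.append(row); b_eq.append(0.0)

    res = linprog(c, A_eq=np.array(A_eq), b_eq=np.array(b_eq), bounds=(0, 1))
    print("LP value:", -res.fun)            # 3.0, while the best labeling scores 2.0
    print("node marginals:", res.x[:off])   # all 0.5: a fractional vertex

Adding the cycle constraints the thesis studies (or, equivalently, the triangle cluster here) would cut off this fractional vertex and make the relaxation tight.
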
Learning Efficiently with Approximate Inference via Dual Losses.
Meshi, O.; Sontag, D.; Jaakkola, T.; and Globerson, A.
In Furnkranz, J.; and Joachims, T., editors, Proceedings of the 27th International Conference on Machine Learning (ICML), pages 783–790. Omnipress, 2010.
Paper: http://people.csail.mit.edu/dsontag/papers/MesSonJaaGlo_icml10.pdf
@inproceedings{MesSonJaaGlo_icml10,
  title     = {Learning Efficiently with Approximate Inference via Dual Losses},
  author    = {Ofer Meshi and David Sontag and Tommi Jaakkola and Amir Globerson},
  booktitle = {Proceedings of the 27th International Conference on Machine Learning (ICML)},
  pages     = {783--790},
  editor    = {Johannes Furnkranz and Thorsten Joachims},
  publisher = {Omnipress},
  year      = {2010},
  keywords  = {Machine learning, Structured prediction},
  url_Paper = {http://people.csail.mit.edu/dsontag/papers/MesSonJaaGlo_icml10.pdf},
  abstract  = {Many structured prediction tasks involve complex models where inference is computationally intractable, but where it can be well approximated using a linear programming relaxation. Previous approaches for learning for structured prediction (e.g., cutting-plane, subgradient methods, perceptron) repeatedly make predictions for some of the data points. These approaches are computationally demanding because each prediction involves solving a linear program to optimality. We present a scalable algorithm for learning for structured prediction. The main idea is to instead solve the dual of the structured prediction loss. We formulate the learning task as a convex minimization over both the weights and the dual variables corresponding to each data point. As a result, we can begin to optimize the weights even before completely solving any of the individual prediction problems. We show how the dual variables can be efficiently optimized using coordinate descent. Our algorithm is competitive with state-of-the-art methods such as stochastic subgradient and cutting-plane.}
}
Abstract: Many structured prediction tasks involve complex models where inference is computationally intractable, but where it can be well approximated using a linear programming relaxation. Previous approaches for learning for structured prediction (e.g., cutting-plane, subgradient methods, perceptron) repeatedly make predictions for some of the data points. These approaches are computationally demanding because each prediction involves solving a linear program to optimality. We present a scalable algorithm for learning for structured prediction. The main idea is to instead solve the dual of the structured prediction loss. We formulate the learning task as a convex minimization over both the weights and the dual variables corresponding to each data point. As a result, we can begin to optimize the weights even before completely solving any of the individual prediction problems. We show how the dual variables can be efficiently optimized using coordinate descent. Our algorithm is competitive with state-of-the-art methods such as stochastic subgradient and cutting-plane.
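The heart of the method can be stated in one identity. Schematically (generic notation, not copied from the paper), loss-augmented inference over the local LP relaxation M_L is an inner maximization; replacing it by its LP dual D_m turns the min-max learning problem into a joint minimization:

    \[
    \min_w \frac{\lambda}{2}\|w\|^2
      + \sum_m \Big[\max_{\mu_m \in \mathcal{M}_L}
          \big(\theta_w(x_m) + \ell_m\big)\cdot\mu_m - w\cdot\Phi(x_m, y_m)\Big]
    \;=\;
    \min_{w,\,\delta_1,\dots,\delta_M} \frac{\lambda}{2}\|w\|^2
      + \sum_m \Big[D_m(\delta_m; w) - w\cdot\Phi(x_m, y_m)\Big].
    \]

By weak duality, D_m(δ_m; w) upper-bounds the inner maximum for every δ_m, with equality at the optimum; and since θ_w is linear in w, D_m is a pointwise maximum of affine functions and hence jointly convex in (w, δ_m). This is what licenses interleaving cheap coordinate-descent updates on the δ_m with updates on w, rather than solving each example's LP to optimality before touching the weights.
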
Learning Bayesian Network Structure using LP Relaxations.
Jaakkola, T.; Sontag, D.; Globerson, A.; and Meila, M.
In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics (AI-STATS), volume 9, pages 358–365. JMLR: W&CP, 2010.
Paper: http://people.csail.mit.edu/dsontag/papers/structure_aistats10.pdf
@inproceedings{JaaSonGloMei_aistats10,
  title     = {Learning {B}ayesian Network Structure using {LP} Relaxations},
  author    = {Tommi Jaakkola and David Sontag and Amir Globerson and Marina Meila},
  booktitle = {Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics (AI-STATS)},
  publisher = {JMLR: W\&CP},
  volume    = {9},
  pages     = {358--365},
  year      = {2010},
  keywords  = {Machine learning, Bayesian network structure learning},
  url_Paper = {http://people.csail.mit.edu/dsontag/papers/structure_aistats10.pdf},
  abstract  = {We propose to solve the combinatorial problem of finding the highest scoring Bayesian network structure from data. This structure learning problem can be viewed as an inference problem where the variables specify the choice of parents for each node in the graph. The key combinatorial difficulty arises from the global constraint that the graph structure has to be acyclic. We cast the structure learning problem as a linear program over the polytope defined by valid acyclic structures. In relaxing this problem, we maintain an outer bound approximation to the polytope and iteratively tighten it by searching over a new class of valid constraints. If an integral solution is found, it is guaranteed to be the optimal Bayesian network. When the relaxation is not tight, the fast dual algorithms we develop remain useful in combination with a branch and bound method. Empirical results suggest that the method is competitive or faster than alternative exact methods based on dynamic programming.}
}
Abstract: We propose to solve the combinatorial problem of finding the highest scoring Bayesian network structure from data. This structure learning problem can be viewed as an inference problem where the variables specify the choice of parents for each node in the graph. The key combinatorial difficulty arises from the global constraint that the graph structure has to be acyclic. We cast the structure learning problem as a linear program over the polytope defined by valid acyclic structures. In relaxing this problem, we maintain an outer bound approximation to the polytope and iteratively tighten it by searching over a new class of valid constraints. If an integral solution is found, it is guaranteed to be the optimal Bayesian network. When the relaxation is not tight, the fast dual algorithms we develop remain useful in combination with a branch and bound method. Empirical results suggest that the method is competitive or faster than alternative exact methods based on dynamic programming.
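A toy version of the LP makes the construction concrete. In the sketch below (synthetic local scores; real ones would come from data, e.g. BIC or BDeu), variable z[i, P] selects parent set P for node i, and acyclicity is enforced by cluster constraints: for every cluster C, some node in C must take all of its parents outside C. With n = 3 all clusters are enumerated up front, unlike the paper's cutting-plane search over them:

    import itertools
    import numpy as np
    from scipy.optimize import linprog

    n = 3
    rng = np.random.default_rng(0)
    cands = {i: [frozenset(S) for k in range(n)
                 for S in itertools.combinations([j for j in range(n) if j != i], k)]
             for i in range(n)}
    index = [(i, P) for i in range(n) for P in cands[i]]
    score = {v: rng.normal() for v in index}        # synthetic local scores

    c = np.array([-score[v] for v in index])        # linprog minimizes
    A_eq = np.zeros((n, len(index))); b_eq = np.ones(n)
    for col, (i, P) in enumerate(index):
        A_eq[i, col] = 1                            # each node picks one parent set

    clusters = [set(C) for k in range(2, n + 1)
                for C in itertools.combinations(range(n), k)]
    A_ub, b_ub = [], []
    for C in clusters:                              # sum_{i in C, P disjoint C} z >= 1
        row = np.zeros(len(index))
        for col, (i, P) in enumerate(index):
            if i in C and not (P & C):
                row[col] = -1
        A_ub.append(row); b_ub.append(-1.0)

    res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  A_eq=A_eq, b_eq=b_eq, bounds=(0, 1))
    chosen = [(i, sorted(P)) for col, (i, P) in enumerate(index) if res.x[col] > 0.99]
    print("LP value:", -res.fun)
    print("parent sets:", chosen)   # if the solution is integral, this is the
                                    # guaranteed-optimal DAG, as the abstract states

Any directed cycle would have every node's parent set intersect the cycle's node set, violating that cluster's constraint, so integral feasible points are exactly the acyclic structures.
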