Discriminative Hierarchical Modeling of Spatio-Temporally Composable Human Activities

Discriminative Hierarchical Modeling of Spatio-Temporally Composable Human Activities. Lillo, I., Niebles, J., & Soto, A. In CVPR, 2014.

This paper proposes a framework for recognizing complex human activities in videos. Our method describes human activities in a hierarchical discriminative model that operates at three semantic levels. At the lower level, body poses are encoded in a representative but discriminative pose dictionary. At the intermediate level, encoded poses span a space where simple human actions are composed. At the highest level, our model captures temporal and spatial compositions of actions into complex human activities. Our human activity classifier simultaneously models which body parts are relevant to the action of interest as well as their appearance and composition using a discriminative approach. By formulating model learning in a max-margin framework, our approach achieves powerful multi-class discrimination while providing useful annotations at the intermediate semantic level. We show how our hierarchical compositional model provides natural handling of occlusions. To evaluate the effectiveness of our proposed framework, we introduce a new dataset of composed human activities. We provide empirical evidence that our method achieves state-of-the-art activity classification performance on several benchmark datasets.

@InProceedings{	  lillo:etal:2014,
  author	= {I. Lillo and J. C. Niebles and A. Soto},
  title		= {Discriminative Hierarchical Modeling of Spatio-Temporally
		  Composable Human Activities},
  booktitle	= {{CVPR}},
  year		= {2014},
  abstract	= {This paper proposes a framework for recognizing complex
		  human activities in videos. Our method describes human
		  activities in a hierarchical discriminative model that
		  operates at three semantic levels. At the lower level, body
		  poses are encoded in a representative but discriminative
		  pose dictionary. At the intermediate level, encoded poses
		  span a space where simple human actions are composed. At
		  the highest level, our model captures temporal and spatial
		  compositions of actions into complex human activities. Our
		  human activity classifier simultaneously models which body
		  parts are relevant to the action of interest as well as
		  their appearance and composition using a discriminative
		  approach. By formulating model learning in a max-margin
		  framework, our approach achieves powerful multi-class
		  discrimination while providing useful annotations at the
		  intermediate semantic level. We show how our hierarchical
		  compositional model provides natural handling of
		  occlusions. To evaluate the effectiveness of our proposed
		  framework, we introduce a new dataset of composed human
		  activities. We provide empirical evidence that our method
		  achieves state-of-the-art activity classification
		  performance on several benchmark datasets.},
  url		= {http://saturno.ing.puc.cl/media/papers_alvaro/activities-CVPR-14.pdf}
}
