Action recognition via bio-inspired features: The richness of center–surround interaction. Escobar, M. & Kornprobst, P. Computer Vision and Image Understanding, 116(5):593--605, May 2012.
Motion is a key feature for a wide class of computer vision approaches to recognize actions. In this article, we show how to define bio-inspired features for action recognition. To do so, we start from a well-established bio-inspired motion model of cortical areas V1 and MT. The primary visual cortex, designated as V1, is the first cortical area encountered in the visual stream processing and early responses of V1 cells consist in tiled sets of selective spatiotemporal filters. The second cortical area of interest in this article is area MT where MT cells pool incoming information from V1 according to the shape and characteristic of their receptive field. To go beyond the classical models and following the observations from Xiao et al. [61], we propose here to model different surround geometries for MT cells receptive fields. Then, we define the so-called bio-inspired features associated to an input video, based on the average activity of MT cells. Finally, we show how these features can be used in a standard classification method to perform action recognition. Results are given for the Weizmann and KTH databases. Interestingly, we show that the diversity of motion representation at the MT level (different surround geometries), is a major advantage for action recognition. On the Weizmann database, the inclusion of different MT surround geometries improved the recognition rate from 63.01 ± 2.07% up to 99.26 ± 1.66% in the best case. Similarly, on the KTH database, the recognition rate was significantly improved with the inclusion of MT different surround geometries (from 47.82 ± 2.71% up to 92.44 ± 0.01% in the best case). We also discussed the limitations of the current approach which are closely related to the input video duration. These promising results encourage us to further develop bio-inspired models incorporating other brain mechanisms and cortical areas in order to deal with more complex videos.
@article{ escobar_action_2012,
  title = {Action recognition via bio-inspired features: {The} richness of center–surround interaction},
  volume = {116},
  issn = {1077-3142},
  shorttitle = {Action recognition via bio-inspired features},
  url = {http://www.sciencedirect.com/science/article/pii/S1077314212000185},
  doi = {10.1016/j.cviu.2012.01.002},
  abstract = {Motion is a key feature for a wide class of computer vision approaches to recognize actions. In this article, we show how to define bio-inspired features for action recognition. To do so, we start from a well-established bio-inspired motion model of cortical areas V1 and MT. The primary visual cortex, designated as V1, is the first cortical area encountered in the visual stream processing and early responses of V1 cells consist in tiled sets of selective spatiotemporal filters. The second cortical area of interest in this article is area MT where MT cells pool incoming information from V1 according to the shape and characteristic of their receptive field. To go beyond the classical models and following the observations from Xiao et al. [61], we propose here to model different surround geometries for MT cells receptive fields. Then, we define the so-called bio-inspired features associated to an input video, based on the average activity of MT cells. Finally, we show how these features can be used in a standard classification method to perform action recognition. Results are given for the Weizmann and KTH databases. Interestingly, we show that the diversity of motion representation at the MT level (different surround geometries), is a major advantage for action recognition. On the Weizmann database, the inclusion of different MT surround geometries improved the recognition rate from 63.01 ± 2.07% up to 99.26 ± 1.66% in the best case. Similarly, on the KTH database, the recognition rate was significantly improved with the inclusion of MT different surround geometries (from 47.82 ± 2.71% up to 92.44 ± 0.01% in the best case). We also discussed the limitations of the current approach which are closely related to the input video duration. These promising results encourage us to further develop bio-inspired models incorporating other brain mechanisms and cortical areas in order to deal with more complex videos.},
  number = {5},
  urldate = {2014-05-24},
  journal = {Computer Vision and Image Understanding},
  author = {Escobar, María-José and Kornprobst, Pierre},
  month = may,
  year = {2012},
  pages = {593--605}
}
