The Use of Attention and Spatial Information for Rapid Facial Recognition in Video. Bonaiuto, J. & Itti, L. Image and Vision Computing, 24(6):557-563, Jun, 2006.
Bottom-up visual attention is the process by which primates quickly select regions of an image likely to contain behaviorally relevant objects. In artificial systems, restricting the task of object recognition to these regions allows faster recognition and unsupervised learning of multiple objects in cluttered scenes. A problem with this approach is that often objects that are superficially dissimilar to the target are given the same consideration in recognition as similar objects. Additionally, in video, objects recognized in previous frames at locations distant to the current fixation point often are given the same consideration in recognition as objects previously recognized at proximal locations. Here we investigate the value of rapidly pruning the facial recognition search space, first using similarity in the already-computed low-level features that guide attention to prioritize matching against an object database, and, second, using spatial proximity information derived from previous video frames. By comparing the performance of Lowe's recognition algorithm with Itti & Koch's bottom-up attention model with and without search space pruning, we demonstrate that this approach significantly accelerates facial recognition in video footage.
@article{Bonaiuto_Itti06ivc,
  author = {J. Bonaiuto and L. Itti},
  title = {The Use of Attention and Spatial Information for Rapid Facial Recognition in Video},
  abstract = {Bottom-up visual attention is the process by which primates
quickly select regions of an image likely to contain behaviorally
relevant objects. In artificial systems, restricting the task of
object recognition to these regions allows faster recognition and
unsupervised learning of multiple objects in cluttered scenes. A
problem with this approach is that often objects that are
superficially dissimilar to the target are given the same
consideration in recognition as similar objects. Additionally, in
video, objects recognized in previous frames at locations distant to
the current fixation point often are given the same consideration in
recognition as objects previously recognized at proximal locations.
Here we investigate the value of rapidly pruning the facial
recognition search space, first using similarity in the
already-computed low-level features that guide attention to prioritize
matching against an object database, and, second, using spatial
proximity information derived from previous video frames.  By
comparing the performance of Lowe's recognition algorithm with Itti \&
Koch's bottom-up attention model with and without search space
pruning, we demonstrate that this approach significantly accelerates
facial recognition in video footage.},
  journal = {Image and Vision Computing},
  volume = {24},
  number = {6},
  pages = {557--563},
  month = jun,
  year = {2006},
  file = {http://ilab.usc.edu/publications/doc/Bonaiuto_Itti06ivc.pdf},
  type = {bu ; cv},
  if = {2004 impact factor: 1.159}
}
