Retina enhanced bag of words descriptors for video classification. Strat, S. T., Benoit, A., & Lambert, P. In 2014 22nd European Signal Processing Conference (EUSIPCO), pages 1307-1311, Sep. 2014.
This paper addresses the task of detecting diverse semantic concepts in videos. In this context, the Bag of Visual Words (BoW) model, inherited from the analysis of sampled video keyframes, is among the most popular methods. However, when applied to image sequences, the model faces new difficulties: added motion information, extra computational cost, and increased variability of the content and concepts to handle. Considering this spatio-temporal context, we propose to extend the BoW model by preprocessing the video with a retina model before extracting BoW descriptors. This preprocessing increases the robustness of local features to disturbances such as noise and lighting variations. Additionally, the retina model is used to detect potentially salient areas and to construct spatio-temporal descriptors. We experiment with three state-of-the-art local features, SIFT, SURF and FREAK, and evaluate our results on the TRECVid 2012 Semantic Indexing (SIN) challenge.
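
For readers unfamiliar with the BoW encoding step the abstract refers to, the following is a minimal sketch of the generic technique only, not the authors' pipeline. It omits the retina preprocessing entirely and assumes toy 2-D points in place of real SIFT, SURF or FREAK descriptors. The function name `bow_histogram` and the hand-written codebook are hypothetical; in practice the codebook is usually learned by clustering (for example k-means) over training descriptors.

```python
from math import dist  # Euclidean distance, Python 3.8+


def bow_histogram(descriptors, codebook):
    """Assign each local descriptor to its nearest codeword and
    return the L1-normalised histogram of codeword counts."""
    counts = [0] * len(codebook)
    for d in descriptors:
        # Hard assignment: index of the closest visual word.
        nearest = min(range(len(codebook)), key=lambda k: dist(d, codebook[k]))
        counts[nearest] += 1
    total = sum(counts)
    if total == 0:
        # No descriptors extracted: return an all-zero histogram.
        return [0.0] * len(codebook)
    return [c / total for c in counts]


# Toy data (assumption): three visual words and four 2-D "descriptors".
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
descriptors = [(0.1, 0.0), (0.9, 1.2), (1.1, 0.8), (4.8, 5.1)]
hist = bow_histogram(descriptors, codebook)
print(hist)
```

The resulting fixed-length histogram is what a classifier would consume for each keyframe or video segment, regardless of how many local features were detected.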
