Relational Long Short-Term Memory for Video Action Recognition. Chen, Z., Ramachandra, B., Wu, T., & Vatsavai, R. R. CoRR, abs/1811.07059, 2018. Paper: https://arxiv.org/pdf/1811.07059.pdf

Abstract: Spatial and temporal relationships between objects in videos, both short-range and long-range, are key cues for recognizing actions, and modeling them jointly is a challenging problem. In this paper, we first present a new variant of Long Short-Term Memory, the Relational LSTM, to address the challenge of relational reasoning across space and time between objects. In our Relational LSTM module, we use a non-local operation, similar in spirit to the recently proposed non-local network, to substitute for the fully connected operation in the vanilla LSTM. In this way, our Relational LSTM can capture long- and short-range spatio-temporal relations between objects in videos in a principled way. We then propose a two-branch neural architecture consisting of the Relational LSTM module as the non-local branch and a local branch based on spatio-temporal pooling. The local branch captures local spatial appearance and/or short-term motion features. The outputs of the two branches are concatenated to learn video-level features from snippet-level ones end-to-end. Experimental results on the UCF-101 and HMDB-51 datasets show that our model achieves state-of-the-art results among LSTM-based methods, while obtaining performance comparable to other state-of-the-art methods (which use schemes that are not directly comparable). Our code will be released.
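The central mechanism described in the abstract can be made concrete with a short sketch. Below is a minimal PyTorch illustration, not the authors' implementation: an embedded-Gaussian non-local operation over N spatial positions is used in place of the fully connected map on [x_t, h_{t-1}] inside a vanilla LSTM cell. The names NonLocalOp and RelationalLSTMCell, the gate factorization, and all shapes are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class NonLocalOp(nn.Module):
    # Embedded-Gaussian non-local operation (in the spirit of the non-local
    # network): each position attends to every other position, so pairwise
    # relations are computed across all N locations.
    def __init__(self, dim):
        super().__init__()
        self.theta = nn.Linear(dim, dim)  # query embedding
        self.phi = nn.Linear(dim, dim)    # key embedding
        self.g = nn.Linear(dim, dim)      # value embedding

    def forward(self, x):  # x: (B, N, C), N = spatial positions
        attn = torch.matmul(self.theta(x), self.phi(x).transpose(1, 2))  # (B, N, N)
        attn = F.softmax(attn / x.size(-1) ** 0.5, dim=-1)
        return torch.matmul(attn, self.g(x))  # (B, N, C)

class RelationalLSTMCell(nn.Module):
    # Hypothetical cell: the gate pre-activations come from a non-local
    # operation over [x_t, h_{t-1}] rather than a fully connected layer.
    def __init__(self, dim):
        super().__init__()
        self.relate = NonLocalOp(2 * dim)
        self.proj = nn.Linear(2 * dim, 4 * dim)  # split into i, f, o, g gates

    def forward(self, x, state):  # x and both state tensors: (B, N, C)
        h, c = state
        z = self.proj(self.relate(torch.cat([x, h], dim=-1)))
        i, f, o, g = z.chunk(4, dim=-1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, (h, c)

# Example: 8 snippets, each with 7x7 = 49 positions of 256-d features.
cell = RelationalLSTMCell(dim=256)
h = c = torch.zeros(2, 49, 256)
for _ in range(8):
    x = torch.randn(2, 49, 256)  # stand-in for snippet-level CNN features
    out, (h, c) = cell(x, (h, c))

Because the recurrence carries relational state across snippets while the non-local operation relates positions within each step, a single cell of this form can, in principle, mix information across both space and time, which is the property the abstract emphasizes.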
@Article{RelationalLSTM,
author = {Zexi Chen and Bharathkumar Ramachandra and Tianfu Wu and Ranga Raju Vatsavai},
title = {Relational Long Short-Term Memory for Video Action Recognition},
journal = {CoRR},
year = {2018},
volume = {abs/1811.07059},
abstract = {Spatial and temporal relationships between objects in videos, both short-range and long-range, are key cues for recognizing actions, and modeling them jointly is a challenging problem. In this paper, we first present a new variant of Long Short-Term Memory, the Relational LSTM, to address the challenge of relational reasoning across space and time between objects. In our Relational LSTM module, we use a non-local operation, similar in spirit to the recently proposed non-local network, to substitute for the fully connected operation in the vanilla LSTM. In this way, our Relational LSTM can capture long- and short-range spatio-temporal relations between objects in videos in a principled way. We then propose a two-branch neural architecture consisting of the Relational LSTM module as the non-local branch and a local branch based on spatio-temporal pooling. The local branch captures local spatial appearance and/or short-term motion features. The outputs of the two branches are concatenated to learn video-level features from snippet-level ones end-to-end. Experimental results on the UCF-101 and HMDB-51 datasets show that our model achieves state-of-the-art results among LSTM-based methods, while obtaining performance comparable to other state-of-the-art methods (which use schemes that are not directly comparable). Our code will be released.},
url_paper = {https://arxiv.org/pdf/1811.07059.pdf}
}
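The two-branch design can be sketched in the same spirit. Below is a hedged reading that reuses the imports and the RelationalLSTMCell class from the sketch above: the local branch average-pools snippet features over space and time, the non-local branch runs the Relational LSTM across snippets, and the concatenated features feed a classifier. TwoBranchHead, the choice of average pooling, and the fusion layer are assumptions for illustration, not the paper's exact design.

class TwoBranchHead(nn.Module):
    # Assumed two-branch fusion: a local (spatio-temporal pooling) branch
    # concatenated with a non-local (Relational LSTM) branch.
    def __init__(self, dim, num_classes):
        super().__init__()
        self.relational = RelationalLSTMCell(dim)  # from the sketch above
        self.fc = nn.Linear(2 * dim, num_classes)

    def forward(self, snippets):  # (B, T, N, C): T snippets, N positions
        B, T, N, C = snippets.shape
        local = snippets.mean(dim=(1, 2))  # local branch: pool over time and space
        h = c = snippets.new_zeros(B, N, C)
        for t in range(T):  # non-local branch: Relational LSTM across snippets
            h, (h, c) = self.relational(snippets[:, t], (h, c))
        fused = torch.cat([local, h.mean(dim=1)], dim=-1)  # (B, 2C)
        return self.fc(fused)  # video-level class scores

Concatenating rather than summing the two branches keeps the local appearance/short-term motion features and the long-range relational features as separate inputs to the classifier, matching the abstract's description of learning video-level features from snippet-level ones end-to-end.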