End-to-End Joint Semantic Segmentation of Actors and Actions in Video. Ji, J., Buch, S., Niebles, J., & Soto, A. In ECCV, 2018.
Traditional video understanding tasks include human action recognition and actor-object semantic segmentation. However, the joint task of providing semantic segmentation for different actor classes simultaneously with their action classes remains challenging yet necessary for many applications. In this work, we propose a new end-to-end architecture for tackling this joint task in videos. Our model effectively leverages multiple input modalities, contextual information, and joint multitask learning in the video to directly output semantic segmentations in a single unified framework. We train and benchmark our model on the large-scale Actor-Action Dataset (A2D) for joint actor-action semantic segmentation, and demonstrate state-of-the-art performance for both segmentation and detection. We also perform experiments verifying that our joint approach improves performance for zero-shot understanding, indicating the generalizability of our jointly learned feature space.
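
To make the joint multitask idea concrete, below is a minimal, illustrative sketch (not the architecture from the paper): a shared convolutional encoder feeds two per-pixel heads, one predicting actor classes and one predicting action classes, trained with a summed cross-entropy loss. The backbone, class counts, and loss weighting are placeholders chosen only for the example; it assumes PyTorch is available.

```python
# Minimal joint actor/action segmentation sketch (illustrative only;
# not the model described in the paper).
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointActorActionSegNet(nn.Module):
    """Shared encoder with two per-pixel heads: actors and actions."""
    def __init__(self, num_actors=8, num_actions=10, width=64):
        super().__init__()
        # Toy fully-convolutional encoder standing in for a video backbone.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, width, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(width, width, 3, padding=1), nn.ReLU(inplace=True),
        )
        # Task-specific 1x1 heads over the shared feature map.
        self.actor_head = nn.Conv2d(width, num_actors, 1)
        self.action_head = nn.Conv2d(width, num_actions, 1)

    def forward(self, x):
        feats = self.encoder(x)
        return self.actor_head(feats), self.action_head(feats)

def joint_loss(actor_logits, action_logits, actor_gt, action_gt, w=1.0):
    """Sum of per-pixel cross-entropy losses for the two tasks."""
    return (F.cross_entropy(actor_logits, actor_gt)
            + w * F.cross_entropy(action_logits, action_gt))

if __name__ == "__main__":
    model = JointActorActionSegNet()
    frames = torch.randn(2, 3, 64, 64)             # batch of RGB frames
    actor_gt = torch.randint(0, 8, (2, 64, 64))    # per-pixel actor labels
    action_gt = torch.randint(0, 10, (2, 64, 64))  # per-pixel action labels
    actor_logits, action_logits = model(frames)
    loss = joint_loss(actor_logits, action_logits, actor_gt, action_gt)
    loss.backward()
    print(loss.item())
```

The shared encoder is what lets the two tasks inform each other; in the paper's setting this is where multiple input modalities and context would be fused before the task heads.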
