Simultaneous Detection and Segmentation. Hariharan, B., Arbeláez, P. A., Girshick, R. B, & Malik, J. In Fleet, D. J, Pajdla, T., Schiele, B., & Tuytelaars, T., editors, Computer Vision - ECCV 2014 - 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part VII, pages 297–312, 2014. Springer.
@inproceedings{Hariharan:2014dh,
author = {Hariharan, Bharath and Arbel{\'a}ez, Pablo Andr{\'e}s and Girshick, Ross B and Malik, Jitendra},
title = {{Simultaneous Detection and Segmentation}},
booktitle = {Computer Vision - ECCV 2014 - 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part VII},
year = {2014},
editor = {Fleet, David J and Pajdla, Tom{\'a}s and Schiele, Bernt and Tuytelaars, Tinne},
pages = {297--312},
publisher = {Springer},
annote = {R-CNN applied to segmentation.

The writing is poor; it may be easier to just check the code (<https://www2.eecs.berkeley.edu/Research/Projects/CS/vision/shape/sds/>). Some terms, such as "region overlap", are poorly defined. It is also unclear whether the B network has two pathways or one; based on the code, B is single-stream.

In C, it is unclear whether bbox-based or region-based ground truth is used.


I think box overlap is box IoU and region overlap is region IoU. If the IoU is > 0.5, the candidate is a positive for that class; otherwise it is a negative (the classifier has a background class for regions or boxes that are too far from every ground-truth box). When the paper says "predicting region overlap", it means the ground-truth class is assigned based on region IoU. See files such as <https://github.com/bharath272/sds_eccv2014/blob/master/prototxts/piwindow_train.prototxt>, which indeed has 21 classes (20 VOC + 1 background).
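
A minimal sketch of the labeling rule as read above, not the authors' code. The function names, the (x1, y1, x2, y2) box format, the pixel-set region format, and the choice of the best-overlapping ground truth are all assumptions made for illustration; class 0 stands for background and 1..20 for the VOC classes.

```python
def box_iou(a, b):
    # Boxes are (x1, y1, x2, y2) with exclusive right/bottom edges (assumed format).
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def region_iou(a, b):
    # Regions are sets of (row, col) pixel coordinates (assumed format).
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def assign_label(candidate, ground_truths, overlap, threshold=0.5):
    # ground_truths: list of (class_id, shape) pairs, class_id in 1..20.
    # Returns the class of the best-overlapping ground truth if its IoU
    # exceeds the threshold, otherwise 0 (background).
    best_class, best_iou = 0, 0.0
    for class_id, shape in ground_truths:
        iou = overlap(candidate, shape)
        if iou > best_iou:
            best_class, best_iou = class_id, iou
    return best_class if best_iou > threshold else 0
```

Passing `box_iou` gives labels based on box overlap; passing `region_iou` gives labels based on region overlap, which is the reading of "predicting region overlap" in this note.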


p. 303: refinement.

Here I believe the classifiers for the 10x10 cells are all different: they all take the same input (the features at the 10x10 cells, as well as the downsampled mask), but each predicts for a different grid cell.
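
A rough sketch of that reading, assuming each cell has its own logistic classifier over one shared input vector. The name `refine_mask`, the flat feature layout, and the weight and bias containers are hypothetical and are not taken from the released code.

```python
import math

GRID = 10  # the note describes a 10x10 grid of cells


def refine_mask(features, cell_weights, cell_biases):
    # features: one flat list shared by every cell (cell features plus the
    #           downsampled mask, concatenated; layout is an assumption).
    # cell_weights: GRID * GRID weight lists, one independent classifier per cell.
    # cell_biases: GRID * GRID bias terms.
    probs = []
    for weights, bias in zip(cell_weights, cell_biases):
        score = sum(w * x for w, x in zip(weights, features)) + bias
        probs.append(1.0 / (1.0 + math.exp(-score)))  # logistic output for this cell
    # Reshape the flat list of probabilities into GRID rows of GRID columns.
    return [probs[r * GRID:(r + 1) * GRID] for r in range(GRID)]
```

Every classifier sees the same `features`; only the weights differ, so each one specializes in the foreground probability of its own cell.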

The best way to confirm this is to look at the code.

The paper does not say how the superpixels are obtained, but they should be smaller than the region proposals.
},
keywords = {deep learning},
doi = {10.1007/978-3-319-10584-0_20},
read = {Yes},
rating = {2},
date-added = {2017-02-22T04:34:14GMT},
date-modified = {2017-03-02T15:37:47GMT},
url = {http://dx.doi.org/10.1007/978-3-319-10584-0_20},
local-url = {file://localhost/Users/yimengzh/Documents/Papers3_revised/Library.papers3/Articles/2014/Hariharan/ECCV%202014%20Part%20VII%202014%20Hariharan.pdf},
file = {{ECCV 2014 Part VII 2014 Hariharan.pdf:/Users/yimengzh/Documents/Papers3_revised/Library.papers3/Articles/2014/Hariharan/ECCV 2014 Part VII 2014 Hariharan.pdf:application/pdf}},
uri = {\url{papers3://publication/doi/10.1007/978-3-319-10584-0_20}}
}
