Masked-attention Mask Transformer for Universal Image Segmentation. Cheng, B., Misra, I., Schwing, A. G., Kirillov, A., & Girdhar, R. In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 1280–1289, June 2022. ISSN: 2575-7075
@inproceedings{cheng_masked-attention_2022,
	title = {Masked-attention {Mask} {Transformer} for {Universal} {Image} {Segmentation}},
	doi = {10.1109/CVPR52688.2022.00135},
	abstract = {Image segmentation groups pixels with different semantics, e.g., category or instance membership. Each choice of semantics defines a task. While only the semantics of each task differ, current research focuses on designing specialized architectures for each task. We present Masked-attention Mask Transformer (Mask2Former), a new architecture capable of addressing any image segmentation task (panoptic, instance or semantic). Its key components include masked attention, which extracts localized features by constraining cross-attention within predicted mask regions. In addition to reducing the research effort by at least three times, it outperforms the best specialized architectures by a significant margin on four popular datasets. Most notably, Mask2Former sets a new state-of-the-art for panoptic segmentation (57.8 PQ on COCO), instance segmentation (50.1 AP on COCO) and semantic segmentation (57.7 mIoU on ADE20K).},
	language = {en},
	booktitle = {2022 {IEEE}/{CVF} {Conference} on {Computer} {Vision} and {Pattern} {Recognition} ({CVPR})},
	author = {Cheng, Bowen and Misra, Ishan and Schwing, Alexander G. and Kirillov, Alexander and Girdhar, Rohit},
	month = jun,
	year = {2022},
	note = {ISSN: 2575-7075},
	keywords = {Attention, Segmentation, Transformer, Vision, Computational modeling, Computer architecture, Feature extraction, Image segmentation, Recognition: detection, Semantics, Shape, Transformers, categorization, grouping and shape analysis, retrieval},
	pages = {1280--1289},
}
