Multi-Task Learning with Knowledge Distillation for Dense Prediction. Xu, Y., Yang, Y., & Zhang, L. 2023. pp. 21550-21559.

Abstract: While multi-task learning (MTL) has become an attractive topic, its training usually poses more difficulties than the single-task case. How to successfully apply knowledge distillation to MTL to improve training efficiency and model performance is still a challenging problem. In this paper, we introduce a new knowledge distillation procedure with an alternative match for MTL of dense prediction, based on two simple design principles. First, for memory and training efficiency, we use a single strong multi-task model as a teacher during training instead of multiple teachers, as widely adopted in existing studies. Second, we employ the less sensitive Cauchy-Schwarz (CS) divergence instead of the Kullback-Leibler (KL) divergence and propose a CS distillation loss accordingly. With the less sensitive divergence, our knowledge distillation with an alternative match is applied to capture inter-task and intra-task information between the teacher model and the student model of each task, thereby learning more "dark knowledge" for effective distillation. We conducted extensive experiments on dense prediction datasets, including NYUD-v2 and PASCAL-Context, for multiple vision tasks such as semantic segmentation, human parts segmentation, depth estimation, surface normal estimation, and boundary detection. The results show that our proposed method decidedly improves model performance and practical inference efficiency.
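The key technical ingredient named in the abstract is the Cauchy-Schwarz (CS) divergence used in place of the KL divergence for the distillation loss. As a minimal illustrative sketch only (not the authors' code; the function name, tensor shapes, and the plain per-pixel softmax matching are assumptions), a CS distillation term between teacher and student dense predictions could look like:

import torch
import torch.nn.functional as F

def cs_distillation_loss(student_logits, teacher_logits, eps=1e-8):
    # Cauchy-Schwarz divergence: D_CS(p, q) = -log( <p, q> / (||p|| * ||q||) ),
    # non-negative and zero only when the two distributions coincide.
    # Shapes assumed here: (N, C, H, W) logits for a dense prediction task.
    p = F.softmax(student_logits, dim=1)   # student per-pixel class distribution
    q = F.softmax(teacher_logits, dim=1)   # teacher per-pixel class distribution
    inner = (p * q).sum(dim=1)             # <p, q> at each pixel
    norms = p.norm(dim=1) * q.norm(dim=1)  # ||p|| * ||q|| at each pixel
    return -torch.log((inner + eps) / (norms + eps)).mean()

# Example: distilling a 21-class segmentation head from a multi-task teacher.
student = torch.randn(2, 21, 64, 64)
teacher = torch.randn(2, 21, 64, 64)
loss = cs_distillation_loss(student, teacher)

Because the inner product is normalized by the two norms, the argument of the logarithm stays in (0, 1], which is one way to read the paper's "less sensitive" characterization relative to KL. How the loss is actually matched across tasks (the "alternative match", inter-task and intra-task) is specific to the paper and not reproduced in this sketch.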
@misc{xu2023multitask,
  title = {Multi-Task Learning with Knowledge Distillation for Dense Prediction},
  author = {Xu, Yangyang and Yang, Yibo and Zhang, Lefei},
  year = {2023},
  pages = {21550-21559},
  url = {https://bibbase.org/service/mendeley/bfbbf840-4c42-3914-a463-19024f50b30c/file/70dd3aee-df55-285b-6ac8-d862bb1bf2d2/full_text.pdf.pdf},
  abstract = {While multi-task learning (MTL) has become an attractive topic, its training usually poses more difficulties than the single-task case. How to successfully apply knowledge distillation to MTL to improve training efficiency and model performance is still a challenging problem. In this paper, we introduce a new knowledge distillation procedure with an alternative match for MTL of dense prediction, based on two simple design principles. First, for memory and training efficiency, we use a single strong multi-task model as a teacher during training instead of multiple teachers, as widely adopted in existing studies. Second, we employ the less sensitive Cauchy-Schwarz (CS) divergence instead of the Kullback-Leibler (KL) divergence and propose a CS distillation loss accordingly. With the less sensitive divergence, our knowledge distillation with an alternative match is applied to capture inter-task and intra-task information between the teacher model and the student model of each task, thereby learning more "dark knowledge" for effective distillation. We conducted extensive experiments on dense prediction datasets, including NYUD-v2 and PASCAL-Context, for multiple vision tasks such as semantic segmentation, human parts segmentation, depth estimation, surface normal estimation, and boundary detection. The results show that our proposed method decidedly improves model performance and practical inference efficiency.}
}
{"_id":"GmyQaLD5MDBY96k3Q","bibbaseid":"xu-yang-zhang-multitasklearningwithknowledgedistillationfordenseprediction-2023","author_short":["Xu, Y.","Yang, Y.","Zhang, L."],"bibdata":{"title":"Multi-Task Learning with Knowledge Distillation for Dense Prediction","type":"misc","year":"2023","pages":"21550-21559","id":"ae105343-eeb4-308c-999b-a8324a26099f","created":"2023-12-13T07:45:14.990Z","accessed":"2023-12-13","file_attached":"true","profile_id":"f1f70cad-e32d-3de2-a3c0-be1736cb88be","group_id":"5ec9cc91-a5d6-3de5-82f3-3ef3d98a89c1","last_modified":"2023-12-13T07:45:17.960Z","read":false,"starred":false,"authored":false,"confirmed":false,"hidden":false,"folder_uuids":"d25a2be2-b54f-400b-918b-b254e8044e39","private_publication":false,"abstract":"While multi-task learning (MTL) has become an attractive topic, its training usually poses more difficulties than the single-task case. How to successfully apply knowledge distillation into MTL to improve training efficiency and model performance is still a challenging problem. In this paper, we introduce a new knowledge distillation procedure with an alternative match for MTL of dense prediction based on two simple design principles. First, for memory and training efficiency, we use a single strong multi-task model as a teacher during training instead of multiple teachers, as widely adopted in existing studies. Second, we employ a less sensitive Cauchy-Schwarz (CS) divergence instead of the Kullback-Leibler (KL) divergence and propose a CS distillation loss accordingly. With the less sensitive divergence, our knowledge distillation with an alternative match is applied for capturing inter-task and intra-task information between the teacher model and the student model of each task, thereby learning more \"dark knowl-edge\" for effective distillation. We conducted extensive experiments on dense prediction datasets, including NYUD-v2 and PASCAL-Context, for multiple vision tasks, such as semantic segmentation, human parts segmentation, depth estimation , surface normal estimation, and boundary detection. The results show that our proposed method decidedly improves model performance and the practical inference efficiency .","bibtype":"misc","author":"Xu, Yangyang and Yang, Yibo and Zhang, Lefei","bibtex":"@misc{\n title = {Multi-Task Learning with Knowledge Distillation for Dense Prediction},\n type = {misc},\n year = {2023},\n pages = {21550-21559},\n id = {ae105343-eeb4-308c-999b-a8324a26099f},\n created = {2023-12-13T07:45:14.990Z},\n accessed = {2023-12-13},\n file_attached = {true},\n profile_id = {f1f70cad-e32d-3de2-a3c0-be1736cb88be},\n group_id = {5ec9cc91-a5d6-3de5-82f3-3ef3d98a89c1},\n last_modified = {2023-12-13T07:45:17.960Z},\n read = {false},\n starred = {false},\n authored = {false},\n confirmed = {false},\n hidden = {false},\n folder_uuids = {d25a2be2-b54f-400b-918b-b254e8044e39},\n private_publication = {false},\n abstract = {While multi-task learning (MTL) has become an attractive topic, its training usually poses more difficulties than the single-task case. How to successfully apply knowledge distillation into MTL to improve training efficiency and model performance is still a challenging problem. In this paper, we introduce a new knowledge distillation procedure with an alternative match for MTL of dense prediction based on two simple design principles. First, for memory and training efficiency, we use a single strong multi-task model as a teacher during training instead of multiple teachers, as widely adopted in existing studies. 
Second, we employ a less sensitive Cauchy-Schwarz (CS) divergence instead of the Kullback-Leibler (KL) divergence and propose a CS distillation loss accordingly. With the less sensitive divergence, our knowledge distillation with an alternative match is applied for capturing inter-task and intra-task information between the teacher model and the student model of each task, thereby learning more \"dark knowl-edge\" for effective distillation. We conducted extensive experiments on dense prediction datasets, including NYUD-v2 and PASCAL-Context, for multiple vision tasks, such as semantic segmentation, human parts segmentation, depth estimation , surface normal estimation, and boundary detection. The results show that our proposed method decidedly improves model performance and the practical inference efficiency .},\n bibtype = {misc},\n author = {Xu, Yangyang and Yang, Yibo and Zhang, Lefei}\n}","author_short":["Xu, Y.","Yang, Y.","Zhang, L."],"urls":{"Paper":"https://bibbase.org/service/mendeley/bfbbf840-4c42-3914-a463-19024f50b30c/file/70dd3aee-df55-285b-6ac8-d862bb1bf2d2/full_text.pdf.pdf"},"biburl":"https://bibbase.org/service/mendeley/bfbbf840-4c42-3914-a463-19024f50b30c","bibbaseid":"xu-yang-zhang-multitasklearningwithknowledgedistillationfordenseprediction-2023","role":"author","metadata":{"authorlinks":{}},"downloads":0},"bibtype":"misc","biburl":"https://bibbase.org/service/mendeley/bfbbf840-4c42-3914-a463-19024f50b30c","dataSources":["2252seNhipfTmjEBQ"],"keywords":[],"search_terms":["multi","task","learning","knowledge","distillation","dense","prediction","xu","yang","zhang"],"title":"Multi-Task Learning with Knowledge Distillation for Dense Prediction","year":2023}