Xu, Y., Yang, Y., & Zhang, L. (2023). Multi-Task Learning with Knowledge Distillation for Dense Prediction. [pdf]
While multi-task learning (MTL) has become an attractive topic, training an MTL model usually poses more difficulties than the single-task case. How to successfully apply knowledge distillation to MTL to improve training efficiency and model performance remains a challenging problem. In this paper, we introduce a new knowledge distillation procedure with an alternative match for MTL of dense prediction, based on two simple design principles. First, for memory and training efficiency, we use a single strong multi-task model as a teacher during training instead of multiple teachers, as widely adopted in existing studies. Second, we employ the less sensitive Cauchy-Schwarz (CS) divergence instead of the Kullback-Leibler (KL) divergence and propose a CS distillation loss accordingly. With this less sensitive divergence, our knowledge distillation with an alternative match captures inter-task and intra-task information between the teacher model and the student model of each task, thereby learning more "dark knowledge" for effective distillation. We conducted extensive experiments on dense prediction datasets, including NYUD-v2 and PASCAL-Context, for multiple vision tasks, such as semantic segmentation, human parts segmentation, depth estimation, surface normal estimation, and boundary detection. The results show that our proposed method decidedly improves model performance and practical inference efficiency.
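To make the core idea concrete, the sketch below illustrates what a CS-divergence distillation term could look like for per-pixel class distributions. This is not the authors' implementation: the discrete Cauchy-Schwarz divergence used here follows its standard definition, D_CS(p, q) = -log(⟨p, q⟩ / (‖p‖‖q‖)), and the function names (`cs_divergence`, `cs_distillation_loss`), the temperature `T`, and the weighting scheme are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def cs_divergence(p, q, eps=1e-8):
    """Discrete Cauchy-Schwarz divergence between probability
    distributions along the last dimension:
        D_CS(p, q) = -log( <p, q> / (||p|| * ||q||) )
    It is zero iff p == q, and (per the abstract's motivation) is
    less sensitive than KL to near-zero probabilities."""
    inner = (p * q).sum(dim=-1)
    norm = p.norm(dim=-1) * q.norm(dim=-1)
    return -torch.log(inner / (norm + eps) + eps)

def cs_distillation_loss(student_logits, teacher_logits, T=1.0):
    """Hypothetical pixel-wise distillation term: CS divergence between
    softened teacher and student class distributions, averaged over all
    pixels. Logits are assumed to be shaped [B, C, H, W]."""
    p = F.softmax(teacher_logits / T, dim=1).permute(0, 2, 3, 1)  # [B, H, W, C]
    q = F.softmax(student_logits / T, dim=1).permute(0, 2, 3, 1)
    return cs_divergence(p, q).mean()

# Illustrative usage: add the distillation term to each task's supervised loss,
# with the single multi-task teacher providing teacher_logits for that task.
# loss = task_loss + lambda_kd * cs_distillation_loss(student_logits, teacher_logits)
```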
