Learning Local Image Descriptors with Deep Siamese and Triplet Convolutional Networks by Minimizing Global Loss Functions (Supplemental Material)
Kumar, V., Carneiro, G., & Reid, I.

1. Effect of Outliers on the Gradient Magnitude
In this section, we use the toy problem presented in Section 4 to demonstrate how the gradient magnitude (used for the weight updates during training) is affected by outliers for: (a) the triplet loss $J^t_1(\cdot)$ in (5), and (b) the global loss $J^g_1(\cdot)$ in (6). In this study, we form a mini-batch of 20 triplets, where six of them contain an outlier and the remaining 14 do not contain any outlier. After 3 training epochs, we plot the gradient magnitude produced by each of the 20 triplets for the triplet loss $J^t_1(\cdot)$ of (5), i.e., $\|\partial J^t_1 / \partial f(\mathbf{x}_i)\| + \|\partial J^t_1 / \partial f(\mathbf{x}_i^+)\| + \|\partial J^t_1 / \partial f(\mathbf{x}_i^-)\|$, in Fig. 1-(a); and for the global loss $J^g_1(\cdot)$ of (6), i.e., $\|\partial J^g_1 / \partial f(\mathbf{x}_i)\| + \|\partial J^g_1 / \partial f(\mathbf{x}_i^+)\| + \|\partial J^g_1 / \partial f(\mathbf{x}_i^-)\|$, in Fig. 1-(b). Please note that the plots show the normalised gradient magnitudes (i.e., each triplet's gradient magnitude is normalised by the sum over the 20 triplets). The red and green stems indicate the gradient magnitudes for triplets with and without outliers, respectively.

Figure 1. Normalised sum of gradient magnitudes per triplet after 3 training epochs for (a) the triplet loss and (b) the global loss. Green stems: triplets without outliers; red stems: triplets with outliers.

As discussed in Section 4, the gradients $\partial J^t_1 / \partial f(\mathbf{x}_i)$, $\partial J^t_1 / \partial f(\mathbf{x}_i^+)$ and $\partial J^t_1 / \partial f(\mathbf{x}_i^-)$ of the triplet loss in (5) depend only on the $i$-th triplet of the training set. After just a few training epochs, most of the triplets without outliers satisfy the condition in (5) and produce zero-magnitude gradients, as indicated by the green stems in Fig. 1-(a), whereas the triplets containing outliers produce high-magnitude gradients, as shown by the red stems in Fig. 1-(a). Since all non-zero gradients in Fig. 1-(a) are generated by triplets with outliers (spurious gradients), the weights of the network are affected only by these outliers after a small number of training epochs, as shown in Fig. 3-(b) of the paper.

In the case of the global loss, the gradients $\partial J^g_1 / \partial f(\mathbf{x}_i)$, $\partial J^g_1 / \partial f(\mathbf{x}_i^+)$ and $\partial J^g_1 / \partial f(\mathbf{x}_i^-)$ are parameterised by $\mu^+$ and $\mu^-$, which means that they depend on the global statistics of the training set. This makes the global loss less sensitive to outliers, as shown in Fig. 1-(b), where only $\approx 30\%$ of the gradient magnitude (as opposed to 100% in the case of the triplet loss) is generated by the triplets containing outliers, indicated by the red stems in Fig. 1-(b). Thus, the global loss function is more robust to outliers than the triplet loss.
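The following is a minimal PyTorch sketch (not the authors' code) of the experiment above: it contrasts per-triplet gradient magnitudes of a hinge-style triplet loss, whose gradient depends only on the $i$-th triplet, with those of a global loss driven by the batch statistics $\mu^+$ and $\mu^-$. The loss forms, the margin and weighting values, and the synthetic descriptors are all illustrative stand-ins, not the exact Eqs. (5) and (6) of the paper; the descriptors are generated so that the clean triplets already satisfy the margin, mimicking the state after a few training epochs.

```python
import torch

torch.manual_seed(0)

D = 2          # toy descriptor dimension (assumption, matching the 2D toy problem)
N = 20         # triplets per mini-batch, as in the experiment above
margin = 1.0   # hypothetical margin value

# Synthetic descriptors: anchors, positives, negatives. Clean triplets have
# positives near the anchor and negatives far from it, so the hinge is inactive.
a = torch.randn(N, D)
p = a + 0.1 * torch.randn(N, D)
n = a + 2.0 + 0.1 * torch.randn(N, D)
p[14:] += 5.0  # corrupt the last 6 triplets with outlier positives

def grad_magnitudes(loss_fn):
    """Normalised sum of gradient norms w.r.t. f(x_i), f(x_i^+), f(x_i^-) per triplet."""
    fa = a.clone().requires_grad_()
    fp = p.clone().requires_grad_()
    fn_ = n.clone().requires_grad_()
    loss_fn(fa, fp, fn_).backward()
    g = fa.grad.norm(dim=1) + fp.grad.norm(dim=1) + fn_.grad.norm(dim=1)
    return g / g.sum()  # normalise by the batch total, as in Fig. 1

def triplet_loss(fa, fp, fn_):
    # Standard hinge triplet loss: zero gradient once d_pos + margin < d_neg.
    d_pos = (fa - fp).pow(2).sum(1)
    d_neg = (fa - fn_).pow(2).sum(1)
    return torch.clamp(d_pos - d_neg + margin, min=0).mean()

def global_loss(fa, fp, fn_, t=1.0, lam=1.0):
    # Global-statistics loss: every triplet's gradient is coupled to the batch
    # means mu+/mu- and variances of the matching / non-matching distances.
    d_pos = (fa - fp).pow(2).sum(1)   # matching-pair distances
    d_neg = (fa - fn_).pow(2).sum(1)  # non-matching-pair distances
    mu_pos, mu_neg = d_pos.mean(), d_neg.mean()
    return d_pos.var() + d_neg.var() + lam * torch.clamp(mu_pos - mu_neg + t, min=0)

print("triplet:", grad_magnitudes(triplet_loss))
print("global :", grad_magnitudes(global_loss))
```

Under this sketch, the triplet loss assigns all of the normalised gradient magnitude to the six corrupted triplets (the clean ones satisfy the hinge and contribute zero), while the global loss spreads the gradient across the whole batch because $\mu^+$ and $\mu^-$ couple every triplet, which is the qualitative behaviour shown in Fig. 1.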
