Hyperparameter Tuning Over an Attention Model for Image Captioning. Castro, R., Pineda, I., & Morocho-Cayamcela, M. E. In Salgado Guerrero, J. P., Chicaiza Espinosa, J., Cerrada Lozada, M., & Berrezueta-Guzman, S., editors, Information and Communication Technologies. TICEC 2021. Communications in Computer and Information Science, volume 1456, pages 172–183, 2021. Springer International Publishing.
Considering the historical trajectory and evolution of image captioning as a research area, this paper focuses on visual attention as an approach to solving captioning tasks with computer vision. This article studies the efficiency of different hyperparameter configurations on a state-of-the-art visual attention architecture composed of a pre-trained residual neural network encoder and a long short-term memory decoder. Results show that the choice of both the cost function and the gradient-based optimizer has a significant impact on the captioning results. Our system considers the cross-entropy, Kullback-Leibler divergence, mean squared error, and negative log-likelihood loss functions, as well as the adaptive moment estimation (Adam), AdamW, RMSprop, stochastic gradient descent, and Adadelta optimizers. Based on the performance metrics, the combination of cross-entropy and Adam is identified as the best alternative, yielding a Top-5 accuracy of 73.092 and a BLEU-4 score of 0.201. With cross-entropy fixed as the loss function, the first two optimizers prove the best performers, both reaching a BLEU-4 score of 0.201. In terms of inference loss, Adam outperforms AdamW (3.413 vs. 3.418), with a Top-5 accuracy of 73.092 vs. 72.989.
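The abstract describes a grid-style search over four loss functions and five optimizers. A minimal sketch of enumerating that search space (configuration names only; the actual encoder-decoder training loop is assumed to live elsewhere):

```python
from itertools import product

# Loss functions and optimizers listed in the abstract.
losses = ["cross-entropy", "KL divergence", "MSE", "NLL"]
optimizers = ["Adam", "AdamW", "RMSprop", "SGD", "Adadelta"]

# Every (loss, optimizer) pair evaluated in the study: 4 x 5 = 20 runs.
grid = list(product(losses, optimizers))
print(len(grid))  # 20

# The best-performing configuration reported in the paper.
best = ("cross-entropy", "Adam")
assert best in grid
```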