Hyperparameter Tuning Over an Attention Model for Image Captioning. Castro, R., Pineda, I., & Morocho-Cayamcela, M. E. In Salgado Guerrero, J. P., Chicaiza Espinosa, J., Cerrada Lozada, M., & Berrezueta-Guzman, S., editors, Information and Communication Technologies. TICEC 2021. Communications in Computer and Information Science, volume 1456, pages 172–183, 2021. Springer International Publishing.
Considering the historical trajectory and evolution of image captioning as a research area, this paper focuses on visual attention as an approach to solving captioning tasks with computer vision. This article studies the efficiency of different hyperparameter configurations on a state-of-the-art visual attention architecture composed of a pre-trained residual neural network encoder and a long short-term memory decoder. Results show that the choice of both the cost function and the gradient-based optimizer has a significant impact on the captioning results. Our system considers the cross-entropy, Kullback-Leibler divergence, mean squared error, and negative log-likelihood loss functions, as well as the adaptive moment estimation (Adam), AdamW, RMSprop, stochastic gradient descent, and Adadelta optimizers. Based on the performance metrics, the combination of cross-entropy and Adam is identified as the best alternative, yielding a Top-5 accuracy of 73.092 and a BLEU-4 score of 0.201. With cross-entropy fixed as the loss function, the first two optimizers prove the best performers, both reaching a BLEU-4 score of 0.201. In terms of inference loss, Adam outperforms AdamW (3.413 vs. 3.418), with a Top-5 accuracy of 73.092 vs. 72.989.
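The abstract describes a grid-style search over four loss functions and five optimizers. A minimal sketch of enumerating that search space (configuration names only; the actual encoder-decoder training loop is assumed to live elsewhere):

```python
from itertools import product

# Loss functions and optimizers listed in the abstract.
losses = ["cross-entropy", "KL divergence", "MSE", "NLL"]
optimizers = ["Adam", "AdamW", "RMSprop", "SGD", "Adadelta"]

# Every (loss, optimizer) pair evaluated in the study: 4 x 5 = 20 runs.
grid = list(product(losses, optimizers))
print(len(grid))  # 20

# The best-performing configuration reported in the paper.
best = ("cross-entropy", "Adam")
assert best in grid
```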