Automatically captioning images using deep learning, datasets and estimation parameters: A Review

Open Access

Issue		EPJ Web Conf., Volume 341, 2025: 2nd International Conference on Advent Trends in Computational Intelligence and Communication Technologies (ICATCICT 2025)


Article Number		01011
Number of page(s)		10
DOI		https://doi.org/10.1051/epjconf/202534101011
Published online		20 November 2025

