Automated image caption generator in Bangla using multimodal learning

This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science, 2023.

Bibliographic Details
Main Authors:	Rodoshi, Mashiat Hasin; Ahmed, Moin Uddin; Ashraf, Md. Sobhan; Mim, Md. Galib Hasan; Khanam, Ashfia
Other Authors:	Sadeque, Farig Yousuf
Format:	Thesis
Language:	English
Published:	Brac University 2023
Subjects:	Image captioning; CNN; LSTM; RNN; Deep learning; Bangla; Natural language processing; Machine learning; Neural networks (Computer science); Natural language processing (Computer science); Cognitive learning theory
Online Access:	http://hdl.handle.net/10361/22012

Department:	Department of Computer Science and Engineering, Brac University
Notes:	Cataloged from PDF version of thesis. Includes bibliographical references (pages 37-38).

Abstract

Seeing an image on a screen is a privilege we often take for granted; a visually impaired person does not have that luxury. A system that automatically produces closed captions for images can therefore help visually impaired people experience what appears on a digital screen. Image captioning has long been at the forefront of multimodal machine learning research, but while many languages have benefited from that research, Bangla has been left behind. In this thesis, we aim to build a high-accuracy Bangla caption generator using multimodal learning that automatically produces closed captions in Bangla for digital images. The generator uses neural networks to identify the objects in an image, the relations among them and the actions taking place, and combines this information into an information-rich, descriptive caption. These captions can later be read aloud so that visually impaired people can get an idea of what is happening around them.
This thesis aims to improve upon existing Bangla image caption generators, both to help improve the lives of visually impaired people and to advance this research toward the state of the art. We used the Flickr8k and Flickr30k datasets, which contain 8,091 and 31,783 images respectively, each with five Bangla captions. For feature extraction, we used the VGG16, VGG19, ResNet50, InceptionV3 and EfficientNetB3 CNN architectures. Our best model achieved BLEU-1, BLEU-2, BLEU-3 and BLEU-4 scores of 0.553197, 0.341976, 0.234436 and 0.113089, respectively.

Degree:	B.Sc. in Computer Science and Engineering
Date:	2023-01
Date available:	2023-12-20
Student IDs:	19201089, 19301095, 19301046, 19301094, 18301231
Physical description:	38 pages (application/pdf)
Collection:	Institutional Repository
Rights:	Brac University theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission.
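The abstract reports BLEU-1 to BLEU-4 scores. To show roughly what those numbers measure, here is a minimal pure-Python sketch of cumulative corpus-level BLEU: clipped n-gram precision pooled over the corpus, a geometric mean with uniform weights over orders 1..n, and a brevity penalty based on the closest reference length. This is an assumption about the evaluation setup. The record does not say which BLEU implementation, tokenisation or smoothing the authors used (NLTK's `corpus_bleu` is a common choice), so the function `corpus_bleu` below and the Bangla example captions ("a dog is running on the grass") are hypothetical illustrations.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(references, hypotheses, max_n=4):
    """Hypothetical sketch: cumulative corpus-level BLEU-max_n, uniform weights.

    references: one entry per image, each a list of reference token lists
                (e.g. the five Bangla captions of that image)
    hypotheses: one generated token list per image, aligned with references
    """
    matches = [0] * max_n  # clipped n-gram matches, per order
    totals = [0] * max_n   # n-grams in the hypotheses, per order
    hyp_len = 0
    ref_len = 0
    for refs, hyp in zip(references, hypotheses):
        hyp_len += len(hyp)
        # Closest reference length; ties go to the shorter reference.
        ref_len += min((abs(len(r) - len(hyp)), len(r)) for r in refs)[1]
        for n in range(1, max_n + 1):
            # Clip each n-gram count by its maximum count in any one reference.
            max_ref = Counter()
            for r in refs:
                for gram, count in ngrams(r, n).items():
                    max_ref[gram] = max(max_ref[gram], count)
            hyp_counts = ngrams(hyp, n)
            matches[n - 1] += sum(min(c, max_ref[g]) for g, c in hyp_counts.items())
            totals[n - 1] += max(len(hyp) - n + 1, 0)
    # No smoothing: a zero precision at any order makes the score zero.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)


# Hypothetical example: one image with one whitespace-tokenised reference.
refs = [[["একটি", "কুকুর", "ঘাসে", "দৌড়াচ্ছে"]]]
hyps = [["একটি", "কুকুর", "দৌড়াচ্ছে"]]
for n in range(1, 5):
    print(f"BLEU-{n}: {corpus_bleu(refs, hyps, max_n=n):.6f}")
```

In this toy example the short hypothesis has perfect unigram precision but is penalised for brevity. It shares no trigram with the reference, so the unsmoothed BLEU-3 and BLEU-4 scores drop to zero. Real scores depend heavily on tokenisation, smoothing and on using all five reference captions per image, so output from this sketch is not directly comparable to the thesis results.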

Similar Items