Silent voice: harnessing deep learning for lip-reading in Bangla

This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science and Engineering, 2024.

Bibliographic Details
Authors: Shaheen, Munia; Ifti, Akib Zabed; Hassan, Ariful; Hossain, Junaed
Other Authors: Hossain, Muhammad Iqbal
Format: Thesis
Language: English
Published: Brac University 2024
Online Access: http://hdl.handle.net/10361/22768
Abstract: Lipreading is the task of understanding speech from lip movement alone, and it is a crucial component of interpersonal interaction. Most previous work has addressed lipreading in English. Our goal is to build a deep neural network for the Bangla language that can produce comprehensible speech from silent videos solely by capturing the speaker's lip movements. Although this topic has been studied in various languages, Bangla currently has neither a study nor a suitable corpus for such research. We therefore created a dataset of 4000 videos in which 20 selected Bangla words are pronounced by 65 different speakers. We then implemented models based on a CNN-RNN architecture: two models used in previous research, LipNet and an autoencoder-decoder, and two custom models implemented as part of our own experiments. LipNet reaches a reasonable accuracy of 62%, while the autoencoder-decoder performs poorly at 49.65%. Custom Model-1 shows a substantial improvement at 70.86%, and Custom Conv-LSTM performs best overall, with a maximum accuracy of 76.24%.
Notes: Cataloged from PDF version of thesis. Includes bibliographical references (pages 47-49).
Contributors: Hossain, Muhammad Iqbal; Rahman, Rafeed; Department of Computer Science and Engineering, Brac University
Degree: B.Sc. in Computer Science and Engineering
IDs: 23241102; 23341129; 20301259; 23241107
Dates: 2024-05-07T09:37:13Z; ©2024; 2024-01
Extent: 58 pages (application/pdf)
Subjects: Convolutional neural network (CNN); Recurrent neural network (RNN); Lip feature extraction; Lip-reading; Deep learning; Machine learning; Neural networks (Computer science)
Rights: Brac University theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission.