Bangla speech recognition using 1D CNN and LSTM with different dimension reduction techniques

This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science, 2019.

Bibliographic Details
Main Authors: Nirjhor, S M Mahsanul Islam, Chowdhury, Mohammad Abidur Rahman, Sabab, Md. Nazmus
Other Authors: Uddin, Jia
Format: Thesis
Language: English
Published: Brac University 2019
Subjects:
Online Access: http://hdl.handle.net/10361/12774
id 10361-12774
record_format dspace
spelling 10361-12774 2022-01-26T10:18:16Z
Department of Computer Science and Engineering, Brac University.
Cataloged from PDF version of thesis. Includes bibliographical references (pages 39-43). 43 pages, application/pdf.
B. Computer Science. 2019-10-02T04:52:06Z 2019 2019-08 Thesis. ID 14201031, ID 15201049, ID 16101135.
Brac University theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission.
Abstract: Speech recognition has long been an active topic in machine learning, but Bangla, the world's 8th most widely spoken language, has not received the attention it deserves. This research addresses speech recognition on a Bangla language dataset. The recognition models are a 1-dimensional Convolutional Neural Network (1D CNN) and a Long Short-Term Memory (LSTM) network. Mel-frequency Cepstral Coefficients (MFCC) and the Mel Spectrogram are used as the key features for the recognition task. MFCC alone gave an accuracy of 98% with the 1D CNN and 82.35% with the LSTM. Next, three dimensionality reduction techniques, Principal Component Analysis (PCA), Kernel PCA (k-PCA) and t-distributed Stochastic Neighbor Embedding (t-SNE), were applied to the MFCC and Mel Spectrogram features in the hope of improving efficiency. This is the first attempt to apply these feature reduction methods to Bengali speech. Dimensionality reduction condenses a large number of features into fewer factors, which brings advantages such as reduced computation time and storage space. Of the three transformations, PCA gave consistently higher accuracy than k-PCA and t-SNE, with t-SNE giving the lowest. With PCA applied to the MFCC coefficients, the accuracy was 94.54% for the 1D CNN and 82.35% for the LSTM. With t-SNE, the accuracy was 49% for the 1D CNN and 50% for the LSTM. Feeding the Mel Spectrogram of the audio data to the models gave an accuracy of 90.74% for the 1D CNN and 91.6% for the LSTM. With k-PCA applied to the Mel Spectrogram coefficients, the accuracy was 73.95% for the 1D CNN and 72.27% for the LSTM.
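The PCA step mentioned in the abstract (reducing a large feature vector to fewer components before classification) can be illustrated with a minimal sketch. This is not the authors' code: the function name `pca_reduce`, the random stand-in data, the 200 x 40 feature shape, and the choice of 10 components are all assumptions made for illustration, and the sketch uses plain NumPy rather than whatever library the thesis used.

```python
import numpy as np


def pca_reduce(features, n_components):
    """Project each row of `features` onto its top principal components.

    Hypothetical helper for illustration; not taken from the thesis.
    """
    # Center every feature column so the components describe variance.
    mean = features.mean(axis=0)
    centered = features - mean
    # SVD of the centered data: the rows of vt are the principal
    # directions, ordered by decreasing singular value.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    # Variance captured along each kept direction.
    explained_variance = singular_values[:n_components] ** 2 / (features.shape[0] - 1)
    return centered @ components.T, components, explained_variance


# Assumed stand-in data: 200 utterances, each a 40-value MFCC-like vector.
rng = np.random.default_rng(0)
mfcc_like = rng.normal(size=(200, 40))

reduced, components, explained_variance = pca_reduce(mfcc_like, n_components=10)
print(reduced.shape)
```

In a pipeline like the one the abstract describes, the reduced matrix would take the place of the raw MFCC features as classifier input; how the thesis shaped its inputs for the 1D CNN and LSTM is not stated in this record.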
institution Brac University
collection Institutional Repository
language English
topic MFCC
PCA
Kernel PCA
t-SNE
1D CNN
RNN
LSTM
Automatic speech recognition
Machine learning