One voice is all you need: a one-shot approach to recognize you

This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science and Engineering, 2021.

Bibliographic Details
Main Authors:	Dipto, Shahriar Rumi; Nowshin, Priata; Ahmed, Intesur; Chowdhury, Deboraj; Noor, Galib Abdun
Other Authors:	Chakrabarty, Amitabha
Format:	Thesis
Language:	English
Published / Created:	Brac University, 2022
Subjects:	Audio classification; Siamese neural network; Speaker recognition; Oneshot learning; Triplet loss; Multimedia systems; Neural networks (Computer science)
Online Access:	http://hdl.handle.net/10361/15871

Department:	Department of Computer Science and Engineering, Brac University
Degree:	B. Computer Science
Notes:	Cataloged from PDF version of thesis. Includes bibliographical references (pages 43-49).
Thesis date:	2021-09
Record dates:	2022-01-12T06:16:19Z; 2022-01-26T10:04:54Z
IDs:	20141036; 20141035; 18101685; 18101242; 20141037
Extent:	49 pages
File format:	application/pdf
Rights:	Brac University theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission.
Abstract:	Humans can quickly learn unfamiliar concepts based on what they have previously learned. With this in mind, researchers have tested training machine learning classification models with limited training data. One-shot learning has proven effective in computer vision research, as it works accurately with a single labeled training example and a small number of training sets. By using a single input example from each class, one-shot learning can work more efficiently and quickly. To train a neural network to predict the similarity between two inputs, one-shot learning employs the Siamese network architecture. This architecture has been used successfully for various audio-related problems, but its use for one-shot learning in speaker recognition has received less attention. The goal of this thesis is to apply one-shot learning to speaker classification: specific features are extracted, a Siamese network is trained with triplet loss, and at test time the similarity between a support set and a query set is calculated to recognize the speaker accurately and quickly. The proposed system is trained on the LibriSpeech dataset, which contains audio recordings of different speakers. The final one-shot classification is performed on a few previously unseen classes, using only a single sample of each class: features are extracted, and the similarity ratio computed by the Siamese-trained model is used to recognize the speaker. Accuracy varied with the number of classes tested: 100% for two classes, 95% for three, 84% for four, and 74% for five, which is significantly better than the other algorithms we tested for our solution. The results suggest that Siamese networks are a viable solution to the challenging one-shot audio classification problem.
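The two ideas named in the abstract, triplet loss for training and similarity between a support set and a query for one-shot recognition, can be illustrated with a minimal sketch. It is not the thesis authors' implementation. The margin value, the choice of cosine similarity, the function names, and the use of plain Python lists as stand-ins for learned speaker embeddings are all assumptions made for illustration. The feature extraction and the Siamese network itself are not shown.

```python
import math


def squared_distance(u, v):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss (margin is an assumed value).

    The loss is zero once the anchor is closer to the positive
    (same speaker) than to the negative (different speaker) by
    at least the margin.
    """
    gap = squared_distance(anchor, positive) - squared_distance(anchor, negative)
    return max(gap + margin, 0.0)


def cosine_similarity(u, v):
    """Cosine similarity; assumed here as the similarity measure."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def one_shot_predict(support, query):
    """Return the label whose single support embedding best matches the query.

    support: dict mapping a speaker label to one embedding (one shot per class).
    query:   embedding of the utterance to classify.
    """
    return max(support, key=lambda label: cosine_similarity(support[label], query))


# Hypothetical two-speaker support set with toy 2-D embeddings.
support_set = {"speaker_a": [1.0, 0.0], "speaker_b": [0.0, 1.0]}
predicted = one_shot_predict(support_set, [0.9, 0.1])
```

In this toy setup each class contributes exactly one support embedding, which mirrors the single-sample-per-class evaluation the abstract describes. Adding more classes to `support_set` makes the decision harder, consistent with the reported drop in accuracy from two to five classes.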
Collection:	Institutional Repository
