ProteoKnight: phage virion protein classification with CNN and uncertainty quantification

This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science, 2024.

Bibliographic description
Main authors: Bhuiyan, Abir Ahammed, Neha, Samiha Afaf, Khan, Md. Ishrak
Other authors: Rhaman, Md. Khalilur
Format: Thesis
Language: English
Published: Brac University 2024
Subjects:
Online access: http://hdl.handle.net/10361/22830
Contributors: Rhaman, Md. Khalilur; Mukta, Jannatun Noor
Department: Department of Computer Science and Engineering, Brac University
Degree: B.Sc. in Computer Science
Type: Thesis
Date: 2024-01 (©2024)
Record timestamps: 2024-05-15T04:34:03Z; 2024-05-15T21:01:15Z
ID: 20101197; ID: 20101266; ID: 20101051
Extent: 68 pages, application/pdf
Notes: Cataloged from PDF version of thesis. Includes bibliographical references (pages 51-54).
Rights: Brac University theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission.
Abstract:
[...] microbial ecosystems. This has led to their increased use in several research areas, such as bacterial genome engineering, phage therapy, disease diagnostics, and viral host identification. The structure of phages is made up of proteins called phage virion proteins (PVP). Classifying these proteins is important for genomic research, which in turn helps us understand the complex interactions between phages and their hosts in the context of developing antibacterial drugs. A growing number of computational strategies are replacing tedious traditional procedures for annotating phage protein sequences acquired through high-throughput sequencing. Among these techniques, deep learning approaches deliver improved classification performance. Such approaches require special sequence encodings so that the model can perceive the protein sequences together with their distinctive features. Numerous encodings have been examined and assessed, and novel methods continue to emerge that aim to optimize the task in terms of resource utilization and prediction accuracy. The objective of our work, ProteoKnight, is to explore and develop a unique encoding technique for phage proteins and to demonstrate its effectiveness via classification.
In our work, we use the time-separated PVP dataset introduced by [47]. This study also addresses the lack of research on uncertainty analysis by exploring uncertainty in binary PVP classification using the Monte Carlo Dropout (MCD) method. The experimental findings demonstrate the effectiveness of our strategy for binary classification, which achieves a prediction accuracy of 90.2%. However, the accuracy for multi-class classification remains suboptimal. Furthermore, our uncertainty analysis reveals that the prediction confidence of our proposed classification approach varies with class and sequence length.
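The abstract names Monte Carlo Dropout as the uncertainty method but the record gives no implementation details. The following is only a minimal sketch of the general technique, not the thesis authors' code: dropout is kept active at inference, the same input is passed through the network many times, and the spread of the resulting probabilities is read as uncertainty. The tiny two-layer network, its weights, the feature vector, and every function name here are hypothetical and chosen purely for illustration.

```python
import math
import random

# Hypothetical toy model: 2 input features, 3 hidden ReLU units, 1 sigmoid output.
FEATURES = [1.0, 2.0]
W_HIDDEN = [[0.5, 0.1], [-0.3, 0.8], [0.2, 0.2]]
W_OUT = [1.0, -1.0, 0.5]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward_with_dropout(features, w_hidden, w_out, drop_rate, rng):
    """One stochastic forward pass with dropout applied to the hidden layer."""
    keep = 1.0 - drop_rate
    hidden = []
    for weights in w_hidden:
        act = max(0.0, sum(w * x for w, x in zip(weights, features)))  # ReLU
        # Inverted dropout: keep the unit with probability `keep` and rescale it,
        # otherwise zero it out.
        hidden.append(act / keep if rng.random() < keep else 0.0)
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)))


def mc_dropout_predict(features, w_hidden, w_out, drop_rate=0.5, passes=100, seed=0):
    """Return (mean probability, standard deviation, entropy of the mean)."""
    if not 0.0 <= drop_rate < 1.0:
        raise ValueError("drop_rate must be in [0, 1)")
    rng = random.Random(seed)
    probs = [
        forward_with_dropout(features, w_hidden, w_out, drop_rate, rng)
        for _ in range(passes)
    ]
    mean = sum(probs) / passes
    std = math.sqrt(sum((p - mean) ** 2 for p in probs) / passes)
    eps = 1e-12  # guards against log(0)
    entropy = -(mean * math.log(mean + eps) + (1.0 - mean) * math.log(1.0 - mean + eps))
    return mean, std, entropy


if __name__ == "__main__":
    mean, std, entropy = mc_dropout_predict(FEATURES, W_HIDDEN, W_OUT)
    print(f"mean={mean:.3f} std={std:.3f} entropy={entropy:.3f}")
```

In a sketch like this, a larger standard deviation or entropy marks an input the model is less sure about; comparing those values across classes or sequence lengths is one way to carry out the kind of analysis the abstract describes.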
institution Brac University
collection Institutional Repository
language English
topic Phage virion
Deep learning
DNA-walk
Monte Carlo dropout
Convolutional neural network (CNN)
Neural networks (Computer science)
Deep learning (Machine learning)