ProteoKnight: phage virion protein classification with CNN and uncertainty quantification
This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science, 2024.
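Main authors: | Bhuiyan, Abir Ahammed; Neha, Samiha Afaf; Khan, Md. Ishrak |
---|---|
Other authors: | Rhaman, Md. Khalilur |
Format: | Thesis |
Language: | English |
Published: | Brac University, 2024 |
Subjects: | Phage virion; Deep learning; DNA-walk; Monte Carlo dropout; Convolutional neural network (CNN); Neural networks (Computer science); Deep learning (Machine learning) |
Online access: | http://hdl.handle.net/10361/22830 |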
id |
10361-22830 |
---|---|
record_format |
dspace |
spelling |
10361-22830 2024-05-15T21:01:15Z ProteoKnight: phage virion protein classification with CNN and uncertainty quantification. Bhuiyan, Abir Ahammed; Neha, Samiha Afaf; Khan, Md. Ishrak; Rhaman, Md. Khalilur; Mukta, Jannatun Noor. Department of Computer Science and Engineering, Brac University. Phage virion; Deep learning; DNA-walk; Monte Carlo dropout; Convolutional neural network (CNN); Neural networks (Computer science); Deep learning (Machine learning). This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science, 2024. Cataloged from PDF version of thesis. Includes bibliographical references (pages 51-54). microbial ecosystems. This has led to their increased utilization in several research areas, such as bacterial genome engineering, phage therapy, disease diagnostics, and viral host identification. The structure of phages is made up of proteins called phage virion proteins (PVPs). Classifying these proteins is important for genomic research, which in turn helps us understand the complex interactions between phages and their hosts in the context of developing antibacterial drugs. A growing number of computational strategies are replacing tedious traditional procedures for annotating phage protein sequences acquired through high-throughput sequencing. Among these techniques, deep learning approaches demonstrate improved classification performance. Such approaches require special sequence encodings so that the model can perceive the protein sequences along with their distinctive features. Numerous encodings have been examined and assessed, while novel methods continue to emerge to optimize the task in terms of resource utilization and prediction accuracy. The objective of our work, ProteoKnight, is to explore and develop a unique encoding technique for phage proteins and demonstrate its effectiveness via classification. In our work, we make use of the time-separated PVP dataset introduced in [47]. 
Furthermore, this study aims to address the lack of research on uncertainty analysis by exploring uncertainty in binary PVP classification using the Monte Carlo Dropout (MCD) method. The experimental findings demonstrate the effectiveness of our strategy for binary classification, achieving a prediction accuracy of 90.2%. However, the accuracy for multi-class classification remains suboptimal. In addition, our uncertainty analysis reveals that prediction confidence varies with class and sequence length for our suggested classification approach. Abir Ahammed Bhuiyan; Samiha Afaf Neha; Md. Ishrak Khan. B.Sc. in Computer Science. 2024-05-15T04:34:03Z 2024-05-15T04:34:03Z ©2024 2024-01 Thesis. ID: 20101197; ID: 20101266; ID: 20101051. http://hdl.handle.net/10361/22830 en Brac University theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. 68 pages application/pdf Brac University |
institution |
Brac University |
collection |
Institutional Repository |
language |
English |
topic |
Phage virion; Deep learning; DNA-walk; Monte Carlo dropout; Convolutional neural network (CNN); Neural networks (Computer science); Deep learning (Machine learning) |
description |
This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science, 2024. |
author2 |
Rhaman, Md. Khalilur |
author_facet |
Rhaman, Md. Khalilur; Bhuiyan, Abir Ahammed; Neha, Samiha Afaf; Khan, Md. Ishrak |
format |
Thesis |
author |
Bhuiyan, Abir Ahammed; Neha, Samiha Afaf; Khan, Md. Ishrak |
author_sort |
Bhuiyan, Abir Ahammed |
title |
ProteoKnight: phage virion protein classification with CNN and uncertainty quantification |
publisher |
Brac University |
publishDate |
2024 |
url |
http://hdl.handle.net/10361/22830 |