Document template identification and data extraction using machine learning and deep learning approach
This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science, 2024.
Main Authors: | Roy, Kaushik; Islam, Md Fuad; Rimon, Md Minhazul Islam; Mobarak, Tasnim; Priota, Mysha Samiha |
---|---|
Other Authors: | Rhaman, Md. Khalilur |
Format: | Thesis |
Language: | English |
Published: |
Brac University
2024
|
Subjects: | |
Online Access: | http://hdl.handle.net/10361/22915 |
id |
10361-22915 |
---|---|
record_format |
dspace |
spelling |
10361-22915 2024-05-26T21:05:54Z
Document template identification and data extraction using machine learning and deep learning approach
Roy, Kaushik; Islam, Md Fuad; Rimon, Md Minhazul Islam; Mobarak, Tasnim; Priota, Mysha Samiha
Rhaman, Md. Khalilur
Department of Computer Science and Engineering, Brac University
CNN; YOLOv8; Deep learning model; SGD classifier; SVM; Machine learning; KNN; Optical data processing; Data structures (Computer science); Neural networks (Computer science); Deep learning (Machine learning); Cognitive learning theory (Deep learning)
This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science, 2024.
Cataloged from PDF version of thesis.
Includes bibliographical references (pages 42-43).
As technology advances, the demand for quick data processing and organization keeps growing. People now have more access to technology than ever before, and nearly every kind of data can be processed and stored digitally. However, paper-based procedures are still in place, and moving their data from paper to computers is laborious and time-consuming, which reduces work efficiency. Our goal is to make this process fast and efficient by directly converting the information on manually checked scripts into digital data. Our research strategy involved gathering information from Brac University examination scripts, digitizing the verified scripts’ data, and then uploading it to a spreadsheet file. The goal of the process is to make Brac University’s grade-processing system quicker, more effective, and less tiresome for the teachers.
Three machine learning models, three deep learning models, and one transfer learning model were utilized for this study. The results were evaluated with three common measures: precision, recall, and F1-score. The KNN model showed up to 85% accuracy, SVM showed 87%, and SGDClassifier showed 81%. Meanwhile, CNN and YOLOv8 showed 98.6% and 98.8% accuracy respectively. Because YOLOv8 gave the best accuracy, we will use it to create an interface that carries out the complete data transformation process from beginning to end: capturing the image, processing it to identify the areas from which the data will be collected, and finally extracting the data. In the end, we will obtain precisely extracted data from handwritten exam scripts, arranged in a spreadsheet, which digitizes the laborious task of manually entering every grade.
Kaushik Roy; Md Fuad Islam; Md Minhazul Islam Rimon; Tasnim Mobarak; Mysha Samiha Priota
B.Sc in Computer Science
2024-05-26T03:42:05Z 2024-05-26T03:42:05Z ©2024 2024-01 Thesis
ID: 20101185 ID: 20101060 ID: 20101078 ID: 20101296 ID: 20301205
http://hdl.handle.net/10361/22915
en
Brac University theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission.
55 pages application/pdf Brac University |
institution |
Brac University |
collection |
Institutional Repository |
language |
English |
topic |
CNN; YOLOv8; Deep learning model; SGD classifier; SVM; Machine learning; KNN; Optical data processing; Data structures (Computer science); Neural networks (Computer science); Deep learning (Machine learning); Cognitive learning theory (Deep learning) |
spellingShingle |
CNN YOLOv8 Deep learning model SGD classifier SVM Machine learning KNN Optical data processing Data structures (Computer science) Neural networks (Computer science) Deep learning (Machine learning) Cognitive learning theory (Deep learning) Roy, Kaushik Islam, Md Fuad Rimon, Md Minhazul Islam Mobarak, Tasnim Priota, Mysha Samiha Document template identification and data extraction using machine learning and deep learning approach |
description |
This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science, 2024. |
author2 |
Rhaman, Md. Khalilur |
author_facet |
Rhaman, Md. Khalilur Roy, Kaushik Islam, Md Fuad Rimon, Md Minhazul Islam Mobarak, Tasnim Priota, Mysha Samiha |
format |
Thesis |
author |
Roy, Kaushik Islam, Md Fuad Rimon, Md Minhazul Islam Mobarak, Tasnim Priota, Mysha Samiha |
author_sort |
Roy, Kaushik |
title |
Document template identification and data extraction using machine learning and deep learning approach |
publisher |
Brac University |
publishDate |
2024 |
url |
http://hdl.handle.net/10361/22915 |
_version_ |
1814309815927701504 |
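The abstract in this record names precision, recall, and F1-score as its evaluation measures while reporting accuracy figures. As a minimal sketch of how these measures relate, the following Python computes them per class from paired true and predicted labels. The function names and the sample digit labels are hypothetical, chosen only for illustration; they are not taken from the thesis or its data.

```python
from collections import Counter


def per_class_scores(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for every label seen."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1      # correct prediction for this class
        else:
            fp[pred] += 1       # predicted class gains a false positive
            fn[truth] += 1      # true class gains a false negative

    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        f_den = precision + recall
        f1 = 2 * precision * recall / f_den if f_den else 0.0
        scores[label] = (precision, recall, f1)
    return scores


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical handwritten-digit labels (illustrative only).
y_true = [7, 7, 3, 3, 3, 9]
y_pred = [7, 3, 3, 3, 9, 9]

scores = per_class_scores(y_true, y_pred)
overall = accuracy(y_true, y_pred)

for label, (p, r, f1) in scores.items():
    print(f"class {label}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
print(f"accuracy={overall:.2f}")
```

Accuracy summarizes all classes in one number, whereas precision and recall show which classes are over- or under-predicted, which matters when some digits or fields are rarer than others.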