Document template identification and data extraction using machine learning and deep learning approach

This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science, 2024.

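Bibliographic details
Main Authors: Roy, Kaushik; Islam, Md Fuad; Rimon, Md Minhazul Islam; Mobarak, Tasnim; Priota, Mysha Samiha
Other authors: Rhaman, Md. Khalilur
Format: Thesis
Language: English
Published: Brac University 2024
Subjects: CNN; YOLOv8; Deep learning model; SGD classifier; SVM; Machine learning; KNN; Optical data processing; Data structures (Computer science); Neural networks (Computer science); Deep learning (Machine learning); Cognitive learning theory (Deep learning)
Online access: http://hdl.handle.net/10361/22915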
id 10361-22915
record_format dspace
spelling 10361-22915 2024-05-26T21:05:54Z
Document template identification and data extraction using machine learning and deep learning approach
Roy, Kaushik; Islam, Md Fuad; Rimon, Md Minhazul Islam; Mobarak, Tasnim; Priota, Mysha Samiha
Rhaman, Md. Khalilur
Department of Computer Science and Engineering, Brac University
CNN; YOLOv8; Deep learning model; SGD classifier; SVM; Machine learning; KNN; Optical data processing; Data structures (Computer science); Neural networks (Computer science); Deep learning (Machine learning); Cognitive learning theory (Deep learning)
This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science, 2024. Cataloged from PDF version of thesis. Includes bibliographical references (pages 42-43).
As technology advances, the demand for fast data processing and organization keeps growing. People have more access to technology than ever before, and nearly every kind of data can now be processed and stored digitally. However, paper-based procedures are still in place, and moving data from paper to computers is laborious and time-consuming, which reduces work efficiency. Our goal is to make this process fast and efficient by directly converting the information on manually checked scripts into digital data. Our research strategy involved gathering data from Brac University examination scripts, digitizing the data on the verified scripts, and then uploading it to a spreadsheet file. The aim is to make Brac University’s grade-processing system quicker, more effective, and less tiresome for teachers.
Three machine learning models, three deep learning models, and one transfer learning model were used in this study. Three common measures were used to evaluate the results: precision, recall, and F1-score. The KNN model showed up to 85% accuracy, while SVM showed 87% and SGDClassifier showed 81%. Meanwhile, CNN and YOLOv8 showed 98.6% and 98.8% accuracy, respectively. Since YOLOv8 provides the best accuracy, we will use it to create an interface that carries out the complete data transformation process from beginning to end: capturing the image, processing it to identify the areas from which data will be collected, and finally extracting the data. In the end, we will obtain precisely extracted data from handwritten exam scripts, arranged in a spreadsheet, replacing the laborious task of manually entering every grade.
Kaushik Roy; Md Fuad Islam; Md Minhazul Islam Rimon; Tasnim Mobarak; Mysha Samiha Priota
B.Sc in Computer Science
2024-05-26T03:42:05Z 2024-05-26T03:42:05Z ©2024 2024-01
Thesis
ID: 20101185 ID: 20101060 ID: 20101078 ID: 20101296 ID: 20301205
http://hdl.handle.net/10361/22915
en
Brac University theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission.
55 pages
application/pdf
Brac University
institution Brac University
collection Institutional Repository
language English
topic CNN
YOLOv8
Deep learning model
SGD classifier
SVM
Machine learning
KNN
Optical data processing
Data structures (Computer science)
Neural networks (Computer science)
Deep learning (Machine learning)
Cognitive learning theory (Deep learning)
description This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science, 2024.
author2 Rhaman, Md. Khalilur
author_facet Rhaman, Md. Khalilur
Roy, Kaushik
Islam, Md Fuad
Rimon, Md Minhazul Islam
Mobarak, Tasnim
Priota, Mysha Samiha
format Thesis
author Roy, Kaushik
Islam, Md Fuad
Rimon, Md Minhazul Islam
Mobarak, Tasnim
Priota, Mysha Samiha
author_sort Roy, Kaushik
title Document template identification and data extraction using machine learning and deep learning approach
publisher Brac University
publishDate 2024
url http://hdl.handle.net/10361/22915
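The abstract in this record names precision, recall, and F1-score as the evaluation measures. As a minimal sketch of how those three measures can be computed per class, the following uses only the Python standard library. It is not the thesis authors' evaluation code, which this record does not include; the function name `per_class_scores` and the template labels (`quiz`, `midterm`, `final`) are hypothetical and invented purely for illustration.

```python
from collections import Counter


def per_class_scores(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for every label seen."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, guess in zip(y_true, y_pred):
        if truth == guess:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[guess] += 1   # predicted class gains a false positive
            fn[truth] += 1   # true class gains a false negative

    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        # F1 is the harmonic mean of precision and recall.
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        scores[label] = (precision, recall, f1)
    return scores


# Hypothetical template labels, not data from the thesis.
y_true = ["quiz", "quiz", "midterm", "final", "final", "final"]
y_pred = ["quiz", "midterm", "midterm", "final", "final", "quiz"]

scores = per_class_scores(y_true, y_pred)
for label, (p, r, f1) in scores.items():
    print(f"{label}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

Per-class scores like these are usually reported alongside overall accuracy, because accuracy alone can hide a class that the model rarely predicts correctly.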