BnClinical-Sum: benchmarking datasets for Bangla long & short clinical dialogue summarization

This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science, 2024.

Bibliographic Details
Main Authors:	Adib, Quazi Adibur Rahman, Alam, Sanjana Binte
Other Authors:	Sadeque, Farig Yousuf
Format:	Thesis
Language:	English
Published:	Brac University 2024
Subjects:	ClinicalNLP mBART Dialouge2Note Bangla language mLongT5 Natural language processing (Computer science) Data mining
Online Access:	http://hdl.handle.net/10361/22910

id	10361-22910
record_format	dspace
spelling	10361-22910 2024-05-26T21:03:54Z BnClinical-Sum: benchmarking datasets for Bangla long & short clinical dialogue summarization Adib, Quazi Adibur Rahman Alam, Sanjana Binte Sadeque, Farig Yousuf Department of Computer Science and Engineering, Brac University ClinicalNLP mBART Dialouge2Note Bangla language mLongT5 Natural language processing (Computer science) Data mining This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science, 2024. Cataloged from PDF version of thesis. Includes bibliographical references (pages 44-47). Despite significant improvements in general-purpose text summarization over the past decade, clinical conversation summarization remains difficult because of a lack of initiatives to provide open-source datasets to the NLP community. In this work, we present the first long and short Bangla clinical dialogue-to-note summarization datasets: BnClinical-Sum. Long conversations are detailed conversations that include additional medical history. For the long dialogue dataset, we have collected around 207 pairs of full conversations and notes. Each note covers previous medical histories, family medical records, and a wide variety of other topics in depth. The short dialogue dataset consists of 1701 real-life, manually translated short clinical conversations and their corresponding notes. Its dialogues are subsets of the long dialogues, where each snippet addresses one sub-topic such as previous medical history or family medical records. These conversations span 20 categories, such as labs, assessments, and plans. To demonstrate the efficacy of both datasets, we have trained current state-of-the-art text summarization and text-to-text generative models on them to provide a solid benchmark for clinical conversation summarization tasks.
Quazi Adibur Rahman Adib Sanjana Binte Alam B.Sc in Computer Science 2024-05-26T02:57:33Z 2024-05-26T02:57:33Z ©2024 2024-01 Thesis ID: 21241056 ID: 20301455 http://hdl.handle.net/10361/22910 en Brac University theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. 60 pages application/pdf Brac University
institution	Brac University
collection	Institutional Repository
language	English
topic	ClinicalNLP mBART Dialouge2Note Bangla language mLongT5 Natural language processing (Computer science) Data mining
description	This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science, 2024.
author2	Sadeque, Farig Yousuf
author_facet	Sadeque, Farig Yousuf Adib, Quazi Adibur Rahman Alam, Sanjana Binte
format	Thesis
author	Adib, Quazi Adibur Rahman Alam, Sanjana Binte
author_sort	Adib, Quazi Adibur Rahman
title	BnClinical-Sum: benchmarking datasets for Bangla long & short clinical dialogue summarization
publisher	Brac University
publishDate	2024
url	http://hdl.handle.net/10361/22910
