An autoencoder-based decentralized clustering leveraging model aggregation fusion strategy

This thesis is submitted in partial fulfillment of the requirements for the degree of Master of Science in Computer Science, 2024.

Bibliographic details
Main author: Hasan, Nahid
Other authors: Alam, Md. Golam Rabiul
Format: Thesis
Language: English
Published: Brac University 2024
Subjects:
Online access: http://hdl.handle.net/10361/24039
id 10361-24039
record_format dspace
institution Brac University
collection Institutional Repository
language English
topic Federated learning
Unsupervised learning
Auto-encoder
Fusion strategy
Multisensor data fusion.
Electronic data processing--Distributed processing.
Neural networks (Computer science).
Data mining
description This thesis is submitted in partial fulfillment of the requirements for the degree of Master of Science in Computer Science, 2024.
author2 Alam, Md. Golam Rabiul
author_facet Alam, Md. Golam Rabiul
Hasan, Nahid
format Thesis
author Hasan, Nahid
author_sort Hasan, Nahid
title An autoencoder-based decentralized clustering leveraging model aggregation fusion strategy
title_sort autoencoder-based decentralized clustering leveraging model aggregation fusion strategy
publisher Brac University
publishDate 2024
url http://hdl.handle.net/10361/24039
spelling 10361-24039
2024-09-09T21:02:32Z
An autoencoder-based decentralized clustering leveraging model aggregation fusion strategy
Hasan, Nahid
Alam, Md. Golam Rabiul
Department of Computer Science and Engineering, Brac University
Federated learning
Unsupervised learning
Auto-encoder
Fusion strategy
Multisensor data fusion.
Electronic data processing--Distributed processing.
Neural networks (Computer science).
Data mining
This thesis is submitted in partial fulfillment of the requirements for the degree of Master of Science in Computer Science, 2024.
Cataloged from the PDF version of the thesis.
Includes bibliographical references (pages 53-55).
Unsupervised clustering plays a crucial role in many real-life applications. It groups similar data points based on shared features or characteristics, without the use of predefined labels. The process generally starts with gathering the data to be clustered in a centralized system. This data may take the form of numerical features, text, images, or any other type of information.
The rapid expansion of digital transformation, the Internet of Things (IoT), social media, and online platforms has produced an unprecedented surge in data generation: a continuous stream of information from user interactions, sensor readings, online transactions, and other sources. This growth poses both challenges and opportunities for businesses, governments, and individuals alike. Gathering, managing, and processing this volume of data in a centralized system is slow and difficult. Concerns about data privacy, security, and ethics also become more prominent as data volumes continue to grow.
Moreover, it is important to respect individuals’ privacy rights and adhere to relevant data protection laws and regulations. Federated learning addresses concerns about data volume and privacy by leaving user data on devices. Federated unsupervised representation learning is an architecture that pre-trains deep neural networks on unlabeled input in a federated fashion.
In centralized settings, model-based clustering approaches are highly effective. These methods rely on statistical models to identify underlying patterns and group data points accordingly. They can handle complex data structures and partition datasets into meaningful clusters, which lets centralized systems organize and analyze large volumes of data and supports insight and decision-making across domains. Model-based clustering is also flexible: it accommodates different data distributions and adapts to diverse clustering requirements.
In contrast, model-based clustering in federated settings is still relatively unexplored, possibly because training models on highly heterogeneous data with the FedAvg method is more difficult. The recently proposed Unsupervised Iterative Federated Clustering Algorithm (UIFCA) uses a normalizing flow model to cluster unlabeled datasets in federated environments. UIFCA builds on the IFCA framework, which tackles the problem of highly heterogeneous settings.
This thesis introduces a novel approach to decentralized clustering that pairs a proposed model parameter aggregation strategy, FednadamN, with the deep generative model autoencoder. FednadamN combines the benefits of two cutting-edge optimization methods, Adam and Nadam, for federated learning.
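As a rough illustration of the building blocks named here, the sketch below averages client parameters in the FedAvg style and then applies a server-side Adam or Nadam step to the resulting pseudo-gradient. It is not the thesis's FednadamN, whose details are not given in this record. The function names (`fedavg`, `new_state`, `server_step`), the hyperparameter defaults, and the use of a commonly cited simplified Nadam update are all assumptions made for illustration.

```python
import math


def fedavg(client_params, client_sizes):
    """FedAvg-style aggregation: average client parameter vectors,
    weighting each client by its number of local samples."""
    total = sum(client_sizes)
    dim = len(client_params[0])
    return [
        sum(p[i] * n for p, n in zip(client_params, client_sizes)) / total
        for i in range(dim)
    ]


def new_state(dim):
    """Optimizer state for the (hypothetical) server: step count and moments."""
    return {"t": 0, "m": [0.0] * dim, "v": [0.0] * dim}


def server_step(params, avg_params, state, lr=0.1, beta1=0.9,
                beta2=0.999, eps=1e-8, nesterov=False):
    """One server update that treats (params - avg_params) as a pseudo-gradient.

    nesterov=False applies a plain Adam step; nesterov=True applies a
    simplified Nadam step (assumed form, without a momentum schedule).
    """
    state["t"] += 1
    t = state["t"]
    updated = []
    for i, (w, w_avg) in enumerate(zip(params, avg_params)):
        g = w - w_avg  # descending along g moves w toward the client average
        m = beta1 * state["m"][i] + (1 - beta1) * g      # first moment
        v = beta2 * state["v"][i] + (1 - beta2) * g * g  # second moment
        state["m"][i], state["v"][i] = m, v
        m_hat = m / (1 - beta1 ** t)  # bias-corrected first moment
        v_hat = v / (1 - beta2 ** t)  # bias-corrected second moment
        if nesterov:
            # Nadam: blend the corrected momentum with the current gradient
            direction = beta1 * m_hat + (1 - beta1) * g / (1 - beta1 ** t)
        else:
            direction = m_hat
        updated.append(w - lr * direction / (math.sqrt(v_hat) + eps))
    return updated


# Two hypothetical clients holding 1 and 3 samples respectively.
average = fedavg([[1.0, 2.0], [3.0, 4.0]], [1, 3])
global_params = [0.0, 0.0]
adam_params = server_step(global_params, average, new_state(2))
nadam_params = server_step(global_params, average, new_state(2), nesterov=True)
```

With identical inputs, the Nadam variant takes a larger first step than Adam because it adds the current gradient on top of the corrected momentum; which behavior suits a given federated setup is an empirical question.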
Adam offers quick convergence and resilience to noisy data by using adaptive learning rates based on the first and second moments of the gradients. Nadam extends Adam with Nesterov accelerated gradients, increasing the stability and speed of convergence. The method addresses the challenge of clustering in decentralized settings by leveraging the collective intelligence of distributed nodes while preserving data privacy and minimizing communication overhead. By aggregating model parameters across decentralized nodes and employing autoencoder-based representations, it enables efficient clustering without the need for central data storage or coordination. This approach promises to enhance scalability, privacy, and performance in decentralized clustering tasks across various domains.
The thesis also compares the proposed approach with existing techniques on four benchmark datasets: image segmentation, protein localization, letter image recognition, and Deterding vowel recognition.
On the letter image recognition data, the proposed technique achieves the highest mutual information score (1.192) and the highest V-measure score (0.373) using the k-means algorithm; however, FedAvg with the fuzzy k-means algorithm yields the highest Rand index score (0.925). On the Deterding vowel recognition data, the proposed approach has the highest V-measure score (0.264) and the highest Rand index score (0.850) when using the k-means algorithm; however, it is reported to perform less well than FedAdam, which shows a V-measure score of 0.258 with the mini-batch k-means algorithm. On the protein localization data, the proposed approach yields the highest Rand index score (0.774), the highest mutual information score (0.908), and the highest V-measure score (0.527) with the mini-batch k-means algorithm.
On the image segmentation data, the proposed method yields the highest mutual information score (1.084), the highest Rand index score (0.849), and the highest V-measure score (0.565) with the mini-batch k-means algorithm. These results demonstrate the suggested approach’s improved performance and its potential applicability to a range of clustering goals. Its robustness and adaptability suggest relevance across multiple domains.
Nahid Hasan
M.Sc. in Computer Science
2024-09-09T09:28:04Z
2024-09-09T09:28:04Z
©2024
2024-05
Thesis ID 22366036
http://hdl.handle.net/10361/24039
en
Brac University theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission.
67 pages
application/pdf
Brac University