Analyzing optimization landscape of recent policy optimization methods in deep RL

This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science, 2022.

Bibliographic Details
Main authors:	Khan, Mahir Asaf; Ashraf, Adib; Amin, Tahmid Adib
Other authors:	Rashid, Warida
Format:	Thesis
Language:	English
Published:	Brac University 2023
Subjects:	Optimization landscape; Policy optimization; Deep reinforcement learning; Variance reduction; Control variates; Cognitive learning theory; Machine learning
Online access:	http://hdl.handle.net/10361/18306

id	10361-18306
record_format	dspace
spelling	10361-18306; 2023-05-23T21:01:53Z; Analyzing optimization landscape of recent policy optimization methods in deep RL; Khan, Mahir Asaf; Ashraf, Adib; Amin, Tahmid Adib; Rashid, Warida; Islam, Riashat; Department of Computer Science and Engineering, Brac University; Optimization landscape; Policy optimization; Deep reinforcement learning; Variance reduction; Control variates; Cognitive learning theory; Machine learning; This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science, 2022. Cataloged from PDF version of thesis. Includes bibliographical references (pages 42-43). In this work we will analyze control variates and baselines in policy optimization methods in deep reinforcement learning (RL). Recently there has been a lot of progress in policy gradient methods in deep RL, where baselines are typically used for variance reduction. However, there has been recent progress on the mirage of state and state-action dependent baselines in policy gradients. To this end, it is not clear how control variates play a role in the optimization landscape of policy gradients. This work will dive into understanding the landscape issues of policy optimization, to see whether control variates are only for variance reduction or whether they play a role in smoothing out the optimization landscape. Our work will further investigate the issues of different optimizers used in deep RL experiments, and ablation studies of the interplay of control variates and optimizers in policy gradients from an optimization perspective. Mahir Asaf Khan; Adib Ashraf; Tahmid Adib Amin; B. Computer Science; 2023-05-23T04:43:23Z; 2023-05-23T04:43:23Z; 2022; 2022-05; Thesis; ID 22141075; ID 20241063; ID 22141076; http://hdl.handle.net/10361/18306; en; Brac University theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. 43 pages; application/pdf; Brac University
institution	Brac University
collection	Institutional Repository
language	English
topic	Optimization landscape; Policy optimization; Deep reinforcement learning; Variance reduction; Control variates; Cognitive learning theory; Machine learning
description	This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science, 2022.
author2	Rashid, Warida
author_facet	Rashid, Warida; Khan, Mahir Asaf; Ashraf, Adib; Amin, Tahmid Adib
format	Thesis
author	Khan, Mahir Asaf; Ashraf, Adib; Amin, Tahmid Adib
author_sort	Khan, Mahir Asaf
title	Analyzing optimization landscape of recent policy optimization methods in deep RL
publisher	Brac University
publishDate	2023
url	http://hdl.handle.net/10361/18306
