Analyzing optimization landscape of recent policy optimization methods in deep RL

This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science, 2022.

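Bibliographic Details
Main authors: Khan, Mahir Asaf; Ashraf, Adib; Amin, Tahmid Adib
Other authors: Rashid, Warida
Format: Thesis
Language: English
Published: Brac University 2023
Subjects: Optimization landscape; Policy optimization; Deep reinforcement learning; Variance reduction; Control variates; Cognitive learning theory; Machine learning
Online access: http://hdl.handle.net/10361/18306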
id 10361-18306
record_format dspace
spelling 10361-18306 2023-05-23T21:01:53Z
Analyzing optimization landscape of recent policy optimization methods in deep RL
Khan, Mahir Asaf; Ashraf, Adib; Amin, Tahmid Adib; Rashid, Warida; Islam, Riashat
Department of Computer Science and Engineering, Brac University
Optimization landscape; Policy optimization; Deep reinforcement learning; Variance reduction; Control variates; Cognitive learning theory; Machine learning
This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science, 2022.
Cataloged from PDF version of thesis.
Includes bibliographical references (pages 42-43).
In this work we will analyze control variates and baselines in policy optimization methods in deep reinforcement learning (RL). Recently there has been a lot of progress in policy gradient methods in deep RL, where baselines are typically used for variance reduction. However, there has been recent progress on the mirage of state and state-action dependent baselines in policy gradients. To this end, it is not clear how control variates play a role in the optimization landscape of policy gradients. This work will dive into understanding the landscape issues of policy optimization, to see whether control variates are only for variance reduction or whether they play a role in smoothing out the optimization landscape. Our work will further investigate the issues of different optimizers used in deep RL experiments, and ablation studies of the interplay of control variates and optimizers in policy gradients from an optimization perspective.
Mahir Asaf Khan; Adib Ashraf; Tahmid Adib Amin
B. Computer Science
2023-05-23T04:43:23Z 2023-05-23T04:43:23Z 2022 2022-05
Thesis
ID 22141075; ID 20241063; ID 22141076
http://hdl.handle.net/10361/18306
en
Brac University theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission.
43 pages
application/pdf
Brac University
institution Brac University
collection Institutional Repository
language English
topic Optimization landscape
Policy optimization
Deep reinforcement learning
Variance reduction
Control variates
Cognitive learning theory
Machine learning
author2 Rashid, Warida
author_facet Rashid, Warida
Khan, Mahir Asaf
Ashraf, Adib
Amin, Tahmid Adib
format Thesis
author Khan, Mahir Asaf
Ashraf, Adib
Amin, Tahmid Adib
author_sort Khan, Mahir Asaf
title Analyzing optimization landscape of recent policy optimization methods in deep RL
publisher Brac University
publishDate 2023
url http://hdl.handle.net/10361/18306