Analyzing optimization landscape of recent policy optimization methods in deep RL
This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science, 2022.
Main authors: | Khan, Mahir Asaf; Ashraf, Adib; Amin, Tahmid Adib |
---|---|
Other authors: | Rashid, Warida |
Format: | Thesis |
Language: | English |
Published: |
Brac University
2023
|
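Subjects: | Optimization landscape; Policy optimization; Deep reinforcement learning; Variance reduction; Control variates; Cognitive learning theory; Machine learning |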
Online access: | http://hdl.handle.net/10361/18306 |
id |
10361-18306 |
---|---|
record_format |
dspace |
spelling |
10361-18306 2023-05-23T21:01:53Z Analyzing optimization landscape of recent policy optimization methods in deep RL Khan, Mahir Asaf Ashraf, Adib Amin, Tahmid Adib Rashid, Warida Islam, Riashat Department of Computer Science and Engineering, Brac University Optimization landscape Policy optimization Deep reinforcement learning Variance reduction Control variates Cognitive learning theory Machine learning This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science, 2022. Cataloged from PDF version of thesis. Includes bibliographical references (pages 42-43). In this work we analyze control variates and baselines in policy optimization methods in deep reinforcement learning (RL). Policy gradient methods in deep RL have progressed considerably in recent years, and baselines are typically used in them for variance reduction. However, recent work has discussed the mirage of state and state-action dependent baselines in policy gradients, and it remains unclear what role control variates play in the optimization landscape of policy gradients. This work examines the landscape issues of policy optimization to see whether control variates serve only for variance reduction or whether they also help smooth the optimization landscape. We further investigate the issues of the different optimizers used in deep RL experiments, and present ablation studies of the interplay between control variates and optimizers in policy gradients from an optimization perspective. Mahir Asaf Khan Adib Ashraf Tahmid Adib Amin B. Computer Science 2023-05-23T04:43:23Z 2023-05-23T04:43:23Z 2022 2022-05 Thesis ID 22141075 ID 20241063 ID 22141076 http://hdl.handle.net/10361/18306 en Brac University theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. 
43 pages application/pdf Brac University |
institution |
Brac University |
collection |
Institutional Repository |
language |
English |
topic |
Optimization landscape Policy optimization Deep reinforcement learning Variance reduction Control variates Cognitive learning theory Machine learning |
description |
This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science, 2022. |
author2 |
Rashid, Warida |
author_facet |
Rashid, Warida Khan, Mahir Asaf Ashraf, Adib Amin, Tahmid Adib |
format |
Thesis |
author |
Khan, Mahir Asaf Ashraf, Adib Amin, Tahmid Adib |
author_sort |
Khan, Mahir Asaf |
title |
Analyzing optimization landscape of recent policy optimization methods in deep RL |
title_sort |
analyzing optimization landscape of recent policy optimization methods in deep rl |
publisher |
Brac University |
publishDate |
2023 |
url |
http://hdl.handle.net/10361/18306 |