Örebro University Publications (oru.se)
Empirical analysis of the convergence of Double DQN in relation to reward sparsity
Blad, Samuel. Örebro University, School of Science and Technology; Nexer (MPI). ORCID iD: 0000-0003-1913-882X
Längkvist, Martin. Örebro University, School of Science and Technology. ORCID iD: 0000-0002-0579-7181
Klügl, Franziska. Örebro University, School of Science and Technology. ORCID iD: 0000-0002-1470-6288
Loutfi, Amy. Örebro University, School of Science and Technology. ORCID iD: 0000-0002-3122-693X
2022 (English). In: 21st IEEE International Conference on Machine Learning and Applications, ICMLA 2022: Proceedings / [ed] Wani, MA; Kantardzic, M; Palade, V; Neagu, D; Yang, L; Chan, KY, IEEE, 2022, p. 591-596. Conference paper, Published paper (Refereed)
Abstract [en]

Q-Networks are used in Reinforcement Learning to model the expected return of every action at a given state. When training Q-Networks, external reward signals are propagated back to the previously performed actions leading up to each reward. If many actions are required before experiencing a reward, the reward signal is distributed across all those actions, where some actions may have a greater impact on the reward than others. As the number of significant actions between rewards increases, the relative importance of each action decreases. If an action's importance becomes too small, its contribution might be overshadowed by noise in a deep neural network model, potentially causing convergence issues. In this work, we empirically test the limits of increasing the number of actions leading up to a reward in a simple grid-world environment. Our experiments show that even when the training error surpasses the reward signal attributed to each action, the model is still able to learn a sufficiently smooth value representation.
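The dilution effect described in the abstract can be illustrated with a minimal tabular sketch (our illustration, not the paper's Double DQN model): a single terminal reward at the end of a chain of states is propagated backward by one-step TD updates, and the value credited to the first action shrinks geometrically with the number of steps separating it from the reward.

```python
# Illustrative sketch only: tabular one-step TD backups on a chain of states,
# where the final transition carries the only reward (1.0) and all earlier
# transitions carry reward 0. The chain length and discount factor are
# arbitrary choices for demonstration, not values from the paper.

def propagate_reward(n_steps, gamma=0.9, alpha=0.5, sweeps=200):
    """Return converged state values v[0..n_steps-1] on the chain."""
    v = [0.0] * n_steps
    for _ in range(sweeps):
        for s in range(n_steps):
            if s == n_steps - 1:
                target = 1.0                 # terminal transition: the only reward
            else:
                target = gamma * v[s + 1]    # bootstrap from the successor state
            v[s] += alpha * (target - v[s])  # one-step TD update
    return v

for n in (3, 10, 30):
    v0 = propagate_reward(n)[0]
    print(f"{n:3d} steps before reward -> value at start state: {v0:.4f}")
```

The start-state value converges to gamma**(n-1), so lengthening the chain shrinks the per-action credit toward zero; in a deep Q-Network this small signal must additionally compete with the model's approximation noise, which is the regime the paper probes.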

Place, publisher, year, edition, pages
IEEE, 2022. p. 591-596
Keywords [en]
reinforcement learning, deep q-learning, reward sparsity
National Category
Other Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
URN: urn:nbn:se:oru:diva-102850
DOI: 10.1109/ICMLA55696.2022.00102
ISI: 000980994900087
Scopus ID: 2-s2.0-85152213586
ISBN: 9781665462839 (electronic)
ISBN: 9781665462846 (print)
OAI: oai:DiVA.org:oru-102850
DiVA, id: diva2:1721654
Conference
21st IEEE International Conference on Machine Learning and Applications (IEEE ICMLA), Nassau, Bahamas, December 12-14, 2022
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Knowledge Foundation, 20190128
Knut and Alice Wallenberg Foundation
Available from: 2022-12-22 Created: 2022-12-22 Last updated: 2023-08-21 Bibliographically approved

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full text
Scopus

Authority records

Blad, Samuel; Längkvist, Martin; Klügl, Franziska; Loutfi, Amy
