Örebro University Publications
Prioritized soft Q-decomposition for lexicographic reinforcement learning
Rietz, Finn. Örebro University, School of Science and Technology. ORCID iD: 0000-0001-8151-4692
Schaffernicht, Erik. Örebro University, School of Science and Technology. ORCID iD: 0000-0003-4026-7490
IT University of Copenhagen, Denmark.
Stork, Johannes Andreas. Örebro University, School of Science and Technology. ORCID iD: 0000-0003-3958-6179
2024 (English). In: 12th International Conference on Learning Representations, ICLR 2024. International Conference on Learning Representations, 2024. Conference paper, Published paper (Refereed)
Abstract [en]

Reinforcement learning (RL) for complex tasks remains a challenge, primarily due to the difficulties of engineering scalar reward functions and the inherent inefficiency of training models from scratch. Instead, it would be better to specify complex tasks in terms of elementary subtasks and to reuse subtask solutions whenever possible. In this work, we address continuous space lexicographic multi-objective RL problems, consisting of prioritized subtasks, which are notoriously difficult to solve. We show that these can be scalarized with a subtask transformation and then solved incrementally using value decomposition. Exploiting this insight, we propose prioritized soft Q-decomposition (PSQD), a novel algorithm for learning and adapting subtask solutions under lexicographic priorities in continuous state-action spaces. PSQD offers the ability to reuse previously learned subtask solutions in a zero-shot composition, followed by an adaptation step. Its ability to use retained subtask training data for offline learning eliminates the need for new environment interaction during adaptation. We demonstrate the efficacy of our approach by presenting successful learning, reuse, and adaptation results for both low- and high-dimensional simulated robot control tasks, as well as offline learning results. In contrast to baseline approaches, PSQD does not trade off between conflicting subtasks or priority constraints and satisfies subtask priorities during learning. PSQD provides an intuitive framework for tackling complex RL problems, offering insights into the inner workings of the subtask composition. © 2024 12th International Conference on Learning Representations, ICLR 2024. All rights reserved.
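The lexicographic priority scheme described in the abstract can be illustrated with a minimal sketch. This is not the authors' PSQD implementation; it only shows the generic idea behind lexicographic action selection over per-subtask Q-values, where each higher-priority subtask constrains the actions available to lower-priority subtasks. The `slack` tolerance parameter is a hypothetical choice for this illustration:

```python
import numpy as np

def lexicographic_action(q_values, slack=0.1):
    """Pick an action under lexicographic subtask priorities.

    q_values: list of 1-D arrays, one per subtask, ordered from the
    highest-priority subtask to the lowest; q_values[i][a] is the value
    of action a for subtask i.  At each priority level, only actions
    whose value is within `slack` of the best remaining action are
    kept; ties are then broken by the next subtask in the ordering.
    """
    candidates = np.arange(len(q_values[0]))
    for q in q_values:
        q = np.asarray(q)
        best = q[candidates].max()
        # Keep only near-optimal actions for this priority level.
        candidates = candidates[q[candidates] >= best - slack]
        if len(candidates) == 1:
            break
    return int(candidates[0])

# Example: action 2 is best for the low-priority subtask, but it is
# eliminated by the high-priority subtask's near-optimality constraint.
q_high = [1.0, 1.0, 0.0]   # high-priority subtask Q-values
q_low = [0.0, 5.0, 9.0]    # low-priority subtask Q-values
print(lexicographic_action([q_high, q_low]))  # prints 1
```

This filtering view matches the general notion of lexicographic multi-objective RL (lower-priority objectives only break ties among near-optimal actions of higher-priority ones); PSQD's specific contribution, per the abstract, is making this work with soft Q-decomposition in continuous state-action spaces, which this discrete sketch does not cover.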

Place, publisher, year, edition, pages
International Conference on Learning Representations, ICLR, 2024.
Keywords [en]
Economic and social effects, Learning algorithms, Zero-shot learning, Complex task, Continuous spaces, Learning problem, Multi-objective, Offline learning, Reinforcement learning, Reuse, Reward function, Subtask, Training model
National Category
Robotics and automation
Identifiers
URN: urn:nbn:se:oru:diva-118577
Scopus ID: 2-s2.0-85200578187
OAI: oai:DiVA.org:oru-118577
DiVA, id: diva2:1928123
Conference
12th International Conference on Learning Representations, ICLR 2024, Vienna, May 7-11, 2024
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Available from: 2025-01-16 Created: 2025-01-16 Last updated: 2025-01-16
Bibliographically approved

Open Access in DiVA

No full text in DiVA

Scopus

Authority records

Rietz, Finn; Schaffernicht, Erik; Stork, Johannes Andreas

