Örebro University Publications
Publications (4 of 4)
Rietz, F., Schaffernicht, E., Heinrich, S. & Stork, J. A. (2024). Prioritized soft Q-decomposition for lexicographic reinforcement learning. In: 12th International Conference on Learning Representations, ICLR 2024. Paper presented at 12th International Conference on Learning Representations, ICLR 2024, Vienna, May 7-11, 2024. International Conference on Learning Representations, ICLR.
Prioritized soft Q-decomposition for lexicographic reinforcement learning
2024 (English). In: 12th International Conference on Learning Representations, ICLR 2024. International Conference on Learning Representations, ICLR, 2024. Conference paper, Published paper (Refereed).
Abstract [en]

Reinforcement learning (RL) for complex tasks remains a challenge, primarily due to the difficulties of engineering scalar reward functions and the inherent inefficiency of training models from scratch. Instead, it would be better to specify complex tasks in terms of elementary subtasks and to reuse subtask solutions whenever possible. In this work, we address continuous space lexicographic multi-objective RL problems, consisting of prioritized subtasks, which are notoriously difficult to solve. We show that these can be scalarized with a subtask transformation and then solved incrementally using value decomposition. Exploiting this insight, we propose prioritized soft Q-decomposition (PSQD), a novel algorithm for learning and adapting subtask solutions under lexicographic priorities in continuous state-action spaces. PSQD offers the ability to reuse previously learned subtask solutions in a zero-shot composition, followed by an adaptation step. Its ability to use retained subtask training data for offline learning eliminates the need for new environment interaction during adaptation. We demonstrate the efficacy of our approach by presenting successful learning, reuse, and adaptation results for both low- and high-dimensional simulated robot control tasks, as well as offline learning results. In contrast to baseline approaches, PSQD does not trade off between conflicting subtasks or priority constraints and satisfies subtask priorities during learning. PSQD provides an intuitive framework for tackling complex RL problems, offering insights into the inner workings of the subtask composition. © 2024 12th International Conference on Learning Representations, ICLR 2024. All rights reserved.
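
As a rough illustration of the lexicographic selection the abstract describes, the following Python sketch filters candidate actions subtask by subtask in priority order; the names (q_funcs, candidates, slack) and the sampling-based filtering are illustrative assumptions, not the authors' PSQD implementation.

import numpy as np

def lexicographic_action(state, q_funcs, candidates, slack=0.1):
    """Filter candidate actions subtask-by-subtask in priority order.

    q_funcs:    callables q(state, action) -> float, highest priority first.
    candidates: list of candidate actions sampled from the continuous space.
    slack:      hypothetical tolerance for how much value a higher-priority
                subtask may lose before an action is discarded.
    """
    feasible = list(candidates)
    for q in q_funcs[:-1]:
        values = np.array([q(state, a) for a in feasible])
        # Keep only actions that stay near-optimal for this higher-priority subtask.
        feasible = [a for a, v in zip(feasible, values) if v >= values.max() - slack]
    # Among the remaining actions, maximize the lowest-priority subtask.
    return max(feasible, key=lambda a: q_funcs[-1](state, a))

PSQD itself first composes previously learned subtask solutions zero-shot and then adapts them, optionally offline from retained subtask data; the sketch only illustrates the priority-ordered selection step.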

Place, publisher, year, edition, pages
International Conference on Learning Representations, ICLR, 2024
Keywords
Economic and social effects, Learning algorithms, Zero-shot learning, Complex task, Continuous spaces, Learning problem, Multi objective, Off-line learning, Reinforcement learnings, Reuse, Reward function, Subtask, Training model, Reinforcement learning
National Category
Robotics and automation
Identifiers
urn:nbn:se:oru:diva-118577 (URN), 2-s2.0-85200578187 (Scopus ID)
Conference
12th International Conference on Learning Representations, ICLR 2024, Vienna, May 7-11, 2024
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Available from: 2025-01-16 Created: 2025-01-16 Last updated: 2025-01-16. Bibliographically approved.
Rietz, F. & Stork, J. A. (2023). Diversity for Contingency: Learning Diverse Behaviors for Efficient Adaptation and Transfer. Paper presented at 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2023), Detroit, MI, USA, October 1-5, 2023.
Diversity for Contingency: Learning Diverse Behaviors for Efficient Adaptation and Transfer
2023 (English). Conference paper, Oral presentation with published abstract (Refereed).
Abstract [en]

Discovering all useful solutions for a given task is crucial for transferable RL agents, which must account for changes in the task or transition dynamics. This is not considered by classical RL algorithms, which are only concerned with finding the optimal policy for the current task and dynamics. We propose a simple method for discovering all possible solutions of a given task, to obtain an agent that performs well in the transfer setting and adapts quickly to changes in the task or transition dynamics. Our method iteratively learns a set of policies, where each subsequent policy is constrained to yield a solution that is unlikely under all previous policies. Unlike prior methods, our approach does not require learning additional models for novelty detection and avoids balancing task and novelty reward signals, by directly incorporating the constraint into the action selection and optimization steps.
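
A minimal sketch of how such a constraint can be folded directly into action selection, assuming each policy exposes hypothetical sample(state) and log_prob(state, action) methods; the threshold and rejection loop are illustrative choices, not the authors' implementation.

def constrained_action(state, new_policy, prior_policies,
                       log_prob_threshold=-4.0, max_tries=32):
    """Sample from the new policy, rejecting actions that any previously
    learned policy would likely have taken."""
    action = new_policy.sample(state)
    for _ in range(max_tries):
        if all(p.log_prob(state, action) < log_prob_threshold
               for p in prior_policies):
            return action  # unlikely under every previous policy: keep it
        action = new_policy.sample(state)
    return action  # fallback if no sufficiently novel action was found

The same likelihood test can also enter the policy optimization objective, matching the abstract's point that the constraint is incorporated into both action selection and optimization rather than handled by a separate novelty model or a tuned reward balance.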

National Category
Computer Sciences
Identifiers
urn:nbn:se:oru:diva-112199 (URN), 10.48550/arXiv.2310.07493 (DOI)
Conference
2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2023), Detroit, MI, USA, October 1-5, 2023
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Note

Presented at the third RL-Conform workshop at IROS 2023.

Available from: 2024-03-07 Created: 2024-03-07 Last updated: 2024-03-11. Bibliographically approved.
Rietz, F., Magg, S., Heintz, F., Stoyanov, T., Wermter, S. & Stork, J. A. (2023). Hierarchical goals contextualize local reward decomposition explanations. Neural Computing & Applications, 35(23), 16693-16704
Hierarchical goals contextualize local reward decomposition explanations
2023 (English). In: Neural Computing & Applications, ISSN 0941-0643, E-ISSN 1433-3058, Vol. 35, no. 23, p. 16693-16704. Article in journal (Refereed), Published.
Abstract [en]

One-step reinforcement learning explanation methods account for individual actions but fail to consider the agent's future behavior, which can make their interpretation ambiguous. We propose to address this limitation by providing hierarchical goals as context for one-step explanations. By considering the current hierarchical goal as a context, one-step explanations can be interpreted with higher certainty, as the agent's future behavior is more predictable. We combine reward decomposition with hierarchical reinforcement learning into a novel explainable reinforcement learning framework, which yields more interpretable, goal-contextualized one-step explanations. With a qualitative analysis of one-step reward decomposition explanations, we first show that their interpretability is indeed limited in scenarios with multiple, different optimal policies, a characteristic shared by other one-step explanation methods. Then, we show that our framework retains high interpretability in such cases, as the hierarchical goal can be considered as context for the explanation. To the best of our knowledge, our work is the first to investigate hierarchical goals not as an explanation directly but as additional context for one-step reinforcement learning explanations.
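
As a rough illustration of what a goal-contextualized, decomposed explanation could look like in code, the sketch below reports the per-component Q-values of the chosen action together with the hierarchical goal that contextualizes them; manager, worker, and q_components are assumed interfaces, not the framework's actual API.

def explain_step(state, manager, worker, q_components):
    """Return the chosen action, its per-reward-component Q-values, and the
    hierarchical goal that serves as context for interpreting them."""
    goal = manager.current_goal(state)    # high-level goal chosen by the manager
    action = worker.act(state, goal)      # low-level action pursuing that goal
    contributions = {name: q(state, goal, action)
                     for name, q in q_components.items()}
    return {"goal": goal, "action": action, "reward_components": contributions}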

Place, publisher, year, edition, pages
Springer, 2023
Keywords
Reinforcement learning, Explainable AI, Reward decomposition, Hierarchical goals, Local explanations
National Category
Computer Sciences
Identifiers
urn:nbn:se:oru:diva-99115 (URN), 10.1007/s00521-022-07280-8 (DOI), 000794083400001 (), 2-s2.0-85129803505 (Scopus ID)
Note

Funding agencies:
Örebro University
Wallenberg AI, Autonomous Systems and Software Program (WASP) - Knut and Alice Wallenberg Foundation
Federal Ministry for Economic Affairs and Climate, FKZ 20X1905A-D

Available from: 2022-05-23 Created: 2022-05-23 Last updated: 2023-11-28. Bibliographically approved.
Rietz, F., Schaffernicht, E., Stoyanov, T. & Stork, J. A. (2022). Towards Task-Prioritized Policy Composition. Paper presented at 35th IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2022), Kyoto, Japan, October 24-26, 2022.
Towards Task-Prioritized Policy Composition
2022 (English). Conference paper, Oral presentation with published abstract (Refereed).
Abstract [en]

Combining learned policies in a prioritized, ordered manner is desirable because it allows for modular design and facilitates data reuse through knowledge transfer. In control theory, prioritized composition is realized by null-space control, where low-priority control actions are projected into the null-space of high-priority control actions. Such a method is currently unavailable for Reinforcement Learning. We propose a novel, task-prioritized composition framework for Reinforcement Learning, which involves a novel concept: the indifference space of Reinforcement Learning policies. Our framework has the potential to facilitate knowledge transfer and modular design while greatly increasing data efficiency and data reuse for Reinforcement Learning agents. Further, our approach can ensure high-priority constraint satisfaction, which makes it promising for learning in safety-critical domains like robotics. Unlike null-space control, our approach allows learning globally optimal policies for the compound task by online learning in the indifference space of higher-level policies after initial compound policy construction.
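
The sketch below illustrates the two-level case under simple assumptions: the high-priority Q-function defines an indifference space of near-optimal actions, and the compound policy optimizes the low-priority objective only inside that set, in analogy to null-space projection; q_high, q_low, candidates, and epsilon are hypothetical names, and the sampling-based construction is not the framework's actual mechanism.

import numpy as np

def composed_action(state, q_high, q_low, candidates, epsilon=0.05):
    """Pick the low-priority-optimal action inside the indifference space of
    the high-priority task (actions within epsilon of its best value)."""
    high_values = np.array([q_high(state, a) for a in candidates])
    indifferent = [a for a, v in zip(candidates, high_values)
                   if v >= high_values.max() - epsilon]
    # The high-priority constraint holds by construction; the low-priority
    # objective is optimized only over the remaining actions.
    return max(indifferent, key=lambda a: q_low(state, a))

Online learning of the low-priority value function can then continue within this restricted set, which is how the abstract describes reaching globally optimal policies for the compound task.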

National Category
Computer Systems
Identifiers
urn:nbn:se:oru:diva-102120 (URN)
Conference
35th IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2022), Kyoto, Japan, October 24-26, 2022
Available from: 2022-11-08 Created: 2022-11-08 Last updated: 2024-01-03. Bibliographically approved.
Identifiers
ORCID iD: orcid.org/0000-0001-8151-4692
