Örebro University Publications (oru.se)
Publications (4 of 4)
Rietz, F., Schaffernicht, E., Heinrich, S. & Stork, J. A. (2024). Prioritized soft q-decomposition for lexicographic reinforcement learning. In: 12th International Conference on Learning Representations, ICLR 2024. Paper presented at 12th International Conference on Learning Representations, ICLR 2024, Vienna, May 7-11, 2024. International Conference on Learning Representations, ICLR
Prioritized soft q-decomposition for lexicographic reinforcement learning
2024 (English). In: 12th International Conference on Learning Representations, ICLR 2024. International Conference on Learning Representations, ICLR, 2024. Conference paper, Published paper (Refereed)
Abstract [en]

Reinforcement learning (RL) for complex tasks remains a challenge, primarily due to the difficulties of engineering scalar reward functions and the inherent inefficiency of training models from scratch. Instead, it would be better to specify complex tasks in terms of elementary subtasks and to reuse subtask solutions whenever possible. In this work, we address continuous space lexicographic multi-objective RL problems, consisting of prioritized subtasks, which are notoriously difficult to solve. We show that these can be scalarized with a subtask transformation and then solved incrementally using value decomposition. Exploiting this insight, we propose prioritized soft Q-decomposition (PSQD), a novel algorithm for learning and adapting subtask solutions under lexicographic priorities in continuous state-action spaces. PSQD offers the ability to reuse previously learned subtask solutions in a zero-shot composition, followed by an adaptation step. Its ability to use retained subtask training data for offline learning eliminates the need for new environment interaction during adaptation. We demonstrate the efficacy of our approach by presenting successful learning, reuse, and adaptation results for both low- and high-dimensional simulated robot control tasks, as well as offline learning results. In contrast to baseline approaches, PSQD does not trade off between conflicting subtasks or priority constraints and satisfies subtask priorities during learning. PSQD provides an intuitive framework for tackling complex RL problems, offering insights into the inner workings of the subtask composition. © 2024 12th International Conference on Learning Representations, ICLR 2024. All rights reserved.
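The lexicographic action-selection idea described in the abstract can be illustrated with a minimal sketch: lower-priority subtasks are optimized only over actions for which all higher-priority subtasks remain near-optimal. This is not the authors' implementation; the per-subtask Q-functions, the sampled candidate actions, and the slack parameter below are illustrative assumptions.

import numpy as np

def lexicographic_action(q_fns, state, actions, slack=0.1):
    """Pick an action by prioritized (lexicographic) filtering.

    q_fns   : callables q(state, actions) -> Q-values, ordered from highest
              to lowest priority (pre-trained per-subtask critics, assumed).
    actions : sampled candidate actions from a continuous action space.
    slack   : tolerance defining the set of actions a higher-priority
              subtask is (nearly) indifferent about.
    """
    candidates = actions
    for q in q_fns[:-1]:
        values = q(state, candidates)
        # Keep only actions that are near-optimal for this higher-priority subtask.
        candidates = candidates[values >= values.max() - slack]
    # Optimize the lowest-priority subtask over the remaining actions.
    return candidates[np.argmax(q_fns[-1](state, candidates))]

# Toy usage with two hypothetical subtask critics on a 1-D action space.
q_high = lambda s, a: -np.abs(a - 0.5)   # high priority: stay near 0.5
q_low  = lambda s, a: -np.abs(a + 1.0)   # low priority: stay near -1.0
acts = np.linspace(-2.0, 2.0, 401)
print(lexicographic_action([q_high, q_low], None, acts, slack=0.2))  # ~0.3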

Place, publisher, year, edition, pages
International Conference on Learning Representations, ICLR, 2024
Keywords
Economic and social effects, Learning algorithms, Zero-shot learning, Complex task, Continuous spaces, Learning problem, Multi objective, Off-line learning, Reinforcement learnings, Reuse, Reward function, Subtask, Training model, Reinforcement learning
National subject category
Robotics and automation
Identifiers
urn:nbn:se:oru:diva-118577 (URN), 2-s2.0-85200578187 (Scopus ID)
Conference
12th International Conference on Learning Representations, ICLR 2024, Vienna, May 7-11, 2024
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Available from: 2025-01-16. Created: 2025-01-16. Last updated: 2025-01-16. Bibliographically approved.
Rietz, F. & Stork, J. A. (2023). Diversity for Contingency: Learning Diverse Behaviors for Efficient Adaptation and Transfer. Paper presented at 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2023), Detroit, MI, USA, October 1-5, 2023.
Diversity for Contingency: Learning Diverse Behaviors for Efficient Adaptation and Transfer
2023 (English). Conference paper, Oral presentation with published abstract (Refereed)
Abstract [en]

Discovering all useful solutions for a given task is crucial for transferable RL agents, to account for changes in the task or transition dynamics. This is not considered by classical RL algorithms that are only concerned with finding the optimal policy, given the current task and dynamics. We propose a simple method for discovering all possible solutions of a given task, to obtain an agent that performs well in the transfer setting and adapts quickly to changes in the task or transition dynamics. Our method iteratively learns a set of policies, while each subsequent policy is constrained to yield a solution that is unlikely under all previous policies. Unlike prior methods, our approach does not require learning additional models for novelty detection and avoids balancing task and novelty reward signals, by directly incorporating the constraint into the action selection and optimization steps. 
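As a toy, discrete-action illustration of the constrained action selection described above: actions that are likely under any previously learned policy are masked out before sampling. The threshold form of the "unlikely under all previous policies" constraint is an assumption for illustration, not the paper's exact formulation.

import numpy as np

def constrained_action(new_probs, prior_probs_list, threshold=0.2, rng=None):
    """Sample an action for the policy currently being learned while
    avoiding actions that are likely under any previously learned policy."""
    rng = rng or np.random.default_rng()
    admissible = np.ones_like(new_probs, dtype=bool)
    for prior in prior_probs_list:
        admissible &= prior <= threshold     # mask actions a prior policy prefers
    if not admissible.any():                 # fall back if the constraint is infeasible
        admissible[:] = True
    probs = np.where(admissible, new_probs, 0.0)
    probs = probs / probs.sum()
    return rng.choice(len(probs), p=probs)

# Example: a previous policy concentrates on action 0, so the new policy
# is pushed toward the alternative solutions (actions 1 and 2).
print(constrained_action(np.array([0.5, 0.3, 0.2]), [np.array([0.9, 0.05, 0.05])]))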

National subject category
Computer Sciences
Identifiers
urn:nbn:se:oru:diva-112199 (URN), 10.48550/arXiv.2310.07493 (DOI)
Conference
2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2023), Detroit, MI, USA, October 1-5, 2023
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Note

Presented at the third RL-Conform workshop at IROS 2023.

Available from: 2024-03-07. Created: 2024-03-07. Last updated: 2024-03-11. Bibliographically approved.
Rietz, F., Magg, S., Heintz, F., Stoyanov, T., Wermter, S. & Stork, J. A. (2023). Hierarchical goals contextualize local reward decomposition explanations. Neural Computing & Applications, 35(23), 16693-16704
Hierarchical goals contextualize local reward decomposition explanations
2023 (English). In: Neural Computing & Applications, ISSN 0941-0643, E-ISSN 1433-3058, Vol. 35, no. 23, pp. 16693-16704. Journal article (Refereed). Published.
Abstract [en]

One-step reinforcement learning explanation methods account for individual actions but fail to consider the agent's future behavior, which can make their interpretation ambiguous. We propose to address this limitation by providing hierarchical goals as context for one-step explanations. By considering the current hierarchical goal as a context, one-step explanations can be interpreted with higher certainty, as the agent's future behavior is more predictable. We combine reward decomposition with hierarchical reinforcement learning into a novel explainable reinforcement learning framework, which yields more interpretable, goal-contextualized one-step explanations. With a qualitative analysis of one-step reward decomposition explanations, we first show that their interpretability is indeed limited in scenarios with multiple, different optimal policies, a characteristic shared by other one-step explanation methods. Then, we show that our framework retains high interpretability in such cases, as the hierarchical goal can be considered as context for the explanation. To the best of our knowledge, our work is the first to investigate hierarchical goals not as an explanation directly but as additional context for one-step reinforcement learning explanations.
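The goal-contextualized explanation can be sketched as follows: the Q-function is decomposed per reward component, and the explanation for a one-step action choice lists each component's contribution given the currently active hierarchical goal. The function signature, component names, and toy actions below are illustrative assumptions, not the paper's API.

import numpy as np

def goal_contextualized_explanation(q_components, state, goal, actions):
    """Return the greedy action and its per-reward-component Q-values,
    contextualized by the active hierarchical goal (illustrative sketch)."""
    per_component = {
        name: np.array([q(state, goal, a) for a in actions])
        for name, q in q_components.items()
    }
    total = sum(per_component.values())          # recompose the full Q-function
    best = int(np.argmax(total))
    explanation = {name: float(v[best]) for name, v in per_component.items()}
    return actions[best], explanation

# Toy example: two reward components whose preferences depend on the goal.
q_reach  = lambda s, g, a: 1.0 if a == g else 0.0
q_safety = lambda s, g, a: -0.5 if a == "sprint" else 0.0
action, why = goal_contextualized_explanation(
    {"reach_goal": q_reach, "safety": q_safety},
    state=None, goal="walk", actions=["walk", "sprint"])
print(action, why)   # 'walk', explained mainly by the reach_goal component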

Place, publisher, year, edition, pages
Springer, 2023
Keywords
Reinforcement learning, Explainable AI, Reward decomposition, Hierarchical goals, Local explanations
National subject category
Computer Sciences
Identifiers
urn:nbn:se:oru:diva-99115 (URN), 10.1007/s00521-022-07280-8 (DOI), 000794083400001 (), 2-s2.0-85129803505 (Scopus ID)
Note

Funding agencies:
Örebro University
Wallenberg AI, Autonomous Systems and Software Program (WASP) - Knut and Alice Wallenberg Foundation
Federal Ministry for Economic Affairs and Climate FKZ 20X1905A-D

Available from: 2022-05-23. Created: 2022-05-23. Last updated: 2023-11-28. Bibliographically approved.
Rietz, F., Schaffernicht, E., Stoyanov, T. & Stork, J. A. (2022). Towards Task-Prioritized Policy Composition. Paper presented at 35th IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2022), Kyoto, Japan, October 24-26, 2022.
Towards Task-Prioritized Policy Composition
2022 (English). Conference paper, Oral presentation with published abstract (Refereed)
Abstract [en]

Combining learned policies in a prioritized, ordered manner is desirable because it allows for modular design and facilitates data reuse through knowledge transfer. In control theory, prioritized composition is realized by null-space control, where low-priority control actions are projected into the null-space of high-priority control actions. Such a method is currently unavailable for Reinforcement Learning. We propose a novel, task-prioritized composition framework for Reinforcement Learning, which involves a novel concept: The indifferent-space of Reinforcement Learning policies. Our framework has the potential to facilitate knowledge transfer and modular design while greatly increasing data efficiency and data reuse for Reinforcement Learning agents. Further, our approach can ensure high-priority constraint satisfaction, which makes it promising for learning in safety-critical domains like robotics. Unlike null-space control, our approach allows learning globally optimal policies for the compound task by online learning in the indifference-space of higher-level policies after initial compound policy construction. 
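For reference, the null-space composition from control theory that the abstract contrasts with can be sketched in a few lines: the low-priority command is projected so it cannot disturb the high-priority task. The paper's contribution is an RL analogue in which lower-priority policies act in the indifference-space of higher-priority policies rather than in a Jacobian null-space; the matrices and numbers below are purely illustrative.

import numpy as np

def null_space_composition(J_high, u_high, u_low):
    """Classical null-space control: the low-priority command is projected
    into the null-space of the high-priority task Jacobian J_high."""
    J_pinv = np.linalg.pinv(J_high)
    N = np.eye(J_high.shape[1]) - J_pinv @ J_high   # null-space projector
    return u_high + N @ u_low

# Toy example: a 1x2 Jacobian leaves one redundant direction free for the low-priority task.
J = np.array([[1.0, 0.0]])          # high-priority task only constrains the first coordinate
u_high = np.array([0.5, 0.0])
u_low = np.array([1.0, 1.0])
print(null_space_composition(J, u_high, u_low))   # [0.5, 1.0]: low priority acts only in the free direction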

National subject category
Computer Systems
Identifiers
urn:nbn:se:oru:diva-102120 (URN)
Conference
35th IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2022), Kyoto, Japan, October 24-26, 2022
Available from: 2022-11-08. Created: 2022-11-08. Last updated: 2024-01-03. Bibliographically approved.
Identifiers
ORCID iD: orcid.org/0000-0001-8151-4692
