The increasing complexity and unpredictability of manipulation tasks in modern industrial and service robotics have highlighted the limitations of pre-programmed robot solutions. Robots operating under changing object positions, variable obstacles, and unforeseen perturbations must adjust their actions online to reliably satisfy the high-performance requirements of real-world deployment scenarios.
A promising direction for enabling such flexible and reliable manipulation lies in the use of Behavior Trees (BTs), a formalism for transparent decision-making that structures robot behavior hierarchically through modular, reusable components. BTs are a well-suited solution because their inherent reactivity allows the system to respond effectively to high-level disturbances, such as perception or grasping failures. At the same time, their modular design facilitates the reuse of sub-behaviors across different scenarios, enabling automation systems to be easily reconfigured to meet varying operational demands. However, existing BT-based approaches fall short in scenarios in which more advanced forms of robustness to local perturbations and task variations are required. This thesis contributes novel solutions to address these limitations and enhance the applicability of BTs as control policies in real-world manipulation settings.
To rigorously assess the solutions proposed in this thesis, we first need to formalize the terminology and evaluation criteria associated with BT-based robot control. We begin by identifying a subset of properties that are most relevant to our scope, such as reactivity, modularity, and robustness, and clarifying their definitions by resolving ambiguities found in prior work. For each of these properties, we examine how they have been evaluated in the literature and propose additional metrics to address identified gaps in existing evaluation practices.
The first technical contribution addresses the reactivity of BT policies and their way of handling simultaneous control objectives. While BTs effectively manage global, high-level disturbances, flexible manipulation also requires rapid response to local, low-level perturbations that do not warrant changes to the high-level plan. Furthermore, when BTs are coupled with conventional low-level controllers for redundant manipulators, they often struggle to satisfy multiple, potentially competing objectives in a coherent and reliable manner. To address these limitations, we integrate BTs with a prioritized control strategy that decomposes each manipulation skill, such as grasping, into multiple control objectives with defined priorities, distributed across the BT nodes and executed concurrently. This integration introduces an additional layer of low-level reactivity, ensures the reliable satisfaction of multiple objectives, and reinforces the modularity of the BT policy by assigning distinct goals to separate leaf nodes.
Although the proposed framework provides robustness to both high- and low-level disturbances during execution, it still relies on manually specified parameters, which often need to be adjusted for specific task variations, such as minor changes in object positions or obstacle configurations. The second technical contribution is a data-driven approach based on Reinforcement Learning that augments the BT with a context-based adaptation policy. This module observes task-relevant features, referred to as a context, and selects appropriate BT parameters at execution time. The result is a policy that adapts its behavior on the fly to previously unseen variations, without manual intervention.
Despite its benefits, the proposed framework can adapt only to directly observable task variations and requires training procedures that, when performed on a physical robot, are often unsafe and time-consuming. The last technical contribution addresses both limitations by introducing a context estimator that infers latent dynamics parameters, such as friction coefficients or object mass, from recent interaction data. Conditioning the context-based adaptation policy on this latent estimate enables the BT-based policy to operate robustly even under partial observability. Moreover, because these latent parameters often underlie the discrepancies between simulation and reality, the very same mechanism also provides a principled way to bridge the sim-to-real gap: policies are trained in simulation with domain randomization, while the estimated context compensates for the mismatched dynamics, improving robustness at deployment.
Örebro: Örebro University, 2025, p. 143
2025-12-09, Örebro universitet, Långhuset, Hörsal L2, Fakultetsgatan 1, Örebro, 09:00 (English)