Robots and other intelligent systems operating in complex, dynamic environments must anticipate the current and future intentions, activities, and actions of surrounding agents to navigate efficiently and avoid collisions. Since agents’ motion trajectories can represent such intentions, trajectory prediction becomes critical in these environments. However, accurately predicting human motion remains challenging due to the multitude of environmental and agent-specific contextual factors that shape trajectory patterns. These include semantic information about the scene, agent roles or tasks, social interactions, physical constraints, and latent behavioral intentions. The complex interplay of these elements leads to heterogeneous trajectories characterized by variability in speed, direction, intent, and interaction patterns. Despite this, many state-of-the-art trajectory prediction models rely on simplifying assumptions, such as the absence of stopping behaviors or exclusively social navigation settings (i.e., multiple agents interacting), and are trained on homogeneous datasets with limited motion diversity. These limitations hinder their performance in more complex, real-world environments.
A key but underexplored source of trajectory heterogeneity lies in what we refer to as trajectory classes: groupings of data samples sharing similar characteristics. These may be based on observable semantic attributes (e.g., agent type, activity, role) or data-driven latent features learned from the trajectory data itself. While observable classes can be inferred through visual perception systems, data-driven classes require learning directly from trajectory data. Both types can capture important motion diversity and enhance prediction accuracy when integrated effectively into predictors. Despite their relevance, existing work on trajectory classes lacks both dedicated datasets capturing heterogeneous motion patterns and methodological approaches addressing such heterogeneity. This thesis addresses these gaps by systematically studying the phenomenon of heterogeneity in human motion, analyzing its sources, proposing methods to collect heterogeneous trajectory data, and incorporating trajectory classes (observable and data-driven) into trajectory prediction frameworks.
To answer the first research question (what types of datasets are needed to study trajectory classes, and how should they be collected?), we introduce THÖR-MAGNI, a large-scale dataset recorded in a mock industrial environment. The dataset captures a wide range of agent activities and roles (e.g., box carriers, groups of people), which can be seen as observable classes, and provides detailed annotations for analyzing these classes in the context of human trajectory prediction. Its complexity and diversity also make it well-suited for learning and analyzing data-driven trajectory classes. We leverage THÖR-MAGNI to study the influence of trajectory classes, reflecting underlying human activities or roles in industrial contexts, on trajectory prediction.
This thesis primarily investigates both observable and data-driven trajectory classes as mechanisms to improve prediction accuracy. For observable classes, we ask: How can observable classes be leveraged to enhance trajectory prediction? We extend deep learning models to explicitly and efficiently incorporate observable classes, and we evaluate their performance on THÖR-MAGNI and a state-of-the-art imbalanced outdoor dataset. Unlike previous approaches, our models do not require class-specific modules, making them inherently more scalable and memory-efficient. We also demonstrate that pattern-based approaches, such as Maps of Dynamics, outperform deep learning models in low-data and class-imbalanced regimes, which are common in robotics, particularly in cold-start settings where robots operate with minimal prior knowledge. However, observable classes can be ambiguous because they statically assign a single label to all trajectories of a given agent. This assumption is often violated in real-world settings, where a single agent may exhibit diverse behavioral patterns. To address this limitation, we extend the THÖR-MAGNI dataset with fine-grained, frame-level action annotations, resulting in THÖR-MAGNI Act. By leveraging this enriched dataset, we demonstrate that frame-based action labels provide strong contextual cues. When integrated through direct conditioning or multi-task learning frameworks that jointly model trajectories and action sequences, actions help disambiguate static class assumptions and improve prediction accuracy. In particular, augmenting the state representation with frame-level action signals mitigates the limitations of static observable classes by capturing intra-agent behavioral variability.
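As a rough illustration of what augmenting the state representation can look like, the sketch below appends a one-hot agent class and a one-hot per-frame action to each observed position, so that a single shared predictor could consume the result without class-specific modules. It is a minimal sketch under stated assumptions, not the thesis implementation: the label vocabularies `AGENT_CLASSES` and `ACTIONS`, the helper names, and the one-hot encoding itself are all hypothetical choices made for illustration.

```python
from typing import List, Sequence, Tuple

# Hypothetical label vocabularies, chosen only for illustration;
# they are not the annotation schema of any particular dataset.
AGENT_CLASSES = ["carrier", "visitor"]
ACTIONS = ["walk", "stop", "pick"]


def one_hot(label: str, vocabulary: Sequence[str]) -> List[float]:
    """Encode a label as a one-hot vector over the given vocabulary."""
    if label not in vocabulary:
        raise ValueError(f"unknown label: {label!r}")
    return [1.0 if label == entry else 0.0 for entry in vocabulary]


def augment_states(
    positions: Sequence[Tuple[float, float]],
    agent_class: str,
    actions: Sequence[str],
) -> List[List[float]]:
    """Build per-frame feature vectors [x, y, class one-hot, action one-hot].

    The agent class is static over the whole track, while the action
    label may change from frame to frame.
    """
    if len(positions) != len(actions):
        raise ValueError("need exactly one action label per frame")
    class_vec = one_hot(agent_class, AGENT_CLASSES)
    return [
        [x, y] + class_vec + one_hot(action, ACTIONS)
        for (x, y), action in zip(positions, actions)
    ]


# Usage: a two-frame track of a "carrier" who walks and then stops.
features = augment_states([(0.0, 0.0), (0.5, 0.1)], "carrier", ["walk", "stop"])
```

In this sketch the class part of each vector is identical across frames while the action part varies, which is the sense in which frame-level labels can express variability within a single agent's track.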
For data-driven classes (those not directly observable), we first investigate how to learn them effectively from trajectory data in the context of prediction tasks. To this end, we propose a novel deep generative framework inspired by self-conditioning techniques from image modeling. Our self-conditioned generative model learns trajectory clusters that are intrinsically linked to the generative process itself, allowing these clusters to hold privileged information that guides and enhances the training of downstream predictors. Unlike traditional clustering methods, which often fail to capture minority patterns, our approach more effectively identifies less dominant classes, such as stopping behavior, improving prediction accuracy across underrepresented trajectory modes. We further integrate these learned classes into a multi-stage prediction framework, where the trajectory classes explicitly condition generative models, leading to more accurate and probabilistically informed predictions.
In summary, this thesis provides a comprehensive investigation of the phenomenon of heterogeneity in human trajectory data. It presents methods to analyze natural motion variability, identify meaningful trajectory clusters, quantify their influence on prediction accuracy, and develop mechanisms to integrate this information into deep learning-based predictors. Together, these contributions support more accurate, robust, and context-aware prediction methods for robotics and intelligent systems operating in dynamic human environments.
Örebro: Örebro University, 2025, p. 175
2025-09-26, Örebro universitet, Långhuset, Hörsal L2, Fakultetsgatan 1, Örebro, 13:00 (English)