Neurosymbolic Decision-Making with Large Language Models
Örebro University, School of Science and Technology. ORCID iD: 0000-0003-3422-2085
2025 (English) Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Reasoning and decision-making are foundational challenges in artificial intelligence (AI). These processes are closely linked – an intelligent agent must reason about its environment and goals in order to make decisions and select actions. Two principal frameworks for sequential decision-making are AI planning and reinforcement learning (RL). Planning assumes access to a known model of the environment and uses symbolic representations to compute a sequence of actions that leads from an initial state to a desired goal. In contrast, RL focuses on learning behavior through interaction, enabling agents to develop policies that maximize long-term reward under uncertainty. Despite methodological differences, both approaches aim to generate intelligent, goal-directed action sequences.

The rise of Large Language Models (LLMs) has sparked significant interest in their potential to perform reasoning, planning, and decision-making tasks. Despite their impressive performance in natural language understanding and generalization, there is growing skepticism about whether LLMs genuinely reason or merely leverage statistical correlations. This dissertation investigates this question through a principled evaluation grounded in computational theory, using 3-SAT – the canonical NP-complete problem – as a testbed. The findings demonstrate that LLMs fail to exhibit sound and complete reasoning, especially on complex instances where shallow heuristics fail, and that their apparent reasoning abilities often stem from overfitting to statistical patterns.

To address these limitations, this dissertation proposes a range of neurosymbolic architectures that combine the generative flexibility of LLMs with the rigor and reliability of symbolic methods. Empirical evaluations across planning, reward design, and plan verification tasks show that such integration yields systems that are more robust and accurate. This work advances our theoretical and practical understanding of LLM-based reasoning, provides concrete design principles for neurosymbolic systems, and charts a path toward AI agents that integrate world knowledge with logical precision.

Place, publisher, year, edition, pages
Örebro: Örebro University, 2025, p. 67
Series
Örebro Studies in Technology, ISSN 1650-8580 ; 106
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:oru:diva-122456; ISBN: 9789175296869 (print); OAI: oai:DiVA.org:oru-122456; DiVA, id: diva2:1985111
Public defence
2025-10-17, Örebro universitet, Långhuset, Hörsal L2, Fakultetsgatan 1, Örebro, 13:00 (English)
Opponent
Supervisors
Available from: 2025-07-22 Created: 2025-07-22 Last updated: 2025-09-04 Bibliographically approved
List of papers
1. Can Large Language Models Reason? A Characterization via 3-SAT
2025 (English) Conference paper, Published paper (Refereed)
Abstract [en]

Large Language Models (LLMs) have been touted as AI models possessing advanced reasoning abilities. However, recent works have shown that LLMs often bypass true reasoning using shortcuts, sparking skepticism. To study the reasoning capabilities in a principled fashion, we adopt a computational theory perspective and propose an experimental protocol centered on 3-SAT – the prototypical NP-complete problem lying at the core of logical reasoning and constraint satisfaction tasks. Specifically, we examine the phase transitions in random 3-SAT and characterize the reasoning abilities of LLMs by varying the inherent hardness of the problem instances. Our experimental evidence shows that LLMs are incapable of performing true reasoning, as required for solving 3-SAT problems. Moreover, we observe significant performance variation based on the inherent hardness of the problems – performing poorly on harder instances and vice versa. Importantly, we show that integrating external reasoners can considerably enhance LLM performance. By following a principled experimental protocol, our study draws concrete conclusions and moves beyond the anecdotal evidence often found in LLM reasoning research.
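
The random 3-SAT setting mentioned in this abstract can be illustrated with a minimal sketch. This is not the paper's experimental protocol; the function names (`random_3sat`, `brute_force_sat`, `sat_fraction`) and the brute-force check are assumptions made for illustration. The sketch samples formulas at a chosen clause-to-variable ratio `alpha` and measures how often they are satisfiable, which is the quantity that changes sharply around the phase transition.

```python
import itertools
import random


def random_3sat(n_vars, n_clauses, rng):
    """Sample a random 3-SAT formula as a list of clauses.

    Each clause holds 3 distinct variables; a positive integer v means
    "variable v is true", a negative integer -v means "variable v is false".
    """
    formula = []
    for _ in range(n_clauses):
        variables = rng.sample(range(1, n_vars + 1), 3)
        formula.append(tuple(v if rng.random() < 0.5 else -v for v in variables))
    return formula


def satisfies(formula, assignment):
    """Return True if every clause has at least one true literal."""
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in formula
    )


def brute_force_sat(formula, n_vars):
    """Try all 2**n_vars assignments; return a model or None (small n only)."""
    for bits in itertools.product([False, True], repeat=n_vars):
        assignment = {i + 1: bit for i, bit in enumerate(bits)}
        if satisfies(formula, assignment):
            return assignment
    return None


def sat_fraction(n_vars, alpha, trials, seed=0):
    """Fraction of random formulas with round(alpha * n_vars) clauses that are satisfiable."""
    rng = random.Random(seed)
    n_clauses = round(alpha * n_vars)
    hits = sum(
        brute_force_sat(random_3sat(n_vars, n_clauses, rng), n_vars) is not None
        for _ in range(trials)
    )
    return hits / trials
```

Calling, for example, `sat_fraction(10, 2.0, 50)` and `sat_fraction(10, 6.0, 50)` compares an under-constrained and an over-constrained regime. For large numbers of variables the satisfiable fraction is known to drop sharply near a ratio of about 4.26; with the small sizes that brute force allows, the drop is more gradual.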

National Category
Computer Sciences
Identifiers
urn:nbn:se:oru:diva-123280 (URN); 10.48550/arXiv.2408.07215 (DOI)
Conference
13th International Conference on Learning Representations (ICLR 2025), Singapore, April 24-28, 2025
Note

Published at ICLR 2025 Workshop on Reasoning and Planning for LLMs

Available from: 2025-09-01 Created: 2025-09-01 Last updated: 2025-09-01 Bibliographically approved
2. SayCanPay: Heuristic Planning with Large Language Models Using Learnable Domain Knowledge
2024 (English) In: Proceedings of the 38th AAAI Conference on Artificial Intelligence / [ed] Michael Wooldridge; Jennifer Dy; Sriraam Natarajan, AAAI Press, 2024, Vol. 38, p. 20123-20133. Conference paper, Published paper (Refereed)
Abstract [en]

Large Language Models (LLMs) have demonstrated impressive planning abilities due to their vast "world knowledge". Yet obtaining plans that are both feasible (grounded in affordances) and cost-effective (in plan length) remains a challenge, despite recent progress. This contrasts with heuristic planning methods that employ domain knowledge (formalized in action models such as PDDL) and heuristic search to generate feasible, optimal plans. Inspired by this, we propose to combine the power of LLMs and heuristic planning by leveraging the world knowledge of LLMs and the principles of heuristic search. Our approach, SayCanPay, employs LLMs to generate actions (Say) guided by learnable domain knowledge that evaluates actions' feasibility (Can) and long-term reward/payoff (Pay), and uses heuristic search to select the best sequence of actions. Our contributions are (1) a novel framing of the LLM planning problem in the context of heuristic planning, (2) integrating grounding and cost-effective elements into the generated plans, and (3) using heuristic search over actions. Our extensive evaluations show that our model surpasses other LLM planning approaches.
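
The Say/Can/Pay idea described in this abstract can be sketched as a small beam search. This is an illustration under stated assumptions, not the authors' implementation: the three scoring functions are hand-written stand-ins for an LLM and learned models, multiplying the three scores is one plausible way to combine them, and the toy "key and door" domain is invented for the example.

```python
from typing import Callable, List, Tuple

Plan = Tuple[str, ...]
Scorer = Callable[[Plan, str], float]


def beam_search_plan(
    candidates: Callable[[Plan], List[str]],
    say: Scorer,
    can: Scorer,
    pay: Scorer,
    is_goal: Callable[[Plan], bool],
    beam_width: int = 2,
    max_steps: int = 4,
) -> Plan:
    """Beam search over action sequences, scoring each step by say * can * pay."""
    beams: List[Tuple[Plan, float]] = [((), 1.0)]
    for _ in range(max_steps):
        expanded: List[Tuple[Plan, float]] = []
        for plan, score in beams:
            if is_goal(plan):
                expanded.append((plan, score))  # keep finished plans unchanged
                continue
            for action in candidates(plan):
                step = say(plan, action) * can(plan, action) * pay(plan, action)
                expanded.append((plan + (action,), score * step))
        if not expanded:
            break
        expanded.sort(key=lambda item: item[1], reverse=True)
        beams = expanded[:beam_width]
        if all(is_goal(plan) for plan, _ in beams):
            break
    finished = [item for item in beams if is_goal(item[0])]
    return max(finished or beams, key=lambda item: item[1])[0]


# Hypothetical toy domain: the door opens only after the key is picked up.
ACTIONS = ["pick key", "open door", "wait"]


def candidates(plan: Plan) -> List[str]:
    return ACTIONS


def say(plan: Plan, action: str) -> float:
    # Stand-in for LLM action likelihood; it naively prefers "open door".
    return {"pick key": 0.3, "open door": 0.5, "wait": 0.2}[action]


def can(plan: Plan, action: str) -> float:
    # Stand-in feasibility score: opening the door without the key is unlikely.
    if action == "open door" and "pick key" not in plan:
        return 0.05
    return 1.0


def pay(plan: Plan, action: str) -> float:
    # Stand-in payoff score: waiting makes no progress toward the goal.
    return 0.1 if action == "wait" else 0.9


def is_goal(plan: Plan) -> bool:
    return bool(plan) and plan[-1] == "open door" and "pick key" in plan


plan = beam_search_plan(candidates, say, can, pay, is_goal)
print(plan)  # ('pick key', 'open door')
```

In this toy setting the Say score alone would pick "open door" first, while the feasibility and payoff terms steer the search toward picking up the key before opening the door.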

Place, publisher, year, edition, pages
AAAI Press, 2024
Series
Proceedings of the AAAI Conference on Artificial Intelligence, ISSN 2159-5399, E-ISSN 2374-3468 ; 38:18
National Category
Computer Sciences
Identifiers
urn:nbn:se:oru:diva-115501 (URN); 10.1609/aaai.v38i18.29991 (DOI); 001241509500037 (); 2-s2.0-85189544071 (Scopus ID); 9781577358879 (ISBN)
Conference
38th AAAI Conference on Artificial Intelligence (AAAI) / 36th Conference on Innovative Applications of Artificial Intelligence / 14th Symposium on Educational Advances in Artificial Intelligence, Vancouver, Canada, February 20-27, 2024
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP); Knut and Alice Wallenberg Foundation; EU, Horizon 2020, 952215
Note

This work was supported by the Wallenberg AI, Autonomous Systems and Software Program (WASP) funded by the Knut and Alice Wallenberg Foundation, and is also part of the EU H2020 ICT48 project “TAILOR” under contract 952215, and the KU Leuven Research Fund (C14/18/062).

Available from: 2024-08-21 Created: 2024-08-21 Last updated: 2025-09-01 Bibliographically approved
3. REvolve: Reward Evolution with Large Language Models using Human Feedback
2025 (English) In: 13th International Conference on Learning Representations (ICLR 2025): Proceedings, International Conference on Learning Representations, ICLR, 2025, p. 25710-25751. Conference paper, Published paper (Refereed)
Abstract [en]

Designing effective reward functions is crucial to training reinforcement learning (RL) algorithms. However, this design is non-trivial, even for domain experts, due to the subjective nature of certain tasks that are hard to quantify explicitly. In recent works, large language models (LLMs) have been used for reward generation from natural language task descriptions, leveraging their extensive instruction tuning and commonsense understanding of human behavior. In this work, we hypothesize that LLMs, guided by human feedback, can be used to formulate reward functions that reflect human implicit knowledge. We study this in three challenging settings - autonomous driving, humanoid locomotion, and dexterous manipulation - wherein notions of “good” behavior are tacit and hard to quantify. To this end, we introduce REvolve, a truly evolutionary framework that uses LLMs for reward design in RL. REvolve generates and refines reward functions by utilizing human feedback to guide the evolution process, effectively translating implicit human knowledge into explicit reward functions for training (deep) RL agents. Experimentally, we demonstrate that agents trained on REvolve-designed rewards outperform other state-of-the-art baselines. 

Place, publisher, year, edition, pages
International Conference on Learning Representations, ICLR, 2025
National Category
Computer Sciences
Identifiers
urn:nbn:se:oru:diva-123277 (URN); 10.48550/arXiv.2406.01309 (DOI); 2-s2.0-105010222426 (Scopus ID); 9798331320850 (ISBN)
Conference
13th International Conference on Learning Representations (ICLR 2025), Singapore, April 24-28, 2025
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP); Knut and Alice Wallenberg Foundation
Available from: 2025-09-01 Created: 2025-09-01 Last updated: 2026-01-16 Bibliographically approved
4. EgoTV: Egocentric Task Verification from Natural Language Task Descriptions
2023 (English) In: 2023 IEEE/CVF International Conference on Computer Vision (ICCV): Proceedings, IEEE, 2023, p. 15371-15383. Conference paper, Published paper (Refereed)
Abstract [en]

To enable progress towards egocentric agents capable of understanding everyday tasks specified in natural language, we propose a benchmark and a synthetic dataset called Egocentric Task Verification (EgoTV). The goal in EgoTV is to verify the execution of tasks from egocentric videos based on the natural language description of these tasks. EgoTV contains pairs of videos and their task descriptions for multi-step tasks – these tasks contain multiple sub-task decompositions, state changes, object interactions, and sub-task ordering constraints. In addition, EgoTV also provides abstracted task descriptions that contain only partial details about ways to accomplish a task. Consequently, EgoTV requires causal, temporal, and compositional reasoning of video and language modalities, which is missing in existing datasets. We also find that existing vision-language models struggle at such all-round reasoning needed for task verification in EgoTV. Inspired by the needs of EgoTV, we propose a novel Neuro-Symbolic Grounding (NSG) approach that leverages symbolic representations to capture the compositional and temporal structure of tasks. We demonstrate NSG's capability towards task tracking and verification on our EgoTV dataset and a real-world dataset derived from CrossTask (CTV). We open-source the EgoTV and CTV datasets and the NSG model for future research on egocentric assistive agents.

Place, publisher, year, edition, pages
IEEE, 2023
Series
IEEE International Conference on Computer Vision (ICCV), ISSN 1550-5499, E-ISSN 2380-7504
Keywords
Video Task Verification, Computer Vision, Language Understanding, Neuro-Symbolic Reasoning
National Category
Computer graphics and computer vision
Identifiers
urn:nbn:se:oru:diva-108102 (URN); 10.1109/ICCV51070.2023.01414 (DOI); 001169499007076 (); 2-s2.0-85180427181 (Scopus ID); 9798350307184 (ISBN); 9798350307191 (ISBN)
Conference
International Conference on Computer Vision (ICCV 2023), Paris, France, October 2-6, 2023
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Available from: 2023-09-05 Created: 2023-09-05 Last updated: 2025-09-01
5. Deep Explainable Relational Reinforcement Learning: A Neuro-Symbolic Approach
2023 (English) In: Machine Learning and Knowledge Discovery in Databases: Research Track: European Conference, ECML PKDD 2023, Turin, Italy, September 18–22, 2023, Proceedings, Part IV / [ed] Danai Koutra; Claudia Plant; Manuel Gomez Rodriguez; Elena Baralis; Francesco Bonchi, Springer, 2023, Vol. 14172, p. 213-229. Conference paper, Published paper (Refereed)
Abstract [en]

Despite its successes, Deep Reinforcement Learning (DRL) yields non-interpretable policies. Moreover, since DRL does not exploit symbolic relational representations, it has difficulties in coping with structural changes in its environment (such as increasing the number of objects). Meanwhile, Relational Reinforcement Learning inherits the relational representations from symbolic planning to learn reusable policies. However, it has so far been unable to scale up and exploit the power of deep neural networks. We propose Deep Explainable Relational Reinforcement Learning (DERRL), a framework that exploits the best of both the neural and symbolic worlds. By resorting to a neuro-symbolic approach, DERRL combines relational representations and constraints from symbolic planning with deep learning to extract interpretable policies. These policies are in the form of logical rules that explain why each decision (or action) is arrived at. Through several experiments, in setups like the Countdown Game, Blocks World, Gridworld, Traffic, and Minigrid, we show that the policies learned by DERRL are adaptable to varying configurations and environmental changes.

Place, publisher, year, edition, pages
Springer, 2023
Series
Lecture Notes in Computer Science, ISSN 0302-9743, E-ISSN 1611-3349 ; 14172
Keywords
Neuro-Symbolic AI, Relational Reinforcement Learning, Deep Reinforcement Learning, Explainability
National Category
Computer Sciences
Research subject
Computer and Systems Science; Computer Science
Identifiers
urn:nbn:se:oru:diva-108100 (URN); 10.48550/arXiv.2304.08349 (DOI); 001156141200013 (); 9783031434204 (ISBN); 9783031434211 (ISBN)
Conference
European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2023), Turin, Italy, September 18-22, 2023
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Available from: 2023-09-05 Created: 2023-09-05 Last updated: 2025-09-01 Bibliographically approved

Open Access in DiVA

Cover (538 kB), 54 downloads
File information
File name: COVER01.pdf; File size: 538 kB; Checksum SHA-512:
036c85a8283a9231068b0dc28f8469a450005ec6bbe3c8375ece3c9e1b2691a1e4edad4d7751f1543ce606abbc2372323e3085386587939a0050d0f78f851704
Type: cover; Mimetype: application/pdf
Neurosymbolic Decision-Making with Large Language Models (4400 kB), 345 downloads
File information
File name: FULLTEXT01.pdf; File size: 4400 kB; Checksum SHA-512:
540c6c62cf16f0ec3859fe026fa0e78ae4460a47dcf984262af798f81de820325408b2aee4b72ff372335d8b1a27656cdd17a7046b2910828c908cf951db3a0a
Type: fulltext; Mimetype: application/pdf
Spikblad (164 kB), 42 downloads
File information
File name: SPIKBLAD01.pdf; File size: 164 kB; Checksum SHA-512:
e19f46dfeba010ded12aa8148787fec1927eb7873a8e7e4a43261df01521c72d5e81e308a354e06d53222fa1a032f5fba5289bc4f993f6fcc238e6c067c6104a
Type: spikblad; Mimetype: application/pdf

Authority records

Hazra, Rishi

Total: 346 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.

Total: 5073 hits