CLEVR-POC: Reasoning-Intensive Visual Question Answering in Partially Observable Environments
Australian Institute for Machine Learning, The University of Adelaide, Australia.
Örebro University, School of Science and Technology. (Centre for Applied Autonomous Sensor Systems (AASS)) ORCID iD: 0000-0002-4001-2087
Örebro University, School of Science and Technology. Department of Computer Science, KU Leuven, Belgium. (Centre for Applied Autonomous Sensor Systems (AASS)) ORCID iD: 0000-0002-6860-6303
2024 (English). In: 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024 - Main Conference Proceedings, European Language Resources Association (ELRA), 2024, p. 3297-3313. Conference paper, Published paper (Refereed)
Abstract [en]

The integration of learning and reasoning is high on the research agenda in AI. Nevertheless, little attention has been paid to using existing background knowledge to reason about partially observed scenes and answer questions about them. Yet we as humans use such knowledge frequently to infer plausible answers to visual questions (by eliminating all inconsistent ones). Such knowledge often comes in the form of constraints about objects, and it tends to be highly domain- or environment-specific. We contribute a novel benchmark called CLEVR-POC for reasoning-intensive visual question answering (VQA) in partially observable environments under constraints. In CLEVR-POC, knowledge in the form of logical constraints needs to be leveraged to generate plausible answers to questions about a hidden object in a given partial scene. For instance, if one knows that all cups are colored either red, green or blue and that there is only one green cup, it becomes possible to deduce that the color of an occluded cup is either red or blue, provided that all other cups, including the green one, are observed. Through experiments, we observe that the low performance of pre-trained vision language models like CLIP (≈ 22%) and of a large language model (LLM) like GPT-4 (≈ 46%) on CLEVR-POC demonstrates the need for frameworks that can handle reasoning-intensive tasks where environment-specific background knowledge is available and crucial. Furthermore, our demonstration illustrates that a neuro-symbolic model, which integrates an LLM like GPT-4 with a visual perception network and a formal logical reasoner, exhibits exceptional performance on CLEVR-POC.
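The cup example in the abstract can be illustrated with a minimal Python sketch. This is not the benchmark's reasoner or the authors' neuro-symbolic pipeline; it is a brute-force enumeration, and the names `COLORS`, `satisfies_constraints`, and `plausible_colors` are hypothetical. It assumes "only one green cup" means exactly one green cup in the full scene.

```python
# Illustrative sketch only: brute-force elimination of inconsistent answers.
# All names here are hypothetical and not taken from CLEVR-POC.

COLORS = ("red", "green", "blue")  # constraint 1: every cup has one of these colors


def satisfies_constraints(cup_colors):
    """Check a complete scene (a list of cup colors) against both constraints."""
    all_valid = all(color in COLORS for color in cup_colors)
    # Assumption: "only one green cup" is read as exactly one green cup.
    one_green = cup_colors.count("green") == 1
    return all_valid and one_green


def plausible_colors(observed_cup_colors):
    """Return the colors the single hidden cup may have, in sorted order."""
    plausible = []
    for candidate in COLORS:
        # Complete the partial scene with the candidate and test consistency.
        full_scene = list(observed_cup_colors) + [candidate]
        if satisfies_constraints(full_scene):
            plausible.append(candidate)
    return sorted(plausible)


if __name__ == "__main__":
    # The green cup is among the observed ones, so the hidden cup is red or blue.
    print(plausible_colors(["red", "green", "blue"]))
    # No green cup is observed, so the hidden cup must be the green one.
    print(plausible_colors(["red", "blue"]))
```

When the green cup is observed, the candidate "green" is eliminated and two plausible answers remain; when it is not observed, the constraint forces a single answer. A formal logical reasoner would reach the same answer sets without enumerating candidates by hand.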

Place, publisher, year, edition, pages
European Language Resources Association (ELRA), 2024. p. 3297-3313
Keywords [en]
LLM and Reasoning, logical constraints, partial observability, visual question answering, Computational linguistics, Visual languages, Background knowledge, Language model, Large language model and reasoning, Partially observable environments, Performance, Question Answering, Research agenda, Knowledge management
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:oru:diva-118582
Scopus ID: 2-s2.0-85195916891
ISBN: 9782493814104 (print)
OAI: oai:DiVA.org:oru-118582
DiVA, id: diva2:1928178
Conference
Joint 30th International Conference on Computational Linguistics and 14th International Conference on Language Resources and Evaluation, LREC-COLING 2024, Torino, Italy, May 20-25, 2024
Available from: 2025-01-16 Created: 2025-01-16 Last updated: 2025-01-16 Bibliographically approved

Open Access in DiVA

No full text in DiVA

Scopus

Authority records

Alirezaie, Marjan; De Raedt, Luc
