Visual Noun Modifiers: The Problem of Binding Visual and Linguistic Cues
2024 (English). In: 2024 IEEE International Conference on Robotics and Automation (ICRA), Institute of Electrical and Electronics Engineers Inc., 2024, p. 11178-11185. Conference paper, Published paper (Refereed)
Abstract [en]
In many robotic applications, especially those involving humans and the environment, linguistic and visual information must be processed jointly and bound together. Existing works either encode the image or the language into a subsymbolic space, like the CLIP model, or create a symbolic space of extracted information, like the object detection models. In this paper, we propose to describe images by nouns and modifiers and introduce a new embedded binding space where the linguistic and visual cues can effectively be bound. We investigate how state-of-the-art models perform in recognizing nouns and modifiers from images, and propose our method by introducing a dataset and CLIP-like recognition techniques based on transfer learning and metric learning. We show real-world experiments that demonstrate the practical applicability of our approach to robotics applications. Our results indicate that our method can surpass the state-of-the-art in recognizing nouns and modifiers from images. Interestingly, our method exhibits a language characteristic related to context sensitivity.
Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers Inc., 2024. p. 11178-11185
Keywords [en]
Adversarial machine learning, Contrastive Learning, Image coding, Object detection, Object recognition, Robot learning, Transfer learning, Visual languages, ART model, Detection models, Environment information, Linguistic information, Objects detection, Robotics applications, State of the art, Sub-symbolic, Visual cues, Visual information, Linguistics
National Category
Robotics and automation
Identifiers
URN: urn:nbn:se:oru:diva-118593
DOI: 10.1109/ICRA57147.2024.10611332
ISI: 001369728001128
Scopus ID: 2-s2.0-85202445079
ISBN: 9798350384574 (electronic)
OAI: oai:DiVA.org:oru-118593
DiVA, id: diva2:1928316
Conference
IEEE International Conference on Robotics and Automation, ICRA 2024, Yokohama, May 13-17, 2024
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
EU, Horizon 2020, 101016442
Note
This work has been partially supported by the Wallenberg AI, Autonomous Systems and Software Program (WASP) funded by the Knut and Alice Wallenberg Foundation, and has also been supported by the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 101016442 (AIPlan4EU).
2025-01-16, 2025-01-16, 2025-09-08. Bibliographically approved