Visual Noun Modifiers: The Problem of Binding Visual and Linguistic Cues
Örebro University, School of Science and Technology. (Center for Applied Autonomous Sensor Systems (AASS)) ORCID iD: 0000-0002-7072-7104
Örebro University, School of Science and Technology. (Center for Applied Autonomous Sensor Systems (AASS)) ORCID iD: 0000-0002-2385-9470
Örebro University, School of Science and Technology. (Center for Applied Autonomous Sensor Systems (AASS)) ORCID iD: 0000-0001-8229-1363
2024 (English). In: 2024 IEEE International Conference on Robotics and Automation (ICRA), Institute of Electrical and Electronics Engineers Inc., 2024, p. 11178-11185.
Conference paper, Published paper (Refereed)
Abstract [en]

In many robotic applications, especially those involving humans and the environment, linguistic and visual information must be processed jointly and bound together. Existing works either encode the image or the language into a subsymbolic space, like the CLIP model, or create a symbolic space of extracted information, like the object detection models. In this paper, we propose to describe images by nouns and modifiers and introduce a new embedded binding space where the linguistic and visual cues can effectively be bound. We investigate how state-of-the-art models perform in recognizing nouns and modifiers from images, and propose our method by introducing a dataset and CLIP-like recognition techniques based on transfer learning and metric learning. We show real-world experiments that demonstrate the practical applicability of our approach to robotics applications. Our results indicate that our method can surpass the state-of-the-art in recognizing nouns and modifiers from images. Interestingly, our method exhibits a language characteristic related to context sensitivity.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers Inc., 2024. p. 11178-11185
Keywords [en]
Adversarial machine learning, Contrastive Learning, Image coding, Object detection, Object recognition, Robot learning, Transfer learning, Visual languages, ART model, Detection models, Environment information, Linguistic information, Objects detection, Robotics applications, State of the art, Sub-symbolic, Visual cues, Visual information, Linguistics
National Category
Robotics and automation
Identifiers
URN: urn:nbn:se:oru:diva-118593
DOI: 10.1109/ICRA57147.2024.10611332
ISI: 001369728001128
Scopus ID: 2-s2.0-85202445079
ISBN: 9798350384574 (electronic)
OAI: oai:DiVA.org:oru-118593
DiVA, id: diva2:1928316
Conference
IEEE International Conference on Robotics and Automation, ICRA 2024, Yokohama, May 13-17, 2024
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
EU, Horizon 2020, 101016442
Note

This work has been partially supported by the Wallenberg AI, Autonomous Systems and Software Program (WASP) funded by the Knut and Alice Wallenberg Foundation, and has also been supported by the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 101016442 (AIPlan4EU).

Available from: 2025-01-16
Created: 2025-01-16
Last updated: 2025-09-08
Bibliographically approved

Authority records

Faridghasemnia, Mohamadreza; Renoux, Jennifer; Saffiotti, Alessandro