Learning Relational Event Models from Video
School of Computing, University of Leeds, Leeds, UK.
Cognitive Systems, SFB/TR 8 Spatial Cognition, University of Bremen, Bremen, Germany (AASS). ORCID iD: 0000-0002-6290-5492
2015 (English). In: The Journal of Artificial Intelligence Research, ISSN 1076-9757, E-ISSN 1943-5037, Vol. 53, p. 41-90. Article in journal (Refereed). Published.
Abstract [en]

Event models obtained automatically from video can be used in applications ranging from abnormal event detection to content-based video retrieval. When multiple agents are involved in the events, characterizing the events naturally suggests encoding interactions as relations. Learning event models from this kind of relational spatio-temporal data using relational learning techniques such as Inductive Logic Programming (ILP) holds promise, but such techniques have not previously been applied successfully to the very large datasets that result from video. In this paper, we present REMIND (Relational Event Model INDuction), a novel framework for supervised relational learning of event models from large video datasets using ILP. Efficiency is achieved through the learning-from-interpretations setting and through a typing system that exploits the type hierarchy of objects in a domain; the use of types also helps prevent over-generalization. Furthermore, we present a type-refining operator and prove that it is optimal. The learned models can be used to recognize events in previously unseen videos. We also extend the framework with an abduction step that improves learning performance when the input data is noisy. Experimental results on several hours of video from two challenging real-world domains (an airport domain and a physical action verbs domain) suggest that the techniques are suitable for real-world scenarios.
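The two ideas the abstract leans on, the learning-from-interpretations setting and type-constrained matching, can be illustrated with a minimal sketch. The Python fragment below is purely illustrative and is not the REMIND system's code: the predicate names, object types, and clip encodings are invented for the example. It shows how each video clip can be encoded as one self-contained interpretation (a set of ground facts plus object types), and how a candidate event pattern covers a clip only under type-respecting variable bindings, which is how typing can prune the search and curb over-generalization.

from itertools import permutations

# One interpretation per video clip: ground facts plus object types.
# Predicate and type names here are illustrative, not from the paper.
clip1 = {
    "types": {"p1": "person", "t1": "trolley"},
    "facts": {("attached", "p1", "t1"), ("moving", "t1")},
}
clip2 = {
    "types": {"p2": "person", "b1": "bag"},
    "facts": {("attached", "p2", "b1")},
}

# A candidate event pattern: typed variables plus relational literals.
pattern = {
    "vars": {"X": "person", "Y": "trolley"},   # type constraints
    "body": [("attached", "X", "Y"), ("moving", "Y")],
}

def covers(pattern, clip):
    """True if some type-respecting binding of the pattern's
    variables satisfies every literal in the clip's facts."""
    vars_ = list(pattern["vars"])
    objs = list(clip["types"])
    for combo in permutations(objs, len(vars_)):
        binding = dict(zip(vars_, combo))
        # Typing rejects ill-typed bindings before any fact lookup:
        # the pruning (and anti-over-generalization) role that the
        # abstract attributes to the type system.
        if any(clip["types"][binding[v]] != t
               for v, t in pattern["vars"].items()):
            continue
        if all((pred, *[binding[a] for a in args]) in clip["facts"]
               for pred, *args in pattern["body"]):
            return True
    return False

print(covers(pattern, clip1))  # True: p1/t1 satisfy the typed pattern
print(covers(pattern, clip2))  # False: clip2 has no trolley object

In the paper's actual setting the patterns are first-order clauses refined by an ILP search (including the type-refining operator the abstract proves optimal); this sketch only mirrors the coverage test such a search would repeatedly perform over per-clip interpretations.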

Place, publisher, year, edition, pages
AI Access Foundation, 2015. Vol. 53, p. 41-90
National Category
Computer Sciences; Computer Vision and Robotics (Autonomous Systems)
Identifiers
URN: urn:nbn:se:oru:diva-64158
DOI: 10.1613/jair.4395
ISI: 000365176400002
Scopus ID: 2-s2.0-84930960065
OAI: oai:DiVA.org:oru-64158
DiVA, id: diva2:1174427
Note

Funding Agencies:
EU: FP7-ICT-214975, FP7-ICT-27752, FP7-ICT-600623
DARPA: W911NF-10-C-0083
Deutsche Forschungsgemeinschaft: SFB/TR 8

Available from: 2018-01-15. Created: 2018-01-15. Last updated: 2018-01-18. Bibliographically approved.

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full text
Scopus

Authority records

Bhatt, Mehul
