High-level semantic interpretation of (dynamic) visual imagery calls for general and systematic methods integrating techniques from knowledge representation and computer vision. Towards this, we position "deep semantics" as denoting the existence of declarative models (e.g., pertaining to "space and motion") and of corresponding formalisations and methods supporting (domain-independent) explainability capabilities such as semantic question-answering, relational (and relationally-driven) visuospatial learning, and (non-monotonic) visuospatial abduction. Rooted in recent work, we summarise and report the status quo on deep visuospatial semantics, and on our approach to neurosymbolic integration and explainable visuospatial computing in that context, with methods and tools developed in diverse settings such as behavioural research in psychology, art & social sciences, and autonomous driving.