We present a computational framework for the grounding and semantic interpretation of dynamic visuo-spatial imagery consisting of video and eyetracking data. Driven by cognitive film studies and visual perception research, we demonstrate key technological capabilities aimed at investigating attention and recipient effects vis-a-vis the motion picture; this encompasses high-level analysis of subjects' visual fixation patterns and their correlation with (deep) semantic analysis of the dynamic visual data (e.g., fixation on movie characters, influence of cinematographic devices such as cuts). We highlight the framework and its application as a general AI-based assistive technology platform, integrating vision and knowledge representation (KR), for cognitive film studies.