We present a general computational narrative model encompassing primitives of space, time, and motion from the viewpoint of deep knowledge representation and reasoning about visuo-spatial dynamics and the (eye-tracking-based) visual perception of the moving image. The declarative model, implemented within constraint logic programming, integrates knowledge-based qualitative reasoning (e.g., about object/character placement and scene structure) with state-of-the-art computer vision methods for the detection, tracking, and recognition of people, objects, and cinematographic devices such as cuts, shot types, and types of camera movement. A key feature is that the primitives of the theory (things; time, space, and motion predicates; actions and events; perceptual objects such as eye-tracking gaze points and regions of attention) are available as first-class objects with deep semantics suited for inference and query, whether for analytical Q&A or for studies in visual perception. We present the formal framework and its implementation in the context of a large-scale experiment on the visual perception and reception of the moving image within cognitive film studies.
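
To give a concrete flavour of the declarative layer, the following is a minimal Prolog sketch, not the paper's actual predicate vocabulary: the names shot/4, track/3, gaze/2, attends/2, and attends_in_shot/2 are illustrative assumptions. Detections and gaze samples from the vision and eye-tracking pipelines become first-class facts, and a single rule answers an analytical query about attention within a shot.

% Illustrative sketch only; predicate names are hypothetical.
% Facts as produced by a vision/eye-tracking pipeline:
% shot(ShotId, StartFrame, EndFrame, ShotType).
shot(s1, 0, 120, close_up).

% track(ObjectId, Frame, box(X, Y, W, H)): tracked object regions.
track(character(anna), 40, box(200, 80, 160, 240)).

% gaze(Frame, point(X, Y)): eye-tracking fixation samples.
gaze(40, point(270, 180)).

% A point lies within a rectangular region.
inside(point(PX, PY), box(X, Y, W, H)) :-
    PX >= X, PX =< X + W,
    PY >= Y, PY =< Y + H.

% A gaze sample at Frame falls on the object's tracked region.
attends(Obj, Frame) :-
    gaze(Frame, P),
    track(Obj, Frame, Box),
    inside(P, Box).

% Attention rests on Obj at some frame of the given shot.
attends_in_shot(Obj, Shot) :-
    shot(Shot, Start, End, _),
    attends(Obj, Frame),
    Frame >= Start, Frame =< End.

A query such as ?- attends_in_shot(character(anna), S). then succeeds with S = s1, illustrating how perceptual data (gaze points), vision-derived scene structure (tracked regions), and cinematographic units (shots) are queried uniformly as first-class objects.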