Abstract We demonstrate the need and potential of systematically integrated vision and semantics solutions for visual sensemaking (in the backdrop of autonomous driving). A general method for online visual sensemaking using answer set programming is systematically formalised and fully implemented. The method integrates state of the art in visual computing, and is developed as a modular framework usable within hybrid architectures for perception & control. We evaluate and demo with community established benchmarks KITTIMOD and MOT. As use-case, we focus on the signifificance of humancentred visual sensemaking —e.g., semantic representation and explainability, question-answering, commonsense interpolation— in safety-critical autonomous driving situations