Abstract

We present a technique for jointly estimating event localization and role assignment when the target video event is described by a scenario. Specifically, to detect multi-agent events in video, our algorithm identifies the agents involved in an event and assigns roles to the participating agents. Instead of iterating through all possible agent-role combinations, we formulate the joint optimization problem as two efficient subproblems: quadratic programming for role assignment, followed by linear programming for event localization. Additionally, we reduce the computational complexity significantly by applying role-specific event detectors to each agent independently. We test the performance of our algorithm on natural videos, which contain multiple target events and nonparticipating agents.