Grounding Spatial Language for Video Search. Tellex, S., Kollar, T., Shaw, G., Roy, N., & Roy, D.
The ability to find a video clip that matches a natural language description of an event would enable intuitive search of large databases of surveillance video. We present a mechanism for connecting a spatial language query to a video clip corresponding to the query. The system can retrieve video clips matching millions of potential queries that describe complex events in video such as "people walking from the hallway door, around the island, to the kitchen sink." By breaking down the query into a sequence of independent structured clauses and modeling the meaning of each component of the structure separately, we are able to improve on previous approaches to video retrieval by finding clips that match much longer and more complex queries using a rich set of spatial relations such as "down" and "past." We present a rigorous analysis of the system's performance, based on a large corpus of task-constrained language collected from fourteen subjects. Using this corpus, we show that the system effectively retrieves clips that match natural language descriptions: 58.3% were ranked in the top two of ten in a retrieval task. Furthermore, we show that spatial relations play an important role in the system's performance.
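To illustrate the decomposition the abstract describes, here is a minimal Python sketch, not the paper's implementation: a query is represented as a sequence of (figure, relation, landmark) clauses, each spatial relation is scored independently against a clip's person trajectory, and clips are ranked by the combined score. All class names, function names, and the toy distance-based scoring are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# A structured spatial clause: a figure (the moving person), a spatial
# relation ("to", "past", ...), and a named landmark ("the kitchen sink").
# The clause structure follows the abstract; the rest is illustrative.
@dataclass
class SpatialClause:
    figure: str
    relation: str
    landmark: str

# A clip is assumed to carry a person trajectory as (x, y) points plus a
# map of named landmark locations (reduced here to center points).
@dataclass
class Clip:
    clip_id: str
    trajectory: List[Tuple[float, float]]
    landmarks: Dict[str, Tuple[float, float]]

def score_to(traj, landmark_xy):
    """Toy model of 'to': reward trajectories that end near the landmark."""
    ex, ey = traj[-1]
    dist = ((ex - landmark_xy[0]) ** 2 + (ey - landmark_xy[1]) ** 2) ** 0.5
    return 1.0 / (1.0 + dist)

def score_past(traj, landmark_xy):
    """Toy model of 'past': reward trajectories whose closest approach is
    near the landmark but which keep moving away afterwards."""
    dists = [((x - landmark_xy[0]) ** 2 + (y - landmark_xy[1]) ** 2) ** 0.5
             for x, y in traj]
    closest = min(dists)
    ends_away = dists[-1] > closest + 1.0
    return (1.0 / (1.0 + closest)) * (1.0 if ends_away else 0.1)

RELATION_MODELS: Dict[str, Callable] = {"to": score_to, "past": score_past}

def score_clip(query: List[SpatialClause], clip: Clip) -> float:
    """Combine per-clause scores (here, a simple product) into a clip score."""
    score = 1.0
    for clause in query:
        model = RELATION_MODELS.get(clause.relation)
        landmark_xy = clip.landmarks.get(clause.landmark)
        if model is None or landmark_xy is None:
            return 0.0
        score *= model(clip.trajectory, landmark_xy)
    return score

def rank_clips(query: List[SpatialClause], clips: List[Clip]) -> List[Clip]:
    """Return clips sorted from best to worst match for the query."""
    return sorted(clips, key=lambda c: score_clip(query, c), reverse=True)
```

The point of the sketch is the factorization: because each clause is scored independently, long multi-clause queries compose from the same small set of per-relation models rather than requiring a monolithic model of the whole sentence.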
