<br/>These models are described in [this ICCV 2015 paper](http://www.cs.utexas.edu/users/ml/papers/venugopalan.iccv15.pdf): <br/> <br/> Sequence to Sequence - Video to Text<br/> S. Venugopalan, M. Rohrbach, J. Donahue, T. Darrell, R. Mooney, K. Saenko<br/> The IEEE International Conference on Computer Vision (ICCV), 2015<br/> <br/>More details are available on [the project page](https://vsubhashini.github.io/s2vt.html).<br/> <br/>Model: <br/>[S2VT_VGG_RGB](https://gist.github.com/vsubhashini/38d087e140854fee4b14): <br/>This is the S2VT (RGB) model described in the ICCV 2015 paper. It uses video frame features extracted with the [VGG 16-layer](https://gist.github.com/ksimonyan/211839e770f7b538e2d8#file-readme-md) model and is trained only on the YouTube video dataset.<br/><br/>Compatibility: <br/>These are pre-release models. They do not run in any current version of BVLC/caffe because they depend on unmerged PRs. They are currently supported by the `recurrent` branch of the Caffe forks at https://github.com/jeffdonahue/caffe/tree/recurrent and https://github.com/vsubhashini/caffe/tree/recurrent.<br/>