Updated to work with Keras 2.0 and TF 1.2 and Spacy 2.0
This code is meant for education, so the focus is on simplicity rather than speed.
This is a simple demo of Visual Question Answering that uses
pretrained models (see models/CNN and models/VQA) to answer a given
question about a given image.
Dependencies
Keras version 2.0+
Modular deep learning library based on Python
Tensorflow 1.2+
(Might also work with Theano. I have not tested Theano
after the recent commit; use commit 0f89007 for Theano.)
scikit-learn
Quintessential machine learning library for Python
Spacy version 2.0+
Used to load the GloVe word vectors (word embeddings, similar in purpose to word2vec)
To upgrade and install the GloVe vectors:
python -m spacy download en_vectors_web_lg
OpenCV
OpenCV is used only to resize the image and change the color channels.
You may use other libraries as long as you can pass a 224x224 BGR image (NOTE: BGR, not RGB).
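If you load images with a library that returns RGB arrays, the two steps described above (resize to 224x224, then reorder the channels to BGR) can be sketched with NumPy alone. This is a minimal illustration and not the code used in demo.py: the function name `to_model_input` is hypothetical, and the nearest-neighbour resize is a stand-in for whatever interpolation your image library provides. Note that `cv2.imread` already returns BGR, so no channel swap is needed in that case.

```python
import numpy as np

def to_model_input(rgb_image, size=224):
    """Hypothetical helper: nearest-neighbour resize to size x size, then RGB -> BGR."""
    height, width = rgb_image.shape[:2]
    # Pick the source row/column that each output pixel maps to.
    rows = np.arange(size) * height // size
    cols = np.arange(size) * width // size
    resized = rgb_image[rows[:, None], cols]  # shape (size, size, 3)
    # Reverse the channel axis: RGB -> BGR.
    return resized[:, :, ::-1]

# Synthetic 480x640 image that is pure red in RGB order.
rgb = np.zeros((480, 640, 3), dtype=np.uint8)
rgb[..., 0] = 255

bgr = to_model_input(rgb)
print(bgr.shape, bgr[0, 0])
```

After the swap, the red value sits in the last channel, which is what a BGR consumer expects.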
python demo.py -image_file_name path_to_file -question "Question to be asked"
e.g.
python demo.py -image_file_name test.jpg -question "Is there a man in the picture?"
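The two flags in the command above can be parsed with Python's standard `argparse` module, which accepts single-dash long options such as `-image_file_name`. The sketch below is an assumption about how such a command line could be handled, not a copy of demo.py; the function name `build_parser` and the default values are hypothetical.

```python
import argparse

def build_parser():
    """Hypothetical parser mirroring the flags shown in the usage line."""
    parser = argparse.ArgumentParser(description="VQA demo (illustrative sketch)")
    # Defaults below are placeholders chosen for illustration only.
    parser.add_argument("-image_file_name", type=str, default="test.jpg",
                        help="path to the input image")
    parser.add_argument("-question", type=str,
                        default="What vehicle is in the picture?",
                        help="question to ask about the image")
    return parser

# Parse an explicit argument list instead of sys.argv so the sketch is self-contained.
args = build_parser().parse_args(
    ["-image_file_name", "test.jpg", "-question", "Is there a man in the picture?"]
)
print(args.image_file_name, "|", args.question)
```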
If you prefer to use the Theano backend and you have a GPU, you may want to run it like this:
THEANO_FLAGS='floatX=float32,device=gpu0,lib.cnmem=1,mode=FAST_RUN' python demo.py -image_file_name test.jpg -question "What vehicle is in the picture?"
Expected output:
095.2 % train
00.67 % subway
00.54 % mcdonald's
00.38 % bus
00.33 % train station
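The listing above reads as the five most probable answers, each with its probability as a zero-padded percentage. A minimal sketch of producing that kind of listing from a probability vector follows. It assumes the padding comes from `str.zfill(5)` on a rounded percentage, which matches the figures shown, but I have not confirmed that demo.py formats its output this way. The names `top_k` and `format_line` are hypothetical, and the probabilities are simply the values above divided by 100, plus one extra low-probability label ("car") added for illustration.

```python
def top_k(probs, labels, k=5):
    """Return the k (probability, label) pairs with the highest probability."""
    ranked = sorted(zip(probs, labels), key=lambda pair: pair[0], reverse=True)
    return ranked[:k]

def format_line(prob, label):
    """Format one answer as a zero-padded percentage followed by the label."""
    return str(round(prob * 100, 2)).zfill(5) + " % " + label

# Illustrative inputs: the percentages from the listing, expressed as probabilities,
# deliberately out of order, with a sixth label that should be dropped by top_k.
labels = ["bus", "train", "car", "subway", "train station", "mcdonald's"]
probs = [0.0038, 0.952, 0.0001, 0.0067, 0.0033, 0.0054]

lines = [format_line(p, label) for p, label in top_k(probs, labels)]
print("\n".join(lines))
```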