Screencast by Issey Masuda about the paper:
Zhu, Yuke, Oliver Groth, Michael Bernstein, and Li Fei-Fei. “Visual7W: Grounded Question Answering in Images.” CVPR 2016.
We have seen great progress in basic perceptual tasks such as object recognition and detection. However, AI models still fail to match humans in high-level vision tasks due to the lack of capacities for deeper reasoning. Recently the new task of visual question answering (QA) has been proposed to evaluate a model’s capacity for deep image und