Visual Question Answering: From Theory to Application / Qi Wu, Peng Wang, Xin Wang, Xiaodong He, Wenwu Zhu

Publication year: 2022

ISBN: 978-981-19-0964-1

Visual question answering (VQA) combines visual inputs such as images and video with a natural language question about the input and generates a natural language answer as output. It is by nature a multidisciplinary research problem, involving computer vision (CV), natural language processing (NLP), knowledge representation and reasoning (KR), and related fields. This book provides a comprehensive overview of VQA, covering fundamental theories, models, datasets, and promising future directions, and highlights the key models used in VQA. Given its scope, it can be used as a textbook on computer vision and natural language processing, especially for researchers and students working on visual question answering.

Subject: Computer Vision, Machine Learning, Knowledge Based Systems, Logic in AI, Visual Question Answering, VQA, Image-based Question Answering, Vision-and-Language, Deep Learning