Friday Lunch and Talk Series

From Scene Understanding to Vision-Language Joint Understanding

17th November 2023, 14:00, Ashton Lecture Theatre
Guangliang Cheng

Abstract

The field of computer vision has advanced remarkably in recent years, driven by efforts to enable machines to comprehend visual scenes and derive meaningful information from images and videos. However, traditional approaches to scene understanding have primarily focused on isolated visual analysis, disregarding the wealth of semantic knowledge and contextual understanding that can be acquired by incorporating natural language processing. Consequently, there has been a paradigm shift towards the emerging research area of vision-language joint understanding, which aims to bridge the gap between the visual and linguistic modalities in order to achieve a more comprehensive understanding of visual content.

In this presentation, I will give a brief overview of my previous research on scene understanding. I will also summarize our most recent research advances in vision-language joint understanding, highlighting some pioneering works. Finally, I will discuss the open challenges and future research directions in vision-language joint understanding.