posted on 2025-10-27, 02:50authored byLong Hoang Dang
This thesis introduces innovative deep neural network architectures designed to address vision-and-language reasoning tasks and a prompting technique to query massive Large Vision-Language Models. The focus is on evolving from basic one-step classification to developing dynamic, query-specific computational models. These models support iterative reasoning, visual-language understanding, and the connection of abstract linguistic concepts with real visual elements.<p></p>