Posted on 2025-10-27, 02:51. Authored by Hoang Anh Pham.
This research introduces new approaches to video dialog that use neural reasoning and object-centric analysis to support meaningful conversations about visual content. By decomposing videos into object trajectories and preserving dialog history, COST (Conversation about Objects in Space-Time) and N2N (End-to-End) tackle challenges in visual understanding, linguistic comprehension, and advanced reasoning, and show promising results in performance evaluations.