Deakin University

File(s) under embargo

Sparse Frame Grouping Network with Action Centered for Untrimmed Video Paragraph Captioning

conference contribution
posted on 2024-02-06, 04:14, authored by G Yu, Y Hu, Y Zhang, R Feng, T Zhang, Shang Gao
Generating paragraph captions for untrimmed videos without event annotations is challenging, especially when the goal is to improve precision and reduce repetition at the same time. To address this challenge, we propose a module called Sparse Frame Grouping (SFG). It dynamically groups event information with the help of action information for the entire video and excludes redundant frames within pre-defined clips. To improve performance, an Intra-Contrastive Learning technique is designed to align the SFG module with the core event content in the paragraph, while an Inter-Contrastive Learning method is employed to learn action-guided context with reduced static noise. Extensive experiments are conducted on two benchmark datasets (ActivityNet Captions and YouCook2). Results demonstrate that SFG outperforms state-of-the-art methods on all metrics.
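The abstract does not give the form of the contrastive objectives. As a rough illustration of what contrastive alignment between a video-side embedding and a text-side embedding can look like, the sketch below implements a generic InfoNCE-style loss. It is not the loss from the paper: the function names, the cosine similarity, the temperature value and the toy vectors are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """Generic InfoNCE loss (an assumed stand-in, not the paper's objective).

    loss = -log( exp(s_pos / t) / sum_j exp(s_j / t) )
    where the sum runs over the positive and all negatives.
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, neg) / temperature for neg in negatives]
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denominator - logits[0]


# Hypothetical toy embeddings: one grouped-frame feature and candidate text features.
frame_group = [1.0, 0.0]
matching_text = [0.9, 0.1]
unrelated_texts = [[0.0, 1.0], [-1.0, 0.0]]

aligned_loss = info_nce(frame_group, matching_text, unrelated_texts)
misaligned_loss = info_nce(frame_group, unrelated_texts[0], [matching_text])
```

In this toy setup the loss is small when the anchor is closest to its positive and large when a negative is closer, which is the behaviour any alignment objective of this kind relies on.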

History

Pagination

14571-14580

Location

Singapore

Start date

2023-12-01

End date

2023-12-01

ISBN-13

9798891760615

Language

eng

Title of proceedings

Findings of the Association for Computational Linguistics: EMNLP 2023

Event

Findings of the Association for Computational Linguistics: EMNLP 2023

Publisher

Association for Computational Linguistics

Place of publication

Kerrville, TX
