Deakin University

An Empirical Survey on Long Document Summarization: Datasets, Models and Metrics

journal contribution
posted on 2022-09-29, 01:34 authored by Huan Yee Koh, Jiaxin Ju, Ming Liu, Shirui Pan
Long documents such as academic articles and business reports have been the standard format to detail important issues and complicated subjects that require extra attention. An automatic summarization system that can effectively condense long documents into short and concise texts to encapsulate the most important information would thus be significant in aiding the reader’s comprehension. Recently, with the advent of neural architectures, significant research efforts have been made to advance automatic text summarization systems, and numerous studies on the challenges of extending these systems to the long document domain have emerged. In this survey, we provide a comprehensive overview of the research on long document summarization and a systematic evaluation across the three principal components of its research setting: benchmark datasets, summarization models, and evaluation metrics. For each component, we organize the literature within the context of long document summarization and conduct an empirical analysis to broaden the perspective on current research progress. The empirical analysis includes a study on the intrinsic characteristics of benchmark datasets, a multi-dimensional analysis of summarization models, and a review of the summarization evaluation metrics. Based on the overall findings, we conclude by proposing possible directions for future exploration in this rapidly growing field.

Funding

Building resilience in at-risk rural communities through improving Media Communication on Climate Change Policies | Funder: Department of Foreign Affairs and Trade | Grant ID: 1447/CRG/2023/26-DU

Large Language Models in Engineering. | Funder: Aurecon Australasia Pty Ltd | Grant ID: INT-1239

Personalised Privacy-Preserving Network Data Publishing System | Funder: Australian Research Council | Grant ID: LP220200746

History

Journal

ACM Computing Surveys

Pagination

1-39

Location

New York, N.Y.

ISSN

0360-0300

eISSN

1557-7341

Language

eng

Publication classification

C1 Refereed article in a scholarly journal

Publisher

Association for Computing Machinery (ACM)
