GLIMMER: Incorporating Graph and Lexical Features in Unsupervised Multi-Document Summarization
conference contribution
posted on 2025-02-12, 02:35, authored by R Liu, Ming Liu, M Yu, J Jiang, Gang Li, D Zhang, J Li, X Meng, W Huang
Pre-trained language models are increasingly being used in multi-document summarization tasks. However, these models need large-scale corpora for pre-training and are domain-dependent. Other non-neural unsupervised summarization approaches mostly rely on key sentence extraction, which can lead to information loss. To address these challenges, we propose a lightweight yet effective unsupervised approach called GLIMMER: a Graph and LexIcal features based unsupervised Multi-docuMEnt summaRization approach. It first constructs a sentence graph from the source documents, then automatically identifies semantic clusters by mining low-level features from raw texts, thereby improving intra-cluster correlation and the fluency of generated sentences. Finally, it summarizes clusters into natural sentences. Experiments conducted on Multi-News, Multi-XScience and DUC-2004 demonstrate that our approach outperforms existing unsupervised approaches. Furthermore, it surpasses state-of-the-art pre-trained multi-document summarization models (e.g. PEGASUS and PRIMERA) under zero-shot settings in terms of ROUGE scores. Additionally, human evaluations indicate that summaries generated by GLIMMER achieve high readability and informativeness scores. Our code is available at https://github.com/Oswald1997/GLIMMER.
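The first two stages described in the abstract (building a sentence graph, then grouping sentences into clusters) can be illustrated with a minimal sketch. This is not the authors' implementation, which is available at the repository linked above. It assumes Jaccard word overlap as the lexical feature, a fixed edge threshold of 0.3, and connected components as the clustering step; the function names and example sentences are hypothetical. The final stage, which summarizes each cluster into a natural sentence, is omitted.

```python
import re
from itertools import combinations


def tokenize(sentence):
    """Lowercase a sentence and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def jaccard(a, b):
    """Jaccard overlap between two token sets (0.0 if either is empty)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def build_sentence_graph(sentences, threshold=0.3):
    """Link every pair of sentences whose lexical overlap meets the threshold.

    The threshold value is an illustrative assumption, not taken from the paper.
    """
    tokens = [tokenize(s) for s in sentences]
    graph = {i: set() for i in range(len(sentences))}
    for i, j in combinations(range(len(sentences)), 2):
        if jaccard(tokens[i], tokens[j]) >= threshold:
            graph[i].add(j)
            graph[j].add(i)
    return graph


def connected_components(graph):
    """Group sentence indices into clusters via depth-first traversal."""
    seen = set()
    clusters = []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        component = []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        clusters.append(sorted(component))
    return clusters


# Sentences drawn from two hypothetical source documents about the same event.
sentences = [
    "The storm hit the coast on Monday.",
    "On Monday the storm hit the coast hard.",
    "Officials opened three shelters for residents.",
    "Three shelters were opened for residents by officials.",
]
graph = build_sentence_graph(sentences)
clusters = connected_components(graph)
print(clusters)
```

In this toy input, sentences that restate the same fact share most of their words and end up in the same cluster, while unrelated sentences share none and stay apart. A real system would need a more robust similarity signal and a generation step to turn each cluster into a single fluent sentence.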
PAIS 2024 : Proceedings of the 27th European Conference on Artificial Intelligence - Including 13th Conference on Prestigious Applications of Intelligent Systems
Event
Artificial Intelligence. Conference (2024 : 27th : Santiago de Compostela, Spain)
Publisher
IOS Press
Place of publication
Amsterdam, The Netherlands
Series
Frontiers in Artificial Intelligence and Applications