Deakin University
File(s) under permanent embargo

An enhanced extractive text summarization method for multiple documents

journal contribution
posted on 2023-05-03, 22:58 authored by AM Nitu, MD PALASH UDDIN, PB Tumpa, S Yeasmin, MI Afjal
Text summarization has become important for extracting required information in a short time. Several extractive summarization techniques have been developed for English text, but few exist for Bengali text. This paper proposes an improved extractive Bengali text summarization technique that enhances the word scoring process, the position value heuristic, and the summary generation procedure of our previously presented summarizer. In the word scoring procedure, the text is preprocessed using noise removal, tokenization, stop word removal, and stemming. A heuristic is then applied to calculate each word's score by checking the word across all input document(s). A modified heuristic is also proposed for sentence scoring: it gives the highest priority to the middle sentence and progressively lower priority to the sentences above and below it. Finally, the top k sentences are extracted from each cluster of sentences produced by the K-means clustering algorithm, and the extracted sentences are sorted by their order of appearance in the original document(s), so the final summary follows the order of the original document(s). The experimental results show that the proposed technique produces better summaries for end users than the existing method.
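
The pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the scoring formulas or the clustering features, so the following are assumptions made only for illustration: an English stop-word list and a toy suffix-stripping stemmer (the paper targets Bengali), word score as frequency across all input documents, a linearly decaying position weight centred on the middle sentence, and K-means run on the scalar sentence scores. All function names (`preprocess`, `position_weight`, `kmeans_1d`, `summarize`) are hypothetical.

```python
import re
from collections import Counter

# Hypothetical English stop-word list; the paper itself targets Bengali text.
STOP_WORDS = {"the", "a", "an", "is", "are", "of", "in", "and", "to", "it", "on"}


def preprocess(sentence):
    """Noise removal, tokenization, stop-word removal, and a toy stemmer."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    stems = []
    for tok in tokens:
        if tok in STOP_WORDS:
            continue
        # Toy stemmer (assumption): strip a trailing "s" from longer words.
        stems.append(tok[:-1] if len(tok) > 3 and tok.endswith("s") else tok)
    return stems


def position_weight(index, total):
    """Assumed heuristic: weight 1.0 at the middle sentence, decaying
    linearly toward the first and last sentences."""
    if total == 1:
        return 1.0
    mid = (total - 1) / 2
    return 1.0 - abs(index - mid) / (mid + 1)


def kmeans_1d(values, k, iterations=20):
    """Lloyd's algorithm on scalar values with deterministic initial centroids."""
    ordered = sorted(values)
    # Spread the initial centroids across the sorted values.
    centroids = [ordered[(i * (len(ordered) - 1)) // max(k - 1, 1)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return labels


def summarize(documents, n_clusters=2, top_k=1):
    """Pick the top_k sentences per cluster, returned in original order."""
    sentences = []  # (sentence_index, sentences_in_doc, text, stems)
    for doc in documents:
        parts = [s.strip() for s in re.split(r"[.!?]+", doc) if s.strip()]
        for i, text in enumerate(parts):
            sentences.append((i, len(parts), text, preprocess(text)))
    # Word score (assumption): frequency of the stem across all documents.
    word_score = Counter(stem for s in sentences for stem in s[3])
    scores = [
        sum(word_score[w] for w in stems) * position_weight(i, total)
        for (i, total, _, stems) in sentences
    ]
    labels = kmeans_1d(scores, min(n_clusters, len(sentences)))
    chosen = set()
    for c in set(labels):
        members = [idx for idx, lab in enumerate(labels) if lab == c]
        members.sort(key=lambda idx: scores[idx], reverse=True)
        chosen.update(members[:top_k])
    # Restore source order so the summary follows the original document(s).
    return [sentences[idx][2] for idx in sorted(chosen)]
```

Calling `summarize(["First sentence. Second sentence. Third sentence."], n_clusters=2, top_k=1)` returns at most two sentences, listed in the order they appear in the input. The final sort by index is what keeps the summary aligned with the source text, which is the property the abstract emphasizes.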

History

Journal

Journal of Theoretical and Applied Information Technology

Volume

97

Pagination

3475-3485

Location

Islamabad, Pakistan

ISSN

1992-8645

eISSN

1817-3195

Language

eng

Publication classification

C1.1 Refereed article in a scholarly journal

Issue

23

Publisher

Little Lion Scientific
