Deakin University

Stance classification: a comparative study and use case on Australian parliamentary debates

Version 3 2025-03-20, 05:01
Version 2 2025-03-12, 04:06
Version 1 2025-03-05, 02:42
journal contribution
posted on 2025-03-20, 05:01 authored by Stephanie Ng, James Zhang, Samson Yu, Asim Bhatti, Kathryn Backholer, CP Lim
Abstract

Hansard, the official verbatim transcript of parliamentary debates, contains rich information for analysing discourse and political activity on a wide range of policy issues. A fundamental task in political text analysis is to predict whether a speaker takes a positive or negative view of a debate topic. Unlike social media data, which has received extensive attention for political text mining, stance analysis on Hansard data remains understudied. Hansard speeches differ from social media posts mainly in that they are longer and depend on the context of the motion under debate. As a result, it is difficult to devise a text mining model for parliamentary debates based on existing studies of other applications. This raises the question of how well prominent methods generalise to cross-domain classification in low-resource data settings. To address this issue, we build and compare various state-of-the-art natural language processing techniques and machine learning models for stance classification, using two benchmark datasets from the UK Hansard. To improve model accuracy, a hybrid approach is designed that leverages both text and numerical features in the classification process. The devised method achieves a 15–20% improvement in accuracy over the baseline methods. Transfer learning of pre-trained language models is further investigated for political text representation and domain adaptation in a new stance classification task: Australian Hansard debates focusing on the public health issue of obesity and related junk food marketing policies. A feature augmentation technique is then employed to optimise the model learned from the source domain for prediction on unseen test data in the target domain. This approach yields an improvement in accuracy of approximately 10% over the baseline methods. Finally, an error analysis is conducted to gain further insight into the devised model, revealing the characteristics of commonly misclassified samples and suggesting directions for future work.

History

Journal

Journal of Computational Social Science

Volume

8

Article number

43

Location

Berlin, Germany

Open access

  • Yes

ISSN

2432-2717

eISSN

2432-2725

Language

eng

Publication classification

C1.1 Refereed article in a scholarly journal

Issue

2

Publisher

Springer
