Deakin University

Integration of large-scale community-developed causal loop diagrams: a Natural Language Processing approach to merging factors based on semantic similarity

Version 2 2025-03-20, 05:02
Version 1 2025-03-13, 00:40
journal contribution
posted on 2025-03-20, 05:02 authored by Melissa Valdivia Cabrera, Michael Johnstone, Josh Hayward, Kristy Bolton, Douglas Creighton
Abstract

Background

Complex public health problems have been addressed in communities through systems thinking and participatory methods such as Group Model Building (GMB) and Causal Loop Diagrams (CLDs), albeit with some challenges. This study aimed to explore the feasibility of using Natural Language Processing (NLP) with different semantic textual similarity models to simplify and enhance the CLD merging process and avoid manual merging of factors.

Methods

The factors of thirteen CLDs from different communities in Victoria, Australia, regarding the health and wellbeing of children and young people, were merged using NLP with the following process: (1) extracting and preprocessing unique factor names; (2) assessing factor similarity using various language models; (3) determining the optimal merging threshold by maximising the F1-score; (4) merging the factors of the 13 CLDs based on the selected threshold.

Results

Overall, sentence-transformer models performed better than word2vec, average word embeddings and Jaccard similarity. Of 161,182 comparisons, the 1,123 that received a score above 0.7 from sentence-transformer models were analysed by the subject matter experts. Paraphrase-multilingual-mpnet-base-v2 had the highest F1-score (0.68) and was used to merge the factors with a threshold of 0.75. From 592 factors, 344 were merged into 66 groups.

Conclusions

Utilizing language models facilitates the identification of similar factors and has the potential to aid researchers in constructing CLDs whilst reducing the time required to merge them manually. While the models accurately merge synonymous or closely related factors, manual intervention may be required for specific cases.
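The four-step process described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' implementation: to stay dependency-free it scores pairs with Jaccard similarity (one of the baselines named in the abstract) rather than a sentence-transformer model, the factor names and expert labels are invented for illustration, every pair is labelled (the study reviewed only comparisons scoring above 0.7), and grouping by connected components is an assumption about how factors above the threshold are merged.

```python
from itertools import combinations

# Hypothetical factor names, standing in for factors extracted from several CLDs.
factors = [
    "physical activity", "physical activity levels", "screen time",
    "time on screens", "sleep time", "healthy food access",
    "access to healthy food",
]

def jaccard(a: str, b: str) -> float:
    """Jaccard similarity between the lowercase token sets of two names."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if (ta or tb) else 1.0

# Step 2: score every unordered pair of factors.
scores = {pair: jaccard(*pair) for pair in combinations(factors, 2)}

# Hypothetical expert judgement: which pairs really describe the same factor.
expert_pairs = {
    ("physical activity", "physical activity levels"),
    ("screen time", "time on screens"),
    ("healthy food access", "access to healthy food"),
}
labels = {pair: pair in expert_pairs for pair in scores}

def f1_at(t: float) -> float:
    """F1-score when pairs scoring >= t are predicted as 'merge'."""
    tp = sum(1 for p, s in scores.items() if s >= t and labels[p])
    fp = sum(1 for p, s in scores.items() if s >= t and not labels[p])
    fn = sum(1 for p, s in scores.items() if s < t and labels[p])
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Step 3: pick the candidate threshold with the highest F1-score
# (max keeps the first candidate when several tie).
candidates = [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
threshold = max(candidates, key=f1_at)

def merge_groups(names, pair_scores, t):
    """Step 4: union-find grouping of factors linked by a score >= t."""
    parent = {n: n for n in names}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), s in pair_scores.items():
        if s >= t:
            parent[find(a)] = find(b)
    grouped = {}
    for n in names:
        grouped.setdefault(find(n), []).append(n)
    # Only groups with more than one member represent an actual merge.
    return [g for g in grouped.values() if len(g) > 1]

groups = merge_groups(factors, scores, threshold)
print(threshold, round(f1_at(threshold), 2), groups)
```

On this toy data the token-overlap score misses the pair "screen time" / "time on screens", which is the kind of paraphrase an embedding-based similarity model is intended to capture.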

History

Journal

BMC Public Health

Volume

25

Article number

923

Pagination

1-9

Location

London, Eng.

Open access

  • Yes

ISSN

1471-2458

eISSN

1471-2458

Language

eng

Publication classification

C1.1 Refereed article in a scholarly journal

Issue

1

Publisher

BioMed Central
