Deakin University
File(s) under permanent embargo

SPDF: set probabilistic distance features for prediction of population health outcomes via social media

conference contribution
posted on 2019-01-01, 00:00 authored by H Nguyen, Duc Thanh Nguyen, Thin Nguyen
Measuring population health outcomes is critical to understanding the health status of communities and thus to developing appropriate health-care programmes for them. This task requires the prediction of population health status to be fast and accurate, yet scalable to different population sizes. To satisfy these requirements, this paper proposes a method for automatic prediction of population health outcomes from social media using Set Probabilistic Distance Features (SPDF). The proposed SPDF are mid-level features built upon the similarity in posting patterns between populations. SPDF hold several advantages. Firstly, they can be applied to various low-level features. Secondly, they are well suited to problems with weakly labelled data, i.e., only the labels of sets are available while the labels of the sets’ elements are not explicitly provided. We thoroughly evaluate our approach on the task of predicting health indices of US counties using a large-scale dataset collected from Twitter. We apply SPDF to two different textual features: latent topics and linguistic styles. We conduct two case studies: across-year and across-county prediction. The performance of the approach is validated against the Behavioral Risk Factor Surveillance System surveys. Experimental results show that the proposed approach achieves state-of-the-art performance on linguistic style features in the prediction of all health indices and in both case studies.
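
The abstract does not say how the set-level distances are computed, so the following is only an illustrative sketch of the general idea of describing one population (a set of posts) by its probabilistic distances to other populations. It assumes each post has already been mapped to a discrete topic id, and it uses the Jensen-Shannon divergence as the distance; both choices, and all function names, are assumptions made for illustration and are not taken from the paper.

```python
import math
from collections import Counter


def topic_distribution(posts, num_topics, smoothing=1e-9):
    """Empirical distribution over topic ids for one set of posts.

    `posts` is a list of integer topic ids (an assumed low-level feature).
    A tiny smoothing term keeps every probability strictly positive.
    """
    counts = Counter(posts)
    total = len(posts) + smoothing * num_topics
    return [(counts.get(k, 0) + smoothing) / total for k in range(num_topics)]


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits (assumed distance, bounded by 1)."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def set_distance_features(target_posts, reference_sets, num_topics):
    """Represent one set by its distances to each reference set."""
    p = topic_distribution(target_posts, num_topics)
    return [
        js_divergence(p, topic_distribution(ref, num_topics))
        for ref in reference_sets
    ]


# Toy data: two reference populations and one target population.
references = [[0, 0, 1], [2, 2, 2]]
features = set_distance_features([0, 0, 1], references, num_topics=3)
# The first distance is 0 (identical posting pattern); the second is
# close to 1 (almost disjoint posting patterns).
print(features)
```

In this sketch only set-level information is needed: the resulting feature vector would be paired with the set's label (for example, a county's health index) and passed to any standard regressor, which matches the weakly labelled setting described above.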

Event

Data Mining. Conference (17th : 2019 : Adelaide, S. Aust.)

Volume

1127

Series

Data Mining Conference

Pagination

54 - 63

Publisher

Springer

Location

Adelaide, S. Aust.

Place of publication

Singapore

Start date

2019-12-02

End date

2019-12-05

ISSN

1865-0929

eISSN

1865-0937

ISBN-13

9789811516986

Language

eng

Publication classification

E1 Full written paper - refereed

Editor/Contributor(s)

T Le, K Ong, Y Zhao, W Jin, S Wong, L Liu, G Williams

Title of proceedings

AusDM 2019 : Proceedings of the 17th Australasian Conference on Data Mining 2019
