SPDF: set probabilistic distance features for prediction of population health outcomes via social media

Nguyen, H; Nguyen, Duc Thanh; Nguyen, Thin

File(s) under permanent embargo

conference contribution

posted on 2019-01-01, 00:00 authored by H Nguyen, Duc Thanh Nguyen, Thin Nguyen

Measurement of population health outcomes is critical to understanding the health status of communities and thus to developing appropriate health-care programmes for them. This task requires the prediction of population health status to be fast and accurate, yet scalable to different population sizes. To satisfy these requirements, this paper proposes a method for automatic prediction of population health outcomes from social media using Set Probabilistic Distance Features (SPDF). The proposed SPDF are mid-level features built upon the similarity in posting patterns between populations. SPDF hold several advantages. Firstly, they can be applied to various low-level features. Secondly, they are well suited to problems with weakly labelled data, i.e., where only the labels of sets are available while the labels of the sets' elements are not explicitly provided. We thoroughly evaluate our approach on the task of predicting health indices of counties in the US, using a large-scale dataset collected from Twitter. We apply SPDF to two different textual features: latent topics and linguistic styles. We conduct two case studies: across-year and across-county prediction. The performance of the approach is validated against the Behavioral Risk Factor Surveillance System surveys. Experimental results show that the proposed approach achieves state-of-the-art performance on linguistic style features in the prediction of all health indices and in both case studies.
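The idea of describing a population by its distances to other populations can be illustrated with a small sketch. This is not the paper's implementation: the abstract does not say which probabilistic distance or density model SPDF use, so the code below assumes, purely for illustration, that each set of posts is summarised by a diagonal Gaussian over its low-level feature vectors and that sets are compared with a symmetric Kullback-Leibler divergence. The function names (fit_diag_gaussian, symmetric_kl, set_distance_features) and the toy data are hypothetical.

```python
import numpy as np

def fit_diag_gaussian(X, eps=1e-6):
    """Summarise a set of feature vectors by per-dimension mean and variance.

    Assumption: a diagonal Gaussian stands in for the set's distribution.
    """
    X = np.asarray(X, dtype=float)
    return X.mean(axis=0), X.var(axis=0) + eps  # eps avoids zero variance

def symmetric_kl(p, q):
    """Symmetric KL divergence between two diagonal Gaussians (mean, var)."""
    (m1, v1), (m2, v2) = p, q
    d = (m1 - m2) ** 2
    kl_pq = 0.5 * np.sum(np.log(v2 / v1) + (v1 + d) / v2 - 1.0)
    kl_qp = 0.5 * np.sum(np.log(v1 / v2) + (v2 + d) / v1 - 1.0)
    return kl_pq + kl_qp

def set_distance_features(sets, reference_sets):
    """Map each set to a vector of its distances to the reference sets.

    Only set-level labels are needed downstream; the individual
    elements (posts) are never labelled.
    """
    refs = [fit_diag_gaussian(r) for r in reference_sets]
    feats = []
    for s in sets:
        g = fit_diag_gaussian(s)
        feats.append([symmetric_kl(g, r) for r in refs])
    return np.array(feats)

# Toy data: three hypothetical "counties", each a set of 4-dimensional
# post-level feature vectors (e.g. topic proportions), of different sizes.
rng = np.random.default_rng(0)
counties = [rng.normal(loc=mu, scale=1.0, size=(n, 4))
            for mu, n in [(0.0, 200), (0.5, 150), (2.0, 300)]]

# Each county becomes one row of distances to the reference counties.
features = set_distance_features(counties, counties)
```

Under these assumptions, each row of features is a fixed-length representation of one population regardless of how many posts it contains, and it could be passed to any standard regressor together with the set-level health index.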

History

Event

Data Mining. Conference (17th : 2019 : Adelaide, S. Aust.)

Volume

1127

Series

Data Mining Conference

Pagination

54 - 63

Publisher

Springer

Location

Adelaide, S. Aust.

Place of publication

Singapore

Publisher DOI

https://doi.org/10.1007/978-981-15-1699-3_5

Start date

2019-12-02

End date

2019-12-05

ISSN

1865-0929

eISSN

1865-0937

ISBN-13

9789811516986

Language

eng

Publication classification

E1 Full written paper - refereed

Editor/Contributor(s)

T Le, K Ong, Y Zhao, W Jin, S Wong, L Liu, G Williams

Title of proceedings

AusDM 2019 : Proceedings of the 17th Australasian Conference on Data Mining 2019

Keywords

Population health; Social media
