File(s) under permanent embargo
SPDF: set probabilistic distance features for prediction of population health outcomes via social media
conference contribution
posted on 2019-01-01, 00:00 authored by H Nguyen, Duc Thanh Nguyen, Thin Nguyen
Measurement of population health outcomes is critical to understanding the health status of communities and thus to developing appropriate health-care programmes for them. This task requires the prediction of population health status to be fast and accurate, yet scalable to different population sizes. To satisfy these requirements, this paper proposes a method for automatically predicting population health outcomes from social media using Set Probabilistic Distance Features (SPDF). The proposed SPDF are mid-level features built upon the similarity in posting patterns between populations. SPDF offer several advantages. First, they can be applied to various low-level features. Second, they are well suited to problems with weakly labelled data, i.e., where only the labels of sets are available and the labels of the sets' elements are not explicitly provided. We thoroughly evaluate our approach on the task of predicting health indices of US counties using a large-scale dataset collected from Twitter. We apply the proposed SPDF to two types of textual features: latent topics and linguistic styles. We conduct two case studies: across-year and across-county prediction. Predictions are validated against the Behavioral Risk Factor Surveillance System (BRFSS) surveys. Experimental results show that the proposed approach achieves state-of-the-art performance with linguistic style features in predicting all health indices in both case studies.
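The abstract does not give the exact definition of SPDF. As a minimal sketch of the general idea it describes (set-level features built from distances between populations' posting patterns, usable when only set labels exist), the Python below assumes each population is a set of low-level feature vectors (for example, per-user topic proportions), summarizes each set with a diagonal Gaussian, and uses the Bhattacharyya distance from that Gaussian to a fixed group of reference populations as the mid-level feature vector. The Gaussian model, the Bhattacharyya distance, the toy data, and all function names are illustrative assumptions, not the authors' formulation.

```python
import math
from typing import List, Sequence, Tuple

# Assumption: a population is a set of equal-length low-level feature vectors.
Population = Sequence[Sequence[float]]
Gaussian = Tuple[List[float], List[float]]  # (per-dimension means, variances)


def fit_diag_gaussian(samples: Population, eps: float = 1e-6) -> Gaussian:
    """Estimate per-dimension mean and variance of one population's set of vectors."""
    n = len(samples)
    dim = len(samples[0])
    means = [sum(s[d] for s in samples) / n for d in range(dim)]
    # eps keeps every variance strictly positive so the divisions and logs below are safe
    variances = [sum((s[d] - means[d]) ** 2 for s in samples) / n + eps for d in range(dim)]
    return means, variances


def bhattacharyya_diag(g1: Gaussian, g2: Gaussian) -> float:
    """Bhattacharyya distance between two Gaussians with diagonal covariance."""
    (m1, v1), (m2, v2) = g1, g2
    dist = 0.0
    for a, va, b, vb in zip(m1, v1, m2, v2):
        v = (va + vb) / 2.0  # averaged variance for this dimension
        dist += (a - b) ** 2 / (8.0 * v) + 0.5 * math.log(v / math.sqrt(va * vb))
    return dist


def set_distance_features(population: Population, references: List[Gaussian]) -> List[float]:
    """Hypothetical mid-level feature: distances from one population to each reference."""
    g = fit_diag_gaussian(population)
    return [bhattacharyya_diag(g, ref) for ref in references]


if __name__ == "__main__":
    # Hypothetical toy data: each county is a set of 2-D per-user feature vectors
    county_a = [[0.2, 0.8], [0.3, 0.7], [0.25, 0.75]]
    county_b = [[0.6, 0.4], [0.7, 0.3], [0.65, 0.35]]
    county_c = [[0.22, 0.78], [0.28, 0.72], [0.3, 0.7]]
    refs = [fit_diag_gaussian(county_a), fit_diag_gaussian(county_b)]
    for name, county in [("A", county_a), ("B", county_b), ("C", county_c)]:
        print(name, [round(x, 3) for x in set_distance_features(county, refs)])
```

Because each feature is computed from a whole set, a regressor trained on these vectors needs only county-level health labels, which matches the weak-labelling setting described in the abstract; the regressor itself is omitted from this sketch.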
Event: Data Mining. Conference (17th : 2019 : Adelaide, S. Aust.)
Volume: 1127
Series: Data Mining Conference
Pagination: 54 - 63
Publisher: Springer
Location: Adelaide, S. Aust.
Place of publication: Singapore
Start date: 2019-12-02
End date: 2019-12-05
ISSN: 1865-0929
eISSN: 1865-0937
ISBN-13: 9789811516986
Language: eng
Publication classification: E1 Full written paper - refereed
Editor/Contributor(s): T Le, K Ong, Y Zhao, W Jin, S Wong, L Liu, G Williams
Title of proceedings: AusDM 2019 : Proceedings of the 17th Australasian Conference on Data Mining 2019