Deakin University
File(s) under permanent embargo

Kernel-based features for predicting population health indices from geocoded social media data

journal contribution
posted on 2017-10-01, 00:00 authored by Thin Nguyen, M E Larsen, B O'Dea, Duc Thanh Nguyen, John Yearwood, Quoc-Dinh Phung, Svetha Venkatesh, H Christensen
When tweets are used to predict a population health index, the scale of the data has made it common practice to aggregate tweets by population before learning features that characterize that population. Aggregation avoids the computational cost of extracting features from each individual tweet. However, it can discard much information about the population, because the distribution of textual features within a population could be important for identifying its health index. In addition, there could be relationships between features, and those relationships could also convey predictive information about the health index. In this paper, we propose mid-level features, namely kernel-based features, for predicting the health indices of populations from social media data. The kernel-based features are extracted from the distributions of textual features over a population's tweets and encode the relationships between individual textual features in a kernel function. We implemented our features using three different kernel functions and applied them to two case studies of population health prediction: across-year prediction and across-county prediction. The kernel-based features were evaluated and compared with existing features on a dataset collected from the Behavioral Risk Factor Surveillance System. Experimental results show that the kernel-based features achieved significantly higher prediction performance than existing techniques, by up to 16.3%, suggesting the potential and applicability of the proposed features in a wide spectrum of population-level data analytics applications.
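
The contrast the abstract draws between aggregated features and features computed on the distribution of per-tweet features can be illustrated with a small sketch. This is not the method from the paper: the abstract does not name the three kernel functions or the textual features used, so the RBF kernel, the `gamma` value, the toy two-dimensional tweet vectors, and the idea of comparing a population against "anchor" sets are all assumptions made here for illustration, as are the function names.

```python
import math

def rbf_kernel(x, y, gamma=1.0):
    """RBF kernel between two feature vectors (an assumed kernel choice)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def mean_vector(tweets):
    """Aggregated baseline: average the per-tweet feature vectors."""
    n = len(tweets)
    return [sum(col) / n for col in zip(*tweets)]

def mean_kernel(tweets_a, tweets_b, kernel=rbf_kernel):
    """Average kernel value over all pairs of tweets from two sets."""
    total = sum(kernel(x, y) for x in tweets_a for y in tweets_b)
    return total / (len(tweets_a) * len(tweets_b))

def kernel_features(population, anchors, kernel=rbf_kernel):
    """One feature per (hypothetical) anchor set: mean kernel similarity to it."""
    return [mean_kernel(population, anchor, kernel) for anchor in anchors]

# Two toy populations with identical means but different distributions.
pop_a = [[0.0, 0.0], [2.0, 2.0]]   # spread out
pop_b = [[1.0, 1.0], [1.0, 1.0]]   # concentrated
anchors = [[[1.0, 1.0]]]           # a single hypothetical anchor set

agg_a, agg_b = mean_vector(pop_a), mean_vector(pop_b)
feat_a = kernel_features(pop_a, anchors)
feat_b = kernel_features(pop_b, anchors)

print(agg_a == agg_b)    # the aggregated features cannot tell the two apart
print(feat_a, feat_b)    # the kernel-based features differ
```

In this toy case both populations aggregate to the same mean vector, while the pairwise kernel averages differ, which is the kind of distributional information the abstract says aggregation may lose. The quadratic cost of the pairwise sum is the price of that extra information in this sketch.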

History

Journal

Decision Support Systems

Volume

102

Pagination

22 - 31

Publisher

Elsevier BV

Location

Amsterdam, The Netherlands

ISSN

0167-9236

Language

eng

Publication classification

C Journal article; C1 Refereed article in a scholarly journal

Copyright notice

2017, Elsevier B.V.