Deakin University
File(s) under permanent embargo

Prediction of population health indices from social media using kernel-based textual and temporal features

Version 2 2024-06-06, 02:45
Version 1 2019-04-15, 16:22
conference contribution
posted on 2024-06-06, 02:45 authored by Thin Nguyen, Duc Thanh Nguyen, ME Larsen, B O'Dea, John Yearwood, QD Phung, Svetha Venkatesh, H Christensen
Since 1984, the US has annually conducted the Behavioral Risk Factor Surveillance System (BRFSS) surveys to capture both health behaviors, such as drinking or smoking, and health outcomes, including mental, physical, and generic health, of the population. Although such population-level information, for example at the level of US counties, is important for local governments to identify local needs, traditional datasets may take years to collate and to become publicly available. Geocoded social media data can provide an alternative reflection of local health trends. In this work, novel textual and temporal features are proposed to predict the percentage of adults in a county reporting “insufficient sleep”, a health behavior, and, at the same time, their health outcomes. The proposed textual features are defined at mid-level and can be applied on top of various low-level textual features. They are computed via kernel functions on the underlying features and encode the relationships between individual underlying features over a population. To further improve prediction of the health indices, the textual features are augmented with temporal information. We evaluated the proposed features and compared them with existing features using a dataset collected from the BRFSS. Experimental results show that the combination of kernel-based textual features and temporal information predicts both the health behavior (with best performance at rho=0.82) and the health outcomes (with best performance at rho=0.78) well, demonstrating the capability of social media data in predicting population health indices. The results also show that the proposed features achieved higher correlation coefficients than the existing ones, by up to 0.16, suggesting the potential of the approach in a wide spectrum of population-level data analytics applications.
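The abstract does not state which kernel is used or exactly how the mid-level features are built. As a rough illustration only, the sketch below shows one possible reading of "kernel functions on underlying features": for one region, each low-level feature is treated as a vector over that region's population, and a kernel value is computed for every pair of features. The Gaussian (RBF) kernel, the `gamma` parameter, the function names, and the toy data are all assumptions made for this example; they are not taken from the paper.

```python
import numpy as np


def rbf_kernel(u, v, gamma=1.0):
    """Gaussian (RBF) kernel between two equal-length vectors (assumed kernel choice)."""
    diff = u - v
    return float(np.exp(-gamma * np.dot(diff, diff)))


def kernel_mid_level_features(X, gamma=1.0):
    """Hypothetical mid-level feature map for one region.

    X is an (n_people, d) matrix of low-level textual features, one row per
    person in the region. Each column is one low-level feature observed over
    the population. The result holds one kernel value per unordered pair of
    columns, so it has d * (d - 1) / 2 entries.
    """
    X = np.asarray(X, dtype=float)
    d = X.shape[1]
    rows, cols = np.triu_indices(d, k=1)  # index pairs (i, j) with i < j
    return np.array(
        [rbf_kernel(X[:, i], X[:, j], gamma) for i, j in zip(rows, cols)]
    )


# Toy region with 2 people and 3 low-level features (made-up numbers).
toy_region = np.array([
    [0.2, 0.2, 0.6],
    [0.5, 0.5, 0.0],
])
toy_features = kernel_mid_level_features(toy_region)
```

In this toy input the first two feature columns are identical, so the kernel value for that pair is exactly 1.0, while the pairs involving the third column come out lower. A vector like `toy_features`, computed per county, could then be passed to any regressor; how the paper combines it with temporal information is not described in the abstract and is not sketched here.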

History

Pagination

99-107

Location

Perth, W.A.

Start date

2017-04-03

End date

2017-04-07

ISBN-13

9781450349147

Language

eng

Publication classification

E1 Full written paper - refereed

Copyright notice

2017, International World Wide Web Conference Committee (IW3C2)

Editor/Contributor(s)

[Unknown]

Title of proceedings

WWW 2017 : Proceedings of the 26th International World Wide Web Conference 2017

Event

International World Wide Web Conferences Committee. Conference (26th : 2017 : Perth, W.A.)

Publisher

International World Wide Web Conferences Steering Committee

Place of publication

Republic and Canton of Geneva, Switzerland

Series

International World Wide Web Conferences Committee Conference
