Alcohol consumption imposes a high cost on society. The path from regular drinker to successful quitter can be long and difficult, and it is fraught with the risk of relapse. Research has shown that certain behavioral changes can be effective in maintaining abstinence. The traditional way to study abstainers is to collect data with questionnaires from a curated group of participants. However, this approach is expensive in both money and time, and it often yields small datasets with limited diversity. Recently, social media has emerged as a rich data source. Reddit is one such platform: it hosts a community (a 'subreddit') of people interested in quitting drinking. The group's discussions date back to 2011 and comprise more than 40,000 posts. This large-scale data is generated by the users themselves and is not constrained by any survey questionnaire. Using Lasso, we identify the features (unigrams, topics, and LIWC categories) most predictive of short-term and long-term abstinence. Many common patterns appear across unigrams, topics, and LIWC. Whilst topics provide much richer associations between a group of words and the outcome, unigrams and LIWC are good at finding highly predictive individual words and psycholinguistically important words. Combining them, we find many interesting patterns associated with the successful attempts of long-term abstainers, as well as many of the common issues faced during the initial period of abstinence.
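To make the Lasso step concrete, here is a minimal sketch of L1-penalized logistic regression over binary unigram features, fitted with proximal gradient descent (ISTA). It is not the authors' pipeline. The toy posts, the labels (1 = long-term abstainer, 0 = short-term), the binary feature weighting, the solver, the step size and the penalty `lam` are all illustrative assumptions, and the topic and LIWC features are omitted. The sketch shows how the L1 penalty drives most coefficients to exactly zero, which leaves a small set of predictive words.

```python
import math


def soft_threshold(z: float, t: float) -> float:
    """Proximal operator of t*|z|: shrink z toward zero by t, clipping at zero."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def featurize(doc, vocab):
    """Binary unigram presence (assumption: the paper's exact weighting may differ)."""
    words = set(doc.lower().split())
    return [1.0 if v in words else 0.0 for v in vocab]


def lasso_logistic(X, y, lam, lr=0.5, iters=2000):
    """Minimize mean logistic loss + lam * ||w||_1 with ISTA; intercept b is unpenalized."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(iters):
        # Residuals (predicted probability minus label) at the current point.
        errs = [sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
                for xi, yi in zip(X, y)]
        grad_b = sum(errs) / n
        grad_w = [sum(e * xi[j] for e, xi in zip(errs, X)) / n for j in range(p)]
        # Plain gradient step on b; gradient step followed by soft-thresholding on w.
        b -= lr * grad_b
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
    return w, b


def select_features(docs, labels, lam):
    """Return the unigrams whose Lasso coefficient is non-zero (hypothetical helper)."""
    vocab = sorted({tok for d in docs for tok in d.lower().split()})
    X = [featurize(d, vocab) for d in docs]
    w, _ = lasso_logistic(X, labels, lam)
    return {v: round(c, 3) for v, c in zip(vocab, w) if c != 0.0}


# Hypothetical toy posts; 1 = long-term abstainer, 0 = short-term (illustrative only).
DOCS = [
    "one year sober feeling great",
    "grateful for my sober life",
    "sober and grateful today",
    "day two cravings are bad",
    "relapsed again last night cravings",
    "day three headache and cravings",
]
LABELS = [1, 1, 1, 0, 0, 0]

if __name__ == "__main__":
    print(select_features(DOCS, LABELS, lam=0.2))
    print(select_features(DOCS, LABELS, lam=0.3))
```

On this toy data, `lam=0.2` keeps only `sober` (positive coefficient) and `cravings` (negative coefficient). Words such as `grateful` and `day` also lean toward one class, but their gradients stay below the penalty, so they remain at zero. Raising `lam` above the largest absolute gradient at zero (0.25 here) zeroes every coefficient. Sweeping `lam` therefore trades off how many features are reported against how strongly each one must be supported by the data.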
Advanced Data Mining and Applications. International Conference (12th : 2016 : Gold Coast, Queensland)