Large-scale stylistic analysis of formality in academia and social media
Conference contribution posted on 2016-01-01, 00:00, authored by Thin Nguyen, Svetha Venkatesh, Quoc-Dinh Phung
The dictum ‘publish or perish’ has influenced the way scientists present research results in order to get published, including exaggeration and overstatement of research findings. This behavior gives rise to distinctive patterns of language use in academia. For example, it has recently been found that the proportion of positive words in the content of scientific articles has risen over the last 40 years, which probably reflects a tendency among scientists to exaggerate and overstate their research results. This practice may deviate from the impersonal and formal style of academic writing. In this study, the degree of formality in scientific articles is investigated through a corpus of 14 million PubMed abstracts. Three stylistic features are explored: expressing emotional information, using first-person pronouns to refer to the authors, and mixing English varieties. Trends in these stylistic features in scientific publications over the last four decades are identified. The emotional information is also compared with that of other online user-generated media, including online encyclopedias, web-logs, forums, and micro-blogs. Advances in cluster computing are employed to process the large-scale data, comprising 5.8 terabytes and 3.6 billion data points from all the media. The results suggest the potential of pattern recognition in data at scale.
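To make the three stylistic features named in the abstract concrete, the following is a minimal sketch of how they might be measured on a single abstract. It is not the authors' method: the word lists, the `tokenize` and `style_features` helpers, and the rule for detecting mixed English varieties are all hypothetical and chosen only for illustration. A real analysis would rely on full lexicons and, at the scale described, a cluster computing framework.

```python
import re
from collections import Counter

# Hypothetical toy lexicons, for illustration only (not the study's resources).
POSITIVE_WORDS = {"novel", "robust", "excellent", "promising", "remarkable"}
FIRST_PERSON = {"i", "we", "my", "our", "us", "me"}
# A few British/American spelling variants (illustrative, far from complete).
BRITISH = {"colour", "analyse", "tumour", "behaviour"}
AMERICAN = {"color", "analyze", "tumor", "behavior"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def style_features(text):
    """Return per-token proportions of positive words and first-person
    pronouns, plus a flag for mixed British/American spellings."""
    tokens = tokenize(text)
    n = len(tokens)
    if n == 0:
        return {"positive": 0.0, "first_person": 0.0, "mixed_varieties": False}
    counts = Counter(tokens)
    positive = sum(counts[w] for w in POSITIVE_WORDS) / n
    first_person = sum(counts[w] for w in FIRST_PERSON) / n
    uses_british = any(counts[w] for w in BRITISH)
    uses_american = any(counts[w] for w in AMERICAN)
    return {
        "positive": positive,
        "first_person": first_person,
        "mixed_varieties": uses_british and uses_american,
    }


# Usage: score one made-up sentence.
sample = "We report a novel tumour marker and analyze our results."
print(style_features(sample))
```

Averaging such per-abstract scores by publication year would give a trend line of the kind the abstract refers to, under the assumptions stated above.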