Leveraging label category relationships in multi-class crowdsourcing

conference contribution
posted on 2018-01-01, 00:00 authored by Yuan Jin, L Du, Ye Zhu, M Carman
Current quality control methods for crowdsourcing largely explain variation in worker responses to items through interactions between item difficulty and worker expertise. Few take into account the semantic relationships that can exist between the response label categories. When the number of label categories is large, these relationships are naturally indicative of how crowd-workers respond to items: expert workers tend to respond with categories that are more semantically related to the category of the true label, and difficult items tend to see a greater spread in the responded labels. Based on these observations, we propose a new statistical model that contains a latent real-valued matrix capturing the relatedness of response categories, alongside variables for worker expertise, item difficulty and item true labels. The model can be easily extended to incorporate prior knowledge about the semantic relationships between response labels in the form of a hierarchy over them. Experiments show that, compared with numerous state-of-the-art baselines, our model (both with and without the prior knowledge) yields superior true label prediction performance on four new crowdsourcing datasets featuring large sets of label categories.
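
The intuition in the abstract can be illustrated with a small toy sketch. It is not the model from the paper, whose exact form is not given in this record. It assumes, purely for illustration, that a worker's response distribution is a softmax over one row of a category relatedness matrix, sharpened by the ratio of worker expertise to item difficulty. The function names, the softmax form and the numbers in the matrix are all hypothetical.

```python
import numpy as np


def response_probs(relatedness, true_label, expertise, difficulty):
    """Toy distribution over response categories for one worker-item pair.

    Assumed form (NOT taken from the paper): a softmax over the relatedness
    row of the true label, scaled by expertise / difficulty.
    """
    if difficulty <= 0:
        raise ValueError("difficulty must be positive")
    scale = expertise / difficulty
    logits = scale * np.asarray(relatedness, dtype=float)[true_label]
    logits = logits - logits.max()  # subtract the max for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()


def entropy(p):
    """Shannon entropy in nats, used here as a simple measure of label spread."""
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())


# Hypothetical relatedness matrix for four categories:
# categories 0 and 1 are closely related, as are categories 2 and 3.
R = np.array([
    [2.0, 1.0, -1.0, -1.0],
    [1.0, 2.0, -1.0, -1.0],
    [-1.0, -1.0, 2.0, 1.0],
    [-1.0, -1.0, 1.0, 2.0],
])

# An expert worker on an easy item versus a novice worker on a hard item,
# both for an item whose true label is category 0.
expert_easy = response_probs(R, true_label=0, expertise=3.0, difficulty=1.0)
novice_hard = response_probs(R, true_label=0, expertise=0.5, difficulty=2.0)

print("expert, easy item:", np.round(expert_easy, 3))
print("novice, hard item:", np.round(novice_hard, 3))
print("spread (entropy):", entropy(expert_easy), entropy(novice_hard))
```

Under these assumptions, the expert's probability mass concentrates on the true category and, when the expert errs, on the closely related category, while the novice's responses on the hard item spread more evenly across all four categories.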

Event

Knowledge Discovery and Data Mining. Pacific-Asia Conference (22nd : 2018 : Melbourne, Victoria)

Volume

10938

Series

Lecture Notes in Computer Science

Pagination

128 - 140

Publisher

Springer

Location

Melbourne, Victoria

Place of publication

Cham, Switzerland

Start date

2018-06-03

End date

2018-06-06

ISSN

0302-9743

eISSN

1611-3349

ISBN-13

9783319930367

Language

eng

Publication classification

E Conference publication; E1 Full written paper - refereed

Editor/Contributor(s)

Dinh Phung, Vincent Tseng, Geoffrey Webb, Bao Ho, Mohadeseh Ganji, Lida Rashidi

Title of proceedings

PAKDD 2018 : Advances in Knowledge Discovery and Data Mining : Proceedings of 22nd Pacific-Asia Conference