Deakin University

File(s) not publicly available

CMAL: Cost-Effective Multi-Label Active Learning by Querying Subexamples

journal contribution
posted on 2022-09-29, 02:16 authored by G Yu, X Chen, C Domeniconi, J Wang, Z Li, Zili Zhang, X Zhang
Multi-label active learning (MAL) aims to learn an accurate multi-label classifier by selecting which examples (or example-label pairs) to annotate, thereby reducing query effort. MAL is more complicated and expensive than single-label active learning, because one example can be associated with a set of non-exclusive labels, and the annotator has to scrutinize the whole example and the whole label space to provide correct annotations. Instead of scrutinizing the whole example, we may examine only some of its subexamples with respect to a label. In this way, we can not only save annotation cost but also speed up the annotation process. Given this observation, we introduce CMAL, a two-stage cost-effective MAL strategy that queries subexamples. CMAL first selects the most informative example-label pairs by leveraging uncertainty, label correlation, and label space sparsity. Specifically, the uncertainty of a label for an example is reduced if correlated labels have already been annotated for that example, and it is further reduced if more examples have already been annotated with that label. Next, CMAL greedily queries the most probable positive subexample-label pairs of the selected example-label pair. In addition, we propose rCMAL, which accounts for the representativeness of examples to more reliably select example-label pairs in the first stage. Extensive experiments on multi-label datasets from diverse domains show that CMAL and rCMAL save more query cost than state-of-the-art MAL methods. The contributions of label correlation, label sparsity, and representativeness to saving cost are also confirmed.
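The two-stage procedure described above can be sketched as follows. This is an illustrative toy implementation under our own assumptions, not the authors' actual method: the scoring functions (uncertainty as 1 - |2p - 1|, a simple discount for annotated correlated labels and for already well-covered labels) and all probabilities are hypothetical stand-ins for a trained multi-label classifier's outputs.

```python
import numpy as np

rng = np.random.default_rng(0)
n_examples, n_labels, n_sub = 8, 4, 5

# Hypothetical model outputs: P(label | example) and P(label | subexample).
p_example = rng.uniform(size=(n_examples, n_labels))
p_sub = rng.uniform(size=(n_examples, n_sub, n_labels))

# Which example-label pairs are already annotated (True = known).
annotated = np.zeros((n_examples, n_labels), dtype=bool)

# Hypothetical label-correlation matrix (symmetric, 1 on the diagonal).
corr = rng.uniform(size=(n_labels, n_labels))
corr = (corr + corr.T) / 2
np.fill_diagonal(corr, 1.0)

def select_pair(p, annotated, corr):
    """Stage 1: pick the most informative unannotated example-label pair.

    Uncertainty 1 - |2p - 1| peaks at p = 0.5. It is discounted when
    correlated labels of the same example are already annotated, and when
    the label already has many annotations overall (sparsity discount).
    """
    uncertainty = 1.0 - np.abs(2.0 * p - 1.0)
    corr_known = annotated.astype(float) @ corr          # (n_examples, n_labels)
    label_count = annotated.sum(axis=0, keepdims=True)   # (1, n_labels)
    score = uncertainty / (1.0 + corr_known + 0.1 * label_count)
    score[annotated] = -np.inf                           # never re-query
    return np.unravel_index(np.argmax(score), score.shape)

def rank_subexamples(p_sub, i, l):
    """Stage 2: greedily order subexamples of example i for label l,
    most probable positive first."""
    return np.argsort(-p_sub[i, :, l])

i, l = select_pair(p_example, annotated, corr)
order = rank_subexamples(p_sub, i, l)
print(i, l, order)
```

In a real deployment, the annotator would inspect subexamples in the greedy order until a positive is found (or all are exhausted), the pair would be marked annotated, and stage 1 would be re-run with the updated annotation mask.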

History

Journal

IEEE Transactions on Knowledge and Data Engineering

Volume

34

Issue

5

Pagination

2091 - 2105

ISSN

1041-4347

eISSN

1558-2191

Publication classification

C1.1 Refereed article in a scholarly journal
