Deakin University

Witan: Unsupervised Labelling Function Generation for Assisted Data Programming

Version 2 2024-06-03, 02:59
Version 1 2024-04-17, 05:03
conference contribution
posted on 2024-06-03, 02:59 authored by B Denham, EMK Lai, Roopak Sinha, MA Naeem
Effective supervised training of modern machine learning models often requires large labelled training datasets, which could be prohibitively costly to acquire for many practical applications. Research addressing this problem has sought ways to leverage weak supervision sources, such as the user-defined heuristic labelling functions used in the data programming paradigm, which are cheaper and easier to acquire. Automatic generation of these functions can make data programming even more efficient and effective. However, existing approaches rely on initial supervision in the form of small labelled datasets or interactive user feedback. In this paper, we propose Witan, an algorithm for generating labelling functions without any initial supervision. This flexibility affords many interaction modes, including unsupervised dataset exploration before the user even defines a set of classes. Experiments in binary and multi-class classification demonstrate the efficiency and classification accuracy of Witan compared to alternative labelling approaches.
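
To make the data programming idea in the abstract concrete, the following is a minimal sketch of hand-written heuristic labelling functions combined by a simple majority vote. It is not the Witan algorithm, which generates such functions automatically, and it does not reproduce any code from the paper. The spam/ham task, the function names, the keyword rules, and the majority-vote aggregation are all assumptions chosen for illustration; data programming systems typically replace the vote with a learned label model.

```python
from collections import Counter

# Hypothetical label values for an illustrative binary task.
ABSTAIN = -1
HAM = 0
SPAM = 1


def lf_contains_link(text):
    """Vote SPAM if the text contains a URL-like token, otherwise abstain."""
    return SPAM if "http" in text.lower() else ABSTAIN


def lf_mentions_prize(text):
    """Vote SPAM if the text mentions a prize, otherwise abstain."""
    return SPAM if "prize" in text.lower() else ABSTAIN


def lf_short_reply(text):
    """Vote HAM for very short messages (three words or fewer), otherwise abstain."""
    return HAM if len(text.split()) <= 3 else ABSTAIN


# Illustrative set of user-defined labelling functions.
LABELLING_FUNCTIONS = [lf_contains_link, lf_mentions_prize, lf_short_reply]


def weak_label(text):
    """Combine labelling function votes by majority; abstain on no votes or a tie."""
    votes = [lf(text) for lf in LABELLING_FUNCTIONS]
    votes = [v for v in votes if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN  # conflicting votes with equal support
    return ranked[0][0]


if __name__ == "__main__":
    for message in [
        "Claim your prize at http://example.com now",
        "ok thanks",
        "See you at the meeting tomorrow morning",
    ]:
        print(message, "->", weak_label(message))
```

Each function encodes one cheap heuristic and abstains when it does not apply, so no labelled examples are needed to produce the weak labels. Writing these rules by hand is the step that automatic labelling function generation aims to reduce.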

History

Volume

15

Pagination

2334-2347

Location

Sydney, N.S.W.

Start date

2022-09-05

End date

2022-09-09

ISSN

2150-8097

eISSN

2150-8097

Language

eng

Publication classification

E1.1 Full written paper - refereed

Title of proceedings

VLDB 2022 : Proceedings of the 48th International Conference on Very Large Data Bases

Event

VLDB Endowment. Conference (2022 : 48th : Sydney, N.S.W.)

Issue

11

Publisher

Association for Computing Machinery (ACM)

Place of publication

New York, N.Y.
