Deakin University

Probabilistic models over ordered partitions with applications in document ranking and collaborative filtering

Version 2 2024-06-03, 17:51
Version 1 2014-10-28, 09:38
conference contribution
posted on 2024-06-03, 17:51 authored by T Truyen, D Phung, Svetha Venkatesh
Ranking is an important task for handling a large amount of content. Ideally, training data for supervised ranking would include a complete ranking of documents (or other objects such as images or videos) for a particular query. However, this is only possible for small sets of documents. In practice, one often resorts to document rating, in which each document in a subset is assigned a small number indicating its degree of relevance. This poses the general problem of modelling and learning rank data with ties. In this paper, we propose a probabilistic generative model that treats the process as permutations over partitions. This results in a super-exponential combinatorial state space with an unknown number of partitions and an unknown ordering among them. We approach the problem from discrete choice theory, where subsets are chosen in a stagewise manner, significantly reducing the state space at each stage. Further, we show that with a suitable parameterisation, we can still learn the models in linear time. We evaluate the proposed models on two application areas: (i) document ranking with data from the recently held Yahoo! challenge, and (ii) collaborative filtering with movie data. The results demonstrate that the models are competitive against well-known rivals.
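
To give a sense of the state space described in the abstract, the following is a minimal Python sketch, not the authors' model or code. It assumes ratings are small integers with higher values meaning more relevant. The function names `count_ordered_partitions` and `ratings_to_ordered_partition` are hypothetical, introduced here for illustration only. The first counts the rankings with ties over n items (the ordered set partitions, also known as the Fubini numbers) by choosing the top block first and then recursing on the remainder, which loosely mirrors the stagewise view. The second turns a set of ratings into an ordered partition.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_ordered_partitions(n: int) -> int:
    """Count the rankings of n items when ties are allowed (Fubini numbers)."""
    if n == 0:
        return 1
    # Pick the k items that form the top block, then rank the remaining n - k.
    return sum(comb(n, k) * count_ordered_partitions(n - k) for k in range(1, n + 1))


def ratings_to_ordered_partition(ratings: dict[str, int]) -> list[set[str]]:
    """Group items by rating, highest first; items in the same set are tied.

    Assumption (illustrative only): a higher rating means more relevant.
    """
    levels = sorted(set(ratings.values()), reverse=True)
    return [{item for item, r in ratings.items() if r == level} for level in levels]


if __name__ == "__main__":
    # Hypothetical ratings for four documents under one query.
    example = {"d1": 2, "d2": 0, "d3": 2, "d4": 1}
    print(ratings_to_ordered_partition(example))
    for n in (3, 5, 10):
        print(n, count_ordered_partitions(n))
```

The count grows faster than n!, since every strict ranking is also a ranking with ties, which is why enumerating the full space is impractical even for modest document sets.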

History

Pagination

426-437

Location

Mesa, Ariz.

Start date

2011-04-28

End date

2011-04-30

ISBN-13

9780898719925

Language

eng

Publication classification

E1.1 Full written paper - refereed

Copyright notice

2011, SIAM

Title of proceedings

SDM 2011 : Proceedings of the 11th SIAM International Conference on Data Mining

Event

International Conference on Data Mining (11th : 2011 : Mesa, Ariz.)

Publisher

Society for Industrial and Applied Mathematics

Place of publication

Philadelphia, Pa.
