Deakin University

Bayesian Networks for Data Integration in the Absence of Foreign Keys

Version 2 2024-06-05, 06:20
Version 1 2019-11-02, 11:15
journal contribution
posted on 2024-06-05, 06:20 authored by B Zhang, S Sanner, Mohamed Reda Bouadjenek, S Gupta
In the era of open data, a single data source rarely contains all of the attributes needed for inference in a specific application. For example, a marketing department may aim to integrate retailer-specific purchase data with separate demographic data for targeted advertising, a capability not possible with either dataset alone. In this work, we address two key desiderata of an automated framework for probabilistic data integration over multiple data sources: (1) we require that each relational data source share at least one attribute with another relational data source, but we do not require these attributes to be foreign keys; and (2) we require inference to be probabilistic, to reflect the inherent uncertainty in population-level predictions given the absence of foreign keys. While some frameworks, such as Probabilistic Relational Models (PRMs), address point (2), they do not address point (1) because they rely on foreign keys to link tables. To achieve both desiderata simultaneously, we develop an automated framework to construct Bayesian networks for data integration that are capable of answering any probabilistic query. We demonstrate that our framework is able to approximate the inference of a global Bayesian network over a single relation that has been projected onto multiple relations.
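
As a rough illustration of the idea in the abstract (not the authors' algorithm), the sketch below links two tables that share only a non-key attribute and answers a cross-table probabilistic query. It assumes the two non-shared attributes are conditionally independent given the shared one, so that P(product | age) = sum over region of P(product | region) * P(region | age). The tables, the attribute names (`region`, `product`, `age`), and the helper functions are all hypothetical.

```python
from collections import Counter, defaultdict


def conditional(rows, given, target):
    """Estimate P(target | given) from a list of dict rows by counting."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[given]][row[target]] += 1
    return {
        g: {t: n / sum(c.values()) for t, n in c.items()}
        for g, c in counts.items()
    }


# Hypothetical toy data: the two sources share "region", which is not a key,
# so individual rows cannot be joined one-to-one.
purchases = [
    {"region": "north", "product": "tea"},
    {"region": "north", "product": "tea"},
    {"region": "north", "product": "coffee"},
    {"region": "south", "product": "coffee"},
]
demographics = [
    {"region": "north", "age": "young"},
    {"region": "south", "age": "young"},
    {"region": "south", "age": "old"},
    {"region": "south", "age": "old"},
]


def product_given_age(age):
    """Population-level P(product | age), marginalising over the shared attribute.

    Assumption (ours, for illustration): product and age are conditionally
    independent given region. Every region seen in `demographics` must also
    appear in `purchases`, which holds for the toy data above.
    """
    p_region_given_age = conditional(demographics, "age", "region")[age]
    p_product_given_region = conditional(purchases, "region", "product")
    result = Counter()
    for region, p_region in p_region_given_age.items():
        for product, p_product in p_product_given_region[region].items():
            result[product] += p_region * p_product
    return dict(result)


young = product_given_age("young")
old = product_given_age("old")
```

With this toy data, `young` assigns probability 1/3 to tea and 2/3 to coffee, while `old` puts all mass on coffee. The answer is a distribution over a population rather than a statement about any individual customer, which matches the abstract's point that inference without foreign keys is inherently uncertain.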

History

Journal

IEEE Transactions on Knowledge and Data Engineering

Volume

32

Pagination

803-808

Location

Piscataway, N.J.

ISSN

1041-4347

eISSN

1558-2191

Language

English

Publication classification

C1.1 Refereed article in a scholarly journal

Issue

4

Publisher

IEEE COMPUTER SOC