Papers
arxiv:2403.15553

Efficiently Estimating Mutual Information Between Attributes Across Tables

Published on Mar 22, 2024
Authors:
,
,

Abstract

A new sketching method for efficient mutual information estimation is proposed to identify relevant external tables for relational data augmentation, reducing computational costs and improving data analytics.

AI-generated summary

Relational data augmentation is a powerful technique for enhancing data analytics and improving machine learning models by incorporating columns from external datasets. However, it is challenging to efficiently discover relevant external tables to join with a given input table. Existing approaches rely on data discovery systems to identify joinable tables from external sources, typically based on overlap or containment. However, the sheer number of tables obtained from these systems results in irrelevant joins that need to be performed; this can be computationally expensive or even infeasible in practice. We address this limitation by proposing the use of efficient mutual information (MI) estimation for finding relevant joinable tables. We introduce a new sketching method that enables efficient evaluation of relationship discovery queries by estimating MI without materializing the joins and returning a smaller set of tables that are more likely to be relevant. We also demonstrate the effectiveness of our approach at approximating MI in extensive experiments using synthetic and real-world datasets.

Community

Sign up or log in to comment

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2403.15553 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2403.15553 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2403.15553 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.