Redundancy Detection with Chi-Squared Test

Causal Step

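Redundancy detection is an important task in data integration, where combining sources can leave the same information stored in more than one attribute. For nominal data, this step applies the chi-squared (χ²) test to determine whether two attributes are correlated. The test checks the hypothesis that the two attributes are independent. A χ² value above the critical value for the chosen significance level and degrees of freedom rejects that hypothesis, so the attributes are statistically correlated and one of them may be removed as redundant.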
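A minimal Python sketch of the test for one pair of attributes follows. It assumes the joint counts are already tabulated as a contingency table, with rows for the values of one attribute and columns for the values of the other. The function name `chi_squared`, the counts, and the hard-coded critical value are illustrative assumptions, not part of this step's definition.

```python
def chi_squared(observed):
    """Pearson chi-squared statistic for an r x c contingency table.

    observed[i][j] is the number of tuples having the i-th value of one
    attribute and the j-th value of the other. Returns (chi2, dof).
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    if n == 0 or 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column needs at least one tuple")
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            # Count expected in this cell if the attributes were independent
            e = row_totals[i] * col_totals[j] / n
            chi2 += (o - e) ** 2 / e
    dof = (len(observed) - 1) * (len(col_totals) - 1)
    return chi2, dof


# Hypothetical counts: rows = gender (male, female),
# columns = preferred reading (fiction, non-fiction).
table = [[250, 50],
         [200, 1000]]
chi2, dof = chi_squared(table)

# 10.828 is the chi-squared critical value for dof = 1 at alpha = 0.001.
CRITICAL_DOF1_ALPHA_0001 = 10.828
correlated = chi2 > CRITICAL_DOF1_ALPHA_0001
print(f"chi2 = {chi2:.2f}, dof = {dof}, correlated = {correlated}")
# prints: chi2 = 507.94, dof = 1, correlated = True
```

The critical value must match the table's degrees of freedom; a fixed constant is used here only to keep the sketch dependency-free.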

Input Parameters

  1. Nominal data

Output Parameters

  1. Redundant attributes
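To go from raw nominal data to a list of redundant attributes, every pair of attributes can be tested in turn. The sketch below is one hypothetical way to do this: `find_redundant_pairs`, the dictionary-of-columns input format, and the `critical` callback are assumptions for illustration. It reports correlated pairs and leaves the choice of which attribute in each pair to drop to the analyst.

```python
from collections import Counter
from itertools import combinations


def chi_squared_columns(a, b):
    """Chi-squared statistic and dof for two equal-length nominal columns."""
    n = len(a)
    if n == 0 or len(b) != n:
        raise ValueError("columns must be non-empty and of equal length")
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))  # observed count for each (value_a, value_b)
    chi2 = 0.0
    for va, ca in count_a.items():
        for vb, cb in count_b.items():
            expected = ca * cb / n  # count expected under independence
            chi2 += (joint[(va, vb)] - expected) ** 2 / expected
    dof = (len(count_a) - 1) * (len(count_b) - 1)
    return chi2, dof


def find_redundant_pairs(data, critical):
    """Return (attr1, attr2, chi2, dof) for each attribute pair whose
    chi-squared value exceeds critical(dof).

    data: dict mapping attribute name -> list of nominal values, one per tuple.
    critical: callable returning the critical value for a given dof.
    """
    flagged = []
    for x, y in combinations(data, 2):
        chi2, dof = chi_squared_columns(data[x], data[y])
        if dof > 0 and chi2 > critical(dof):
            flagged.append((x, y, chi2, dof))
    return flagged


# Hypothetical data: "region" is fully determined by "city",
# while "shift" is independent of both.
data = {
    "city": ["A"] * 40 + ["B"] * 60,
    "region": ["north"] * 40 + ["south"] * 60,
    "shift": ["day", "night"] * 50,
}
# Standard chi-squared critical values at alpha = 0.001, keyed by dof.
ALPHA_0001 = {1: 10.828, 2: 13.816, 3: 16.266, 4: 18.467}
flagged_pairs = find_redundant_pairs(data, lambda dof: ALPHA_0001[dof])
print(flagged_pairs)
# prints: [('city', 'region', 100.0, 1)]
```

With perfect association in a 2×2 table, χ² equals the number of tuples (here 100), so `city` and `region` are flagged; `shift` scores 0 against both. If SciPy is available, `critical` could instead be `lambda dof: scipy.stats.chi2.ppf(1 - alpha, dof)`, which avoids the fixed lookup table.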

Workflow

[Figure: workflow diagram (workflow47.svg)]

Algorithm

Pearson's chi-squared (χ²) test of independence
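For reference, the statistic in its usual form is shown below. Let A have c distinct values a_1, ..., a_c and B have r distinct values b_1, ..., b_r over n tuples. Then o_ij is the observed number of tuples with A = a_i and B = b_j, and e_ij is the number expected if A and B were independent:

```latex
\chi^2 = \sum_{i=1}^{c} \sum_{j=1}^{r} \frac{(o_{ij} - e_{ij})^2}{e_{ij}},
\qquad
e_{ij} = \frac{\operatorname{count}(A = a_i) \times \operatorname{count}(B = b_j)}{n},
\qquad
\mathrm{df} = (c - 1)(r - 1)
```

The independence hypothesis is rejected when χ² exceeds the critical value of the χ² distribution with df degrees of freedom at the chosen significance level.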

References

    1. J. Han, M. Kamber and J. Pei, Data Mining: Concepts and Techniques, 3rd ed., Amsterdam: Morgan Kaufmann Publishers, 2012.