Redundancy Detection with Chi-Squared Test
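Redundancy detection is an important task in data integration. This step applies the chi-squared (χ²) test to evaluate the correlation between two nominal attributes. The test compares the observed joint frequencies of the attribute values with the frequencies expected if the two attributes were independent. A χ² value well above the critical value for the table's degrees of freedom, (r - 1)(c - 1) for attributes with r and c distinct values, rejects the independence hypothesis: the attributes are strongly correlated, and one of them may be removed as redundant.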
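As a rough illustration of the test described above, the following sketch computes the χ² statistic for two nominal attributes from their contingency counts, using only the Python standard library. The function name `chi_squared`, the sample attributes (`gender`, `reading`) with their counts, and the chosen significance level are assumptions made for this example, not part of the original step.

```python
from collections import Counter


def chi_squared(xs, ys):
    """Return (chi2, dof) for two equally long lists of nominal values.

    Hypothetical helper for illustration only.
    """
    n = len(xs)
    observed = Counter(zip(xs, ys))   # joint counts o_ij
    row_totals = Counter(xs)          # counts per value of the first attribute
    col_totals = Counter(ys)          # counts per value of the second attribute

    chi2 = 0.0
    for a, row_count in row_totals.items():
        for b, col_count in col_totals.items():
            # Expected count if the two attributes were independent
            expected = row_count * col_count / n
            chi2 += (observed[(a, b)] - expected) ** 2 / expected

    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, dof


# Illustrative data (assumed): 1500 records with two nominal attributes.
gender = ["male"] * 250 + ["female"] * 200 + ["male"] * 50 + ["female"] * 1000
reading = ["fiction"] * 450 + ["non_fiction"] * 1050

chi2, dof = chi_squared(gender, reading)

# Critical chi-squared value for 1 degree of freedom at the 0.001 level.
CRITICAL_DF1_ALPHA_001 = 10.828

print(f"chi2 = {chi2:.2f}, dof = {dof}")
if chi2 > CRITICAL_DF1_ALPHA_001:
    print("Independence rejected: the attributes are correlated.")
```

With these counts the statistic is far above the critical value, so the two attributes would be flagged as correlated. Whether one of them is actually dropped remains a judgment call, since a significant χ² value shows dependence but does not by itself say which attribute to keep.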
- Nominal data
- Redundant attributes
- J. Han, M. Kamber and J. Pei, Data Mining - Concepts and Techniques, 3rd ed., Amsterdam: Morgan Kaufmann Publishers, 2012.