Outlier Detection with Chi-Squared Test

Causal Step

This step detects outliers in multivariate data by applying a modified version of the chi-squared test. The basic formula is as follows:

$$\chi^2 = \sum_{i=1}^{N} \frac{(o_i - E_i)^2}{E_i},$$

where o is the object to be tested, o_i is the value of o on the i-th dimension, E_i is the mean value on the i-th dimension over all objects, and N is the number of dimensions. The object is flagged as a potential outlier if its χ² value exceeds a chosen threshold.
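
The basic formula can be illustrated with a minimal Python sketch. It covers only the basic statistic described above, not whatever modification this step applies internally. The function names `chi_squared_scores` and `mark_outliers`, the sample data, and the threshold of 10.0 are all assumptions made for illustration; the sketch also assumes every dimension has a positive mean, since E_i appears in the denominator.

```python
from statistics import fmean


def chi_squared_scores(rows):
    """Return the chi-squared statistic of each row against the per-dimension means."""
    n_dims = len(rows[0])
    # E_i: mean of dimension i over all objects.
    means = [fmean(row[i] for row in rows) for i in range(n_dims)]
    if any(m <= 0 for m in means):
        # Assumption of this sketch: E_i is a divisor, so it must be positive.
        raise ValueError("every dimension needs a positive mean")
    return [
        sum((o_i - e_i) ** 2 / e_i for o_i, e_i in zip(row, means))
        for row in rows
    ]


def mark_outliers(rows, threshold):
    """Pair each row with its score and a flag that is True above the threshold."""
    scores = chi_squared_scores(rows)
    return [(row, score, score > threshold) for row, score in zip(rows, scores)]


# Hypothetical sample: four similar objects and one far from the rest.
data = [
    [10.0, 20.0],
    [11.0, 19.0],
    [9.0, 21.0],
    [10.0, 20.0],
    [30.0, 60.0],
]

for row, score, is_outlier in mark_outliers(data, threshold=10.0):
    print(row, round(score, 2), "outlier" if is_outlier else "ok")
```

With this sample, only the last object exceeds the threshold. Note that the outlier itself pulls the means E_i toward it, so a suitable threshold depends on the data and has to be chosen by the user.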

Input Parameters

  1. Multivariate data including outliers

Output Parameters

  1. Original data with outliers marked

Workflow

[Workflow diagram: ../../../../_images/workflow30.svg]

Algorithm

Chi-Squared Test

References

  • J. Han, M. Kamber and J. Pei, Data Mining - Concepts and Techniques, 3rd ed., Amsterdam: Morgan Kaufmann Publishers, 2012.