# Outlier Detection with Chi-Squared Test

Causal Step

This step detects outliers in multivariate data by applying a modified version of the chi-squared test. The basic formula is as follows:

${\chi^2 = \sum_{i=1}^{N} \frac{(o_{i} - E_i)^2}{E_i}}\quad\text{,}$

where $$o$$ is the object to be tested, $$N$$ is the number of dimensions, and $$o_i$$ is the value of $$o$$ on the $$i$$th dimension. $$E_i$$ is the mean value on the $$i$$th dimension among all objects. The object may be identified as an outlier if its $$\chi^2$$ value is larger than a threshold value.
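The formula can be illustrated with a minimal Python sketch. It is not the implementation used by this step. The function names `chi_squared_scores` and `mark_outliers`, the sample data, and the threshold of 10.0 are all assumptions made for illustration. The sketch also assumes that every dimension has a non-zero mean, since the formula divides by $$E_i$$.

```python
def chi_squared_scores(rows):
    """Return the chi-squared statistic of each object against the per-dimension means."""
    n_dims = len(rows[0])
    # E_i: mean of dimension i over all objects
    means = [sum(row[i] for row in rows) / len(rows) for i in range(n_dims)]
    if any(m == 0 for m in means):
        raise ValueError("every dimension needs a non-zero mean")
    # Sum of (o_i - E_i)^2 / E_i over all dimensions, computed per object
    return [
        sum((row[i] - means[i]) ** 2 / means[i] for i in range(n_dims))
        for row in rows
    ]


def mark_outliers(rows, threshold):
    """Flag each object whose statistic is larger than the (assumed) threshold."""
    return [score > threshold for score in chi_squared_scores(rows)]


# Hypothetical sample: three similar objects and one that deviates strongly.
data = [[10.0, 20.0], [11.0, 19.0], [9.0, 21.0], [30.0, 60.0]]
scores = chi_squared_scores(data)
flags = mark_outliers(data, threshold=10.0)
```

With this sample, only the last object exceeds the threshold and is flagged. A suitable threshold depends on the data and has to be chosen by the user.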

Input Parameters

1. Multivariate data including outliers

Output Parameters

1. Original data with outliers marked

Workflow

Algorithm

Chi-Squared Test

References

• J. Han, M. Kamber and J. Pei, Data Mining - Concepts and Techniques, 3rd ed., Amsterdam: Morgan Kaufmann Publishers, 2012.