Outlier Detection with Chi-Squared Test
This step detects outliers in multivariate data by applying a modified version of the chi-squared test. The basic formula is:
\[{\chi^2 = \sum_{i=1}^{N} \frac{(o_{i} - E_i)^2}{E_i}}\quad\text{,}\]
where \(N\) is the number of dimensions, \(o\) is the object being tested, \(o_i\) is the value of \(o\) on the \(i\)th dimension, and \(E_i\) is the mean of the \(i\)th dimension over all objects. The more \(o\) deviates from the per-dimension means, the larger its \(\chi^2\) value, so \(o\) may be identified as an outlier if its \(\chi^2\) value exceeds a threshold.
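The formula can be illustrated with a minimal pure-Python sketch. It computes only the basic statistic shown above, not necessarily the modified variant this step uses. The function names `chi_squared_scores` and `mark_outliers` and the `threshold` parameter are hypothetical, and the sketch assumes non-empty data, rows of equal length, and a strictly positive mean \(E_i\) in every dimension (since \(E_i\) is a divisor). As with this step's output, each original row is returned together with an outlier flag.

```python
from typing import List, Sequence, Tuple


def chi_squared_scores(data: Sequence[Sequence[float]]) -> List[float]:
    """Return the chi-squared statistic of every row (object) in `data`.

    Hypothetical helper: assumes equal-length rows and that every
    column mean E_i is strictly positive.
    """
    if not data:
        return []
    n_objects = len(data)
    n_dims = len(data[0])
    # E_i: mean of the i-th dimension over all objects
    means = [sum(row[i] for row in data) / n_objects for i in range(n_dims)]
    if any(e <= 0 for e in means):
        raise ValueError("every column mean E_i must be positive")
    # chi^2 = sum over i of (o_i - E_i)^2 / E_i, computed for each object o
    return [
        sum((row[i] - means[i]) ** 2 / means[i] for i in range(n_dims))
        for row in data
    ]


def mark_outliers(
    data: Sequence[Sequence[float]], threshold: float
) -> List[Tuple[List[float], bool]]:
    """Pair each original row with True if its chi^2 exceeds `threshold`."""
    scores = chi_squared_scores(data)
    return [(list(row), score > threshold) for row, score in zip(data, scores)]
```

For example, with the rows `[[10, 20], [11, 19], [9, 21], [10, 20], [30, 5]]` and a threshold of 10, only the last row is flagged: its score is about 26.76, while the other rows score at most about 2.73. The threshold itself is a free choice here; the sketch does not prescribe how it should be set.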
Input Parameters
- Multivariate data including outliers
Output Parameters
- Original data with outliers marked
Workflow
Algorithm
References
- J. Han, M. Kamber and J. Pei, Data Mining: Concepts and Techniques, 3rd ed., Amsterdam: Morgan Kaufmann Publishers, 2012.