=========================================
Outlier Detection with K-Means Clustering
=========================================
This step first partitions the data points into :math:`k` clusters by applying the *K-Means Clustering* algorithm. Then, a distance ratio 

.. math::

   \frac{dist(Y_o, c_o)}{\bar{c}_o}

is calculated for each data object, where :math:`Y_o` is the data object, :math:`c_{o}` is the center of the cluster to which :math:`Y_o` belongs, and :math:`\bar{c}_o` is the average distance between the data objects in that cluster and the center :math:`c_{o}`. The larger the ratio, the farther the data object lies from its cluster center relative to the other members of the cluster. Finally, a data object whose ratio exceeds a pre-defined threshold is identified as an outlier.
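
Written out, and introducing :math:`C_o` (a symbol not used in the original description) for the set of data objects assigned to the same cluster as :math:`Y_o`, the average distance in the denominator can be read as

.. math::

   \bar{c}_o = \frac{1}{|C_o|} \sum_{Y \in C_o} dist(Y, c_o)

so a data object is flagged as an outlier when

.. math::

   \frac{dist(Y_o, c_o)}{\bar{c}_o} > \tau

where :math:`\tau` is shorthand, assumed here, for the pre-defined threshold. For instance, if the members of a cluster lie at an average distance of 2 from its center and :math:`Y_o` lies at a distance of 5, the ratio is 2.5, which exceeds a threshold of 2 but not a threshold of 3.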

.. rubric:: Input Parameters

1. Data samples represented in an n-dimensional feature space

2. The number of clusters :math:`k`

3. A pre-defined threshold for the distance ratios

.. rubric:: Output Parameters

1. Original data with outliers marked
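
The inputs and output listed above can be illustrated with a minimal, dependency-free sketch. It is not the implementation behind this step: the function names are hypothetical, the distance is assumed to be Euclidean, and the deterministic farthest-point initialization is an assumption made only to keep the example reproducible.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two points (assumed distance measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def init_centers(points, k):
    """Deterministic farthest-point initialization (an assumption of this sketch)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(euclidean(p, c) for c in centers))
        )
    return centers


def kmeans(points, k, max_iter=100):
    """Plain Lloyd iterations; returns (centers, labels)."""
    centers = init_centers(points, k)
    labels = []
    for _ in range(max_iter):
        # Assign every point to its nearest center.
        labels = [
            min(range(k), key=lambda j: euclidean(p, centers[j])) for p in points
        ]
        # Recompute each center as the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_centers.append(mean_point(members) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


def mark_outliers(points, k, threshold):
    """Return (point, ratio, is_outlier) for every input point."""
    centers, labels = kmeans(points, k)
    dists = [euclidean(p, centers[lab]) for p, lab in zip(points, labels)]
    # Average distance to the center, computed per cluster.
    avg = {}
    for j in set(labels):
        members = [d for d, lab in zip(dists, labels) if lab == j]
        avg[j] = sum(members) / len(members)
    result = []
    for p, d, lab in zip(points, dists, labels):
        ratio = d / avg[lab] if avg[lab] > 0 else 0.0
        result.append((p, ratio, ratio > threshold))
    return result


# Two tight groups plus one point that sits well away from its group.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (4, 4)]
result = mark_outliers(points, k=2, threshold=2.0)
for point, ratio, is_outlier in result:
    print(point, round(ratio, 3), "outlier" if is_outlier else "")
```

With this data, only ``(4, 4)`` exceeds the threshold of 2.0. Because the outlier itself pulls the cluster center and the average distance toward it, the ratio is sensitive to the choice of :math:`k` and of the threshold, so both usually need tuning for the data at hand.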

.. rubric:: Workflow

.. image:: workflow.svg


.. rubric:: Algorithm

:doc:`/Algorithms/KMeansClustering/index`

.. rubric:: References

- J.\  Han, M. Kamber, and J. Pei, *Data Mining: Concepts and Techniques*, 3rd ed. Amsterdam: Morgan Kaufmann Publishers, 2012.