=========================================
Outlier Detection with K-Means Clustering
=========================================

This step first partitions the data points into :math:`k` clusters by applying
the *K-Means Clustering* algorithm. Then, a distance ratio

.. math::

    \frac{dist(Y_o, c_o)}{\bar{c}_o}

is calculated for each data object, where :math:`Y_o` is the data object,
:math:`c_{o}` is the center of the cluster to which :math:`Y_o` belongs, and
:math:`\bar{c}_o` is the average distance between all data objects in
that cluster and the center :math:`c_{o}`\ . The larger the ratio, the farther
the data object lies from its cluster center relative to the other members of
that cluster. Finally, if the calculated ratio is above a pre-defined threshold,
the data object is identified as an outlier.

.. rubric:: Input Parameters

1. Data samples abstracted in an n-dimensional feature space
2. Specified number of clusters
3. A pre-defined threshold for the distance ratios

.. rubric:: Output Parameters

1. Original data with outliers marked

.. rubric:: Workflow

.. image:: workflow.svg

.. rubric:: Algorithm

:doc:`/Algorithms/KMeansClustering/index`

.. rubric:: References

- J.\ Han, M. Kamber and J. Pei, Data Mining - Concepts and Techniques, 3rd ed.,
  Amsterdam: Morgan Kaufmann Publishers, 2012.
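
.. rubric:: Example

The procedure described on this page (cluster, compute the distance ratio, compare it against the threshold) can be sketched in plain Python as follows. This is a minimal illustration and not the implementation used by this step: the function names ``euclidean``, ``kmeans``, ``distance_ratios`` and ``mark_outliers`` are hypothetical, the distance is assumed to be Euclidean, and the clustering is a bare Lloyd iteration that takes the first :math:`k` points as initial centers.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two points of equal dimension (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, iterations=20):
    """Bare Lloyd iteration; the first k points serve as initial centers (assumption)."""
    centers = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        labels = [min(range(k), key=lambda j: euclidean(p, centers[j]))
                  for p in points]
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(col) / len(members)
                                   for col in zip(*members))
    return labels, centers


def distance_ratios(points, labels, centers):
    """dist(Y_o, c_o) divided by the mean member-to-center distance of the cluster."""
    dists = [euclidean(p, centers[lab]) for p, lab in zip(points, labels)]
    ratios = []
    for d, lab in zip(dists, labels):
        cluster = [x for x, m in zip(dists, labels) if m == lab]
        mean = sum(cluster) / len(cluster)
        # A cluster whose members all sit on the center has no spread to compare against.
        ratios.append(d / mean if mean > 0 else 0.0)
    return ratios


def mark_outliers(points, k, threshold):
    """Return one boolean per point: True where the ratio exceeds the threshold."""
    labels, centers = kmeans(points, k)
    ratios = distance_ratios(points, labels, centers)
    return [r > threshold for r in ratios]


# Two tight groups plus one point, (4, 4), that lies far from its cluster center.
samples = [(0, 0), (10, 10), (0, 1), (1, 0), (1, 1),
           (10, 11), (11, 10), (11, 11), (4, 4)]
flags = mark_outliers(samples, k=2, threshold=2.0)
for point, is_outlier in zip(samples, flags):
    print(point, "outlier" if is_outlier else "ok")
```

Because the ratio is normalized by the average distance within each cluster, a single threshold can be applied to clusters of different spread. The threshold value ``2.0`` and the sample points are illustrative only.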