==================
K-Means Clustering
==================
:doc:`/WorkProcessClassifiers/GlobalAlgorithm/index` - :doc:`/WorkProcessClassifiers/OneDimensionalAlgorithm/index`

The *K-Means Clustering* algorithm is a simple unsupervised learning algorithm for solving clustering problems. Given a specified number of clusters :math:`k`, it iteratively assigns each point to its nearest cluster centroid and recomputes the centroids, reducing the sum of squared distances between the points and their centroids.
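
In the usual formulation of k-means, the quantity being minimized can be written as follows. The cluster sets :math:`C_j` and the element symbols :math:`y_i` and :math:`\hat{y}_j` are notation assumed here for illustration: :math:`C_j` is the set of input values currently assigned to the centroid :math:`\hat{y}_j`.

.. math::

   J(\hat{Y}) = \sum_{j=1}^{k} \sum_{y_i \in C_j} \left( y_i - \hat{y}_j \right)^2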

For details refer to the online tutorial `http://www-2.cs.cmu.edu/~awm/tutorials/kmeans.html <http://www-2.cs.cmu.edu/~awm/tutorials/kmeans.html>`__.
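
The iteration described above can be sketched in a few lines of Python for the one-dimensional case. This is a minimal illustration, not the implementation used by any tool listed on this page: the function name ``kmeans_1d``, the ``max_iter`` parameter, and the initialization (values spread evenly over the sorted data) are assumptions made for the example.

.. code-block:: python

   def kmeans_1d(y, k, max_iter=100):
       """Lloyd-style k-means for scalar data; returns k centroid locations."""
       n = len(y)
       if not 1 <= k < n:
           raise ValueError("k must satisfy 1 <= k < N")
       # Assumed initialization: k values spread evenly over the sorted data.
       ordered = sorted(y)
       centroids = [ordered[(2 * j + 1) * n // (2 * k)] for j in range(k)]
       for _ in range(max_iter):
           # Assignment step: each point joins its nearest centroid.
           clusters = [[] for _ in range(k)]
           for value in y:
               nearest = min(range(k), key=lambda j: (value - centroids[j]) ** 2)
               clusters[nearest].append(value)
           # Update step: each centroid moves to the mean of its cluster;
           # an empty cluster keeps its previous centroid.
           updated = [
               sum(c) / len(c) if c else centroids[j]
               for j, c in enumerate(clusters)
           ]
           converged = updated == centroids
           centroids = updated
           if converged:  # no centroid moved, so the iteration stops
               break
       return centroids

For example, ``kmeans_1d([1, 2, 3, 10, 11, 12], 2)`` returns the two centroids ``[2.0, 11.0]``. The result depends on the initialization, so other starting values may lead to a different local minimum.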



.. rubric:: Input Parameters

+--------------------+--------------------------------------------+----------------------------------------+---------------------------------------+---------+
| Parameter          | Type                                       | Constraint                             | Description                           | Remarks |
+====================+============================================+========================================+=======================================+=========+
| :math:`Y`          | :math:`Y \in \mathbb R^{N}`                | :math:`N \in \mathbb{N}`               | Input data of size :math:`N`          |         |
+--------------------+--------------------------------------------+----------------------------------------+---------------------------------------+---------+
| :math:`k`          | :math:`k \in \mathbb{N}`                   | :math:`k \lt N`                        | Specified number of clusters          |         |
+--------------------+--------------------------------------------+----------------------------------------+---------------------------------------+---------+

.. rubric:: Output Parameters

+----------------------------+----------------------------------------------------+------------+-----------------------------------------------------------+---------+
| Parameter                  | Type                                               | Constraint | Description                                               | Remarks |
+============================+====================================================+============+===========================================================+=========+
| :math:`\hat{Y}`            | :math:`\hat{Y} \in \mathbb R^{k}`                  |            | A vector of :math:`k` cluster centroid locations          |         |
+----------------------------+----------------------------------------------------+------------+-----------------------------------------------------------+---------+

.. rubric:: Tool Support

* :doc:`/Tools/MatlabTool/index`

  For details refer to the online documentation of the function `'kmeans' <http://www.mathworks.de/help/toolbox/stats/kmeans.html>`__.

.. rubric:: Single Steps using the Algorithm

* :doc:`/DataPreprocessing/DataDiscretization/DataDiscretizationWithKMeansClustering/index`

* :doc:`/DataPreprocessing/DataReduction/DimensionalityReduction/DataReductionWithKMeansClustering/index`

* :doc:`/DataPreprocessing/DataCleaning/OutlierDetection/OutlierDetectionWithKMeansClustering/index`

.. rubric:: References

- J.B.\  MacQueen, Some Methods for Classification and Analysis of Multivariate Observations, Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Berkeley, University of California Press, vol. 1, pp. 281-297, 1967.