机器学习--- K均值(K-Means)
假定我们有如下8个点
A1(2, 10) A2(2, 5) A3(8, 4) A4(5, 8) A5(7, 5) A6(6, 4) A7(1, 2) A8(4, 9)
希望分成3个聚类
初始化选择 A1(2, 10), A4(5, 8) ,A7(1, 2)为聚类中心点,两点距离定义为ρ(a, b) = |x2 – x1| + |y2 – y1| .
第一步
Iteration 1
|
|
(2, 10) |
(5, 8) |
(1, 2) |
|
|
Point |
Dist Mean cluster 1 |
Dist Mean cluster2 |
Dist Mean cluster 3 |
Cluster |
A1 |
(2, 10) |
|
|
|
|
A2 |
(2, 5) |
|
|
|
|
A3 |
(8, 4) |
|
|
|
|
A4 |
(5, 8) |
|
|
|
|
A5 |
(7, 5) |
|
|
|
|
A6 |
(6, 4) |
|
|
|
|
A7 |
(1, 2) |
|
|
|
|
A8 |
(4, 9) |
|
|
|
|
对A1点,计算其到每个cluster 的距离
A1->class1 = |2-2|+|10-10}=0
A1->class2 = |2-5|+|10-8|=5
A1->class3 = |2-1|+|10-2|=9
因此A1 属于cluster1
|
|
(2, 10) |
(5, 8) |
(1, 2) |
|
|
Point |
Dist Mean cluster 1 |
Dist Mean cluster 2 |
Dist Mean cluster 3 |
Cluster |
A1 |
(2, 10) |
0 |
5 |
9 |
1 |
A2 |
(2, 5) |
|
|
|
|
A3 |
(8, 4) |
|
|
|
|
A4 |
(5, 8) |
|
|
|
|
A5 |
(7, 5) |
|
|
|
|
A6 |
(6, 4) |
|
|
|
|
A7 |
(1, 2) |
|
|
|
|
A8 |
(4, 9) |
|
|
|
|
余下继续计算,直到
|
|
(2, 10) |
(5, 8) |
(1, 2) |
|
|
Point |
Dist Mean cluster 1 |
Dist Mean cluster 2 |
Dist Mean cluster 3 |
Cluster |
A1 |
(2, 10) |
0 |
5 |
9 |
1 |
A2 |
(2, 5) |
5 |
6 |
4 |
3 |
A3 |
(8, 4) |
12 |
7 |
9 |
2 |
A4 |
(5, 8) |
5 |
0 |
10 |
2 |
A5 |
(7, 5) |
10 |
5 |
9 |
2 |
A6 |
(6, 4) |
10 |
5 |
7 |
2 |
A7 |
(1, 2) |
9 |
10 |
0 |
3 |
A8 |
(4, 9) |
3 |
2 |
10 |
2 |
重新计算中心点
cluster1只有1个点,因此A1为中心点
cluster2的中心点为 ( (8+5+7+6+4)/5,(4+8+5+4+9)/5 )=(6,6)。注意:这个点并不实际存在。
cluster3的中心点为( (2+1)/2, (5+2)/2 )= (1.5, 3.5)
图形化的过程如下:
持续迭代,直到前后两次迭代不发生变化为止,如下: