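Abstract: A transaction recovery mechanism is introduced to improve the K-means algorithm. The improved algorithm can be shut down at any moment during execution and, after a restart, continues from the results computed before the shutdown until the algorithm finishes. This makes it feasible to apply K-means to large data sets on ordinary machines. The improved algorithm was tested in a clustering computation lasting as long as 400 h.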
Recoverable implementation of K-means clustering algorithm
HUANG Zhi-hua1,2a, WEN Bu-ying2b, WANG Guo-qian3
(1. School of Information Science & Technology, Xiamen University, Xiamen, Fujian 361005, China; 2. a. College of Mathematics & Computer Science, b. College of Electrical Engineering & Automation, Fuzhou University, Fuzhou 350108, China; 3. Computing Center of Fujian Province, Fuzhou 350003, China)
Abstract: This paper improves the K-means clustering algorithm by introducing a transaction recovery mechanism. The new algorithm can be shut down at any time while it is running, whether on purpose or by accident, and after a restart it resumes from the results computed before the shutdown without losing the work already done. This makes it feasible to apply K-means to large data sets on ordinary computers. The algorithm was verified in a clustering task that ran for as long as 400 hours.
Key words: K-means algorithm; clustering; recovery mechanism ......
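
The abstract does not spell out how the recovery mechanism is built, so the following Python sketch shows only one plausible reading of the idea and is not the authors' implementation. It assumes that the algorithm's state (iteration counter and centroids) is committed to disk after every iteration, and that the commit is made atomic by writing a temporary file and renaming it over the previous checkpoint. All names here (`save_checkpoint`, `load_checkpoint`, `kmeans_resumable`, the `stop_after` parameter used to simulate a shutdown) are hypothetical. Under these assumptions a shutdown costs at most the iteration in progress, which is coarser than the "any time" recovery the paper describes.

```python
import json
import os
import random
import tempfile


def save_checkpoint(path, state):
    """Commit the state atomically: write a temp file, then rename it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    # A crash before this line leaves the previous checkpoint untouched.
    os.replace(tmp, path)


def load_checkpoint(path):
    """Return the last committed state, or None if no checkpoint exists."""
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return json.load(f)


def nearest(point, centroids):
    """Index of the centroid with the smallest squared Euclidean distance."""
    return min(
        range(len(centroids)),
        key=lambda j: sum((a - b) ** 2 for a, b in zip(point, centroids[j])),
    )


def kmeans_resumable(points, k, path, max_iter=100, seed=0, stop_after=None):
    """Run K-means, resuming from the checkpoint at `path` if one exists.

    `stop_after` (hypothetical, for demonstration) stops the run after that
    many iterations to imitate a machine shutdown.
    """
    state = load_checkpoint(path)
    if state is None:
        rng = random.Random(seed)
        state = {"iteration": 0, "centroids": rng.sample(points, k), "done": False}
    steps = 0
    while not state["done"] and state["iteration"] < max_iter:
        if stop_after is not None and steps >= stop_after:
            break  # simulated shutdown
        centroids = state["centroids"]
        sums = [[0.0] * len(points[0]) for _ in range(k)]
        counts = [0] * k
        for p in points:
            j = nearest(p, centroids)
            counts[j] += 1
            for d, v in enumerate(p):
                sums[j][d] += v
        # An empty cluster keeps its previous centroid.
        new = [
            [s / counts[j] for s in sums[j]] if counts[j] else list(centroids[j])
            for j in range(k)
        ]
        state = {
            "iteration": state["iteration"] + 1,
            "centroids": new,
            "done": new == [list(c) for c in centroids],
        }
        save_checkpoint(path, state)
        steps += 1
    return state
```

Calling `kmeans_resumable` again with the same checkpoint path after an interruption picks up at the last committed iteration. Because Python's JSON encoding round-trips floats exactly, a run interrupted and resumed this way should reach the same centroids as an uninterrupted one.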