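(1. Dept. of Computer Science & Technology, Xiamen University of Technology, Xiamen, Fujian 361012; 2. Dept. of Computer Science & Technology, Zhejiang University, Hangzhou 310027)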
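Support vector machines (SVMs) generalize well and are therefore widely used in machine learning and pattern recognition. However, when the training set is large, training an SVM incurs very high time and space costs. On the other hand, the decision function obtained from SVM training depends only on the support vectors, so learning from the set of support vectors instead of the full training set can shorten training time without affecting the classification accuracy of the resulting classifier. A hybrid method is used to reduce the training set and select potential support vectors, thereby lowering the time and space complexity of SVM training. Experimental results show that the algorithm greatly increases SVM training speed while largely preserving the generalization performance of the original classifier.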
Key words: quadratic programming; unsupervised clustering; weight; distance threshold; potential support vectors
Fast training support vector machine based on clustering
ZENG Zhi-qiang1, GAO Ji2, XIE Yan-qi1
(1.Dept. of Computer Science & Technology, Xiamen University of Technology, Xiamen Fujian 361012, China; 2.Dept. of Computer Science & Technology, Zhejiang University, Hangzhou 310027, China)
The support vector machine (SVM) is a well-known method for pattern recognition and machine learning. However, training an SVM is very costly in time and memory when the data set is large. On the other hand, the SVM decision function is fully determined by a small subset of the training data, called support vectors. Therefore, removing training samples that are not relevant to the support vectors might have no effect on building the proper decision function. This paper proposes a hybrid method that removes from the training set the data irrelevant to the final decision function, so that fewer vectors are used for SVM training and the training time decreases greatly. Experimental results show that the method reduces training time significantly without compromising the generalization capability of the SVM. ......
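The general idea of pruning a training set before SVM training can be illustrated with a minimal sketch. It is not the hybrid method of this paper, whose steps are not given in the abstract. It assumes a plain k-means clustering of each class and keeps only the clusters whose center lies within a distance threshold of some center of the opposite class, on the assumption that samples far from the other class are unlikely to become support vectors. The names `kmeans`, `select_candidates`, `k` and `threshold` are hypothetical.

```python
import math
import random


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def kmeans(points, k, iters=20, seed=0):
    """Plain k-means; returns (centers, clusters). Requires k <= len(points)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: math.dist(p, centers[c]))
            clusters[j].append(p)
        # Update step: an empty cluster keeps its previous center.
        centers = [centroid(c) if c else centers[j]
                   for j, c in enumerate(clusters)]
    return centers, clusters


def select_candidates(pos, neg, k=4, threshold=3.0):
    """Keep samples from clusters close to the opposite class (assumed heuristic)."""
    pos_centers, pos_clusters = kmeans(pos, k)
    neg_centers, neg_clusters = kmeans(neg, k)

    def near(center, others):
        return min(math.dist(center, o) for o in others) <= threshold

    kept_pos = [p for c, cl in zip(pos_centers, pos_clusters)
                if near(c, neg_centers) for p in cl]
    kept_neg = [p for c, cl in zip(neg_centers, neg_clusters)
                if near(c, pos_centers) for p in cl]
    return kept_pos, kept_neg


# Synthetic two-class data, for illustration only.
_rng = random.Random(1)
pos = [(_rng.gauss(2.0, 1.5), _rng.gauss(0.0, 1.5)) for _ in range(200)]
neg = [(_rng.gauss(-2.0, 1.5), _rng.gauss(0.0, 1.5)) for _ in range(200)]
kept_pos, kept_neg = select_candidates(pos, neg)
```

The reduced sets `kept_pos` and `kept_neg` would then be passed to any SVM trainer in place of the full data. Whether accuracy is preserved depends on `k` and `threshold`, which would need to be tuned on held-out data.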