Abstract: Most existing outlier detection algorithms for mixed data are not as accurate as desired. To address this problem, a two-stage outlier detection algorithm for mixed data is proposed, based on improved DBSCAN clustering and a new local outlier factor, LAOF. In the DBSCAN algorithm, the parameters ε and MinPts must be set manually, which leads to poor accuracy. In this paper, the number of nearest neighbors K is taken as input in place of MinPts, and the cluster radius is determined from the K nearest neighbors, which reduces the number of input parameters and improves clustering quality. First, the mixed data are preliminarily screened with the improved DBSCAN clustering algorithm. Then the local anomaly degree of the mixed data set is calculated with the local outlier factor based on area density (LAOF). When measuring distances between mixed data objects, the attribute weights are determined by differences in information entropy, and the weights are determined a second time for the data that undergo further detection. Finally, the proposed algorithm is verified on real data, and the results show that it can improve the accuracy of outlier detection.
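To make the parameter reduction concrete, the sketch below derives a DBSCAN radius from K alone. The abstract does not state the exact formula, so this rests on assumptions: the radius is taken as the mean distance from each point to its K-th nearest neighbor, the data are purely numeric, and plain Euclidean distance stands in for the entropy-weighted mixed-data distance. The helper name `knn_radius` is hypothetical.

```python
import numpy as np

def knn_radius(points, k):
    """Hypothetical helper: mean distance to the k-th nearest neighbor.

    Assumption: this averaging rule is illustrative only; the paper's
    own rule for deriving the radius from K may differ.
    """
    pts = np.asarray(points, dtype=float)
    # Pairwise Euclidean distances, shape (n, n).
    diff = pts[:, None, :] - pts[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # After sorting each row, column 0 is the point itself (distance 0),
    # so column k holds the distance to the k-th nearest neighbor.
    kth = np.sort(dist, axis=1)[:, k]
    return float(kth.mean())

# Four corners of a unit square plus one distant point.
points = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [10.0, 10.0]]
eps = knn_radius(points, k=2)
```

With only K supplied, the same K could then play the role of MinPts in the clustering step, so the user no longer has to choose ε by hand.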
石鸿雁,马晓娟. 改进的DBSCAN聚类和LAOF两阶段混合数据离群点检测方法[J]. 小型微型计算机系统, 2018, 39(1): 74-77.
SHI Hong-yan, MA Xiao-juan. Two-stage Outlier Detection Method Based on DBSCAN Clustering and LAOF of Hybrid Data. Journal of Chinese Computer Systems, 2018, 39(1): 74-77.