Abstract: Hotspot detection aims to find the regions of a dataset where data points are densely concentrated and to present these regions in an interpretable way. In this work, a hotspot detection algorithm is designed for multidimensional data containing both numerical and categorical features. The core of the algorithm is the clustering algorithm CLTree+, a significant improvement over the baseline CLTree. CLTree+ handles both numerical and categorical features, produces better clustering results for numerical features with periodic characteristics, and is more computationally efficient. Applied to transaction data from large Internet businesses, CLTree+ finds a few regions with high data density and presents them as easy-to-interpret combinations of attributes and their values.
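The abstract does not describe how CLTree+ works internally. For context, the baseline CLTree idea, as we understand it, treats the real data points as one class ("Y"), assumes a second class of uniformly distributed virtual points ("N") over the same space, and grows a decision tree that separates the two, so that dense regions end up in Y-dominated leaves. The sketch below shows only one simplified step of that idea: choosing a single cut on one numerical dimension by information gain. The names `entropy` and `best_split`, the choice of as many virtual points as real ones, and the midpoint candidate cuts are illustrative assumptions. CLTree+'s handling of categorical features, periodic features, and its efficiency improvements are not shown.

```python
import math
from typing import List, Tuple


def entropy(y: float, n: float) -> float:
    """Binary entropy of a region holding y real (Y) and n virtual (N) points."""
    total = y + n
    if total == 0:
        return 0.0
    h = 0.0
    for c in (y, n):
        if c > 0:
            p = c / total
            h -= p * math.log2(p)
    return h


def best_split(values: List[float], lo: float, hi: float) -> Tuple[float, float]:
    """Return (cut, gain) for the best binary cut of the interval [lo, hi].

    Hypothetical helper, not CLTree+'s actual procedure. Real points are
    `values`; virtual points are assumed uniform over [lo, hi] with the same
    total count as the real points, so their count on each side of a cut is
    proportional to that side's width and is never materialised.
    """
    xs = sorted(values)
    n_y = len(xs)
    n_n = float(n_y)  # assumption: as many virtual points as real ones
    total = n_y + n_n
    width = hi - lo
    base = entropy(n_y, n_n)
    best_cut, best_gain = lo, 0.0
    for i in range(1, n_y):
        if xs[i] == xs[i - 1]:
            continue  # no cut can separate identical values
        cut = (xs[i - 1] + xs[i]) / 2  # midpoint between neighbouring points
        y_left, y_right = i, n_y - i
        n_left = n_n * (cut - lo) / width  # uniform virtual points left of cut
        n_right = n_n - n_left
        cond = ((y_left + n_left) / total) * entropy(y_left, n_left) + (
            (y_right + n_right) / total
        ) * entropy(y_right, n_right)
        gain = base - cond
        if gain > best_gain:
            best_cut, best_gain = cut, gain
    return best_cut, best_gain
```

For example, `best_split([float(v) for v in range(10, 21)], 0.0, 100.0)` places the cut between 19 and 20, separating the sparse region (20, 100] from the dense cluster. Applying such cuts recursively, dimension by dimension, yields leaves that read as interval conditions on attributes, which is what makes tree-based clusters easy to present as attribute-value combinations.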
ZOU Lei, ZHU Jing, NIE Xiao-hui, SU Ya, PEI Dan, SUN Yu. Detecting Hotspot in Multi-dimensional Data Through Clustering [J]. Journal of Chinese Computer Systems, 2019, 40(3): 465-471.