A Novel Density based Clustering Method using Nearest and Farthest Neighbor with PCA

  • Azadeh Faroughi Computer Engineering and IT Department Shiraz University of Technology Shiraz, Iran
  • Reza Javidan Computer Engineering and IT Department Shiraz University of Technology Shiraz, Iran
Keywords: nearest neighbor density estimator, farthest neighbor, subgroups, principal component analysis (PCA)

Abstract

Common nearest-neighbor density estimators usually perform poorly on high-dimensional datasets. Moreover, they have a high time complexity of O(n²) and require substantial memory, especially when indexing is used. To overcome these limitations, we propose a new method that calculates the distances to each node's nearest and farthest neighbors in order to create dataset subgroups. As a result, the time complexity is reduced to O(n log n) and the space complexity becomes constant. After subgroup formation, an assembling technique is used to derive the correct clusters. To cope with high-dimensional datasets, Principal Component Analysis (PCA) is incorporated into the clustering method as a preprocessing step. Extensive experiments on synthetic datasets demonstrate the feasibility of the proposed method. Furthermore, we compared the algorithm with a similar algorithm, DBSCAN, on real-world datasets, and the results showed significantly higher accuracy for the proposed method.
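The pipeline described in the abstract (PCA preprocessing, then per-point nearest- and farthest-neighbor distances as the raw material for subgroup formation) can be sketched roughly as follows. This is an illustrative reconstruction, not the authors' implementation: the function names are invented, and the naive pairwise-distance computation used here is O(n²), whereas the paper's subgroup construction achieves O(n log n).

```python
import numpy as np

def pca_reduce(X, k):
    """Project X onto its top-k principal components (illustrative PCA)."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by explained variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

def neighbor_extremes(X):
    """For every point, return its nearest- and farthest-neighbor distances."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)       # exclude self when taking the minimum
    nearest = D.min(axis=1)
    np.fill_diagonal(D, -np.inf)      # exclude self when taking the maximum
    farthest = D.max(axis=1)
    return nearest, farthest

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))         # toy high-dimensional data
Z = pca_reduce(X, 2)                  # PCA preprocessing, as in the abstract
near, far = neighbor_extremes(Z)
# Small nearest-neighbor distances indicate dense regions; the paper combines
# the nearest/farthest pair to partition the dataset into subgroups, which
# are then assembled into the final clusters.
```

In the paper's setting, the exhaustive distance matrix would be replaced by the subgroup construction that yields the claimed O(n log n) time and constant space.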


Author Biographies

Azadeh Faroughi, Computer Engineering and IT Department Shiraz University of Technology Shiraz, Iran

Azadeh Faroughi received her B.Sc. degree in Information Technology Engineering from Isfahan University and her M.Sc. degree in Computer Networks from Sahand University of Technology in 2010 and 2012, respectively. She is currently a Ph.D. candidate in Computer Networks at Shiraz University of Technology. Her main research interests include machine learning, data mining, computer networks, network security, wireless networks, optical networks, and heuristic algorithms.

Reza Javidan, Computer Engineering and IT Department Shiraz University of Technology Shiraz, Iran

Reza Javidan was born in 1970. He received his M.Sc. degree in Computer Engineering (Machine Intelligence and Robotics) from Shiraz University in 1996 and his Ph.D. degree in Computer Engineering (Artificial Intelligence) from Shiraz University in 2007. Dr. Javidan has many publications in international conferences and journals on image processing, Underwater Wireless Sensor Networks (UWSNs), and Software Defined Networks (SDNs). His major fields of interest are network security, Underwater Wireless Sensor Networks (UWSNs), Software Defined Networks (SDNs), artificial intelligence, image processing, and SONAR systems. Dr. Javidan is a faculty member and lecturer in the Department of Computer Engineering and Information Technology at Shiraz University of Technology.

References

[1] A. Nagpal, A. Jatain, and D. Gaur, “Review based on data clustering algorithms,” in 2013 IEEE Conference on Information and Communication Technologies, 2013, pp. 298–303.
[2] Z. Nafar and A. Golshani, “Data Mining Methods for Protein-Protein Interactions,” in 2006 Canadian Conference on Electrical and Computer Engineering, 2006, pp. 991–994.
[3] W. Yu, G. Qiang, and L. Xiao-li, “A Kernel Aggregate Clustering Approach for Mixed Data Set and Its Application in Customer Segmentation,” 2006 Int. Conf. Manag. Sci. Eng., pp. 121–124, 2006.
[4] Z. Liao, H. Hoppe, D. Forsyth, and Y. Yu, “A Subdivision-Based Representation for Vector Image Editing,” IEEE Trans. Vis. Comput. Graph., vol. 18, no. 11, pp. 1858–1867, Nov. 2012.
[5] P. K. Vemulapalli, V. Monga, and S. N. Brennan, “Robust extrema features for time-series data analysis,” Pattern Anal. Mach. Intell. IEEE Trans., vol. 35, no. 6, pp. 1464–1479, 2013.
[6] Q. Song, J. Ni, and G. Wang, “A fast clustering-based feature subset selection algorithm for high-dimensional data,” Knowl. Data Eng. IEEE Trans., vol. 25, no. 1, pp. 1–14, 2013.
[7] Y. Chen, H. Sampathkumar, B. Luo, and X. Chen, “iLike: Bridging the Semantic Gap in Vertical Image Search by Integrating Text and Visual Features,” IEEE Trans. Knowl. Data Eng., vol. 25, no. 10, pp. 2257–2270, Oct. 2013.
[8] S. Yin and X. Zhu, “Intelligent Particle Filter and Its Application on Fault Detection of Nonlinear System,” IEEE Trans. Ind. Electron., pp. 1–1, 2015.
[9] S. Yin, X. Zhu, and O. Kaynak, “Improved PLS Focused on Key-Performance-Indicator-Related Fault Diagnosis,” IEEE Trans. Ind. Electron., vol. 62, no. 3, pp. 1651–1658, Mar. 2015.
[10] V. Athitsos, J. Alon, S. Sclaroff, and G. Kollios, “BoostMap: An Embedding Method for Efficient Nearest Neighbor Retrieval,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 30, no. 1, pp. 89–104, Jan. 2008.
[11] S. D. Bay and M. Schwabacher, “Mining distance-based outliers in near linear time with randomization and a simple pruning rule,” in Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining - KDD ’03, 2003, p. 29.
[12] M. M. Breunig, H.-P. Kriegel, R. T. Ng, and J. Sander, “LOF: Identifying Density-Based Local Outliers,” in Proceedings of the 2000 ACM SIGMOD international conference on Management of data - SIGMOD ’00, 2000, pp. 93–104.
[13] D. O. Loftsgaarden and C. P. Quesenberry, “A nonparametric estimate of a multivariate density function,” Annals of Mathematical Statistics, vol. 36, no. 3. pp. 1049–1051, 1965.
[14] P. Ciaccia, M. Patella, and P. Zezula, “M-tree: An Efficient Access Method for Similarity Search in Metric Spaces,” 23rd VLDB Conf., pp. 426–435, 1997.
[15] A. Beygelzimer, S. Kakade, and J. Langford, “Cover trees for nearest neighbor,” ICML ’06 Proc. 23rd Int. Conf. Mach. Learn., pp. 97–104, 2006.
[16] A. Faroughi, R. Javidan, and M. Emami, “A new density estimator based on nearest and farthest neighbor,” in 2016 8th International Symposium on Telecommunications (IST), 2016, pp. 185–190.
[17] I. Jolliffe, “Principal Component Analysis,” in Wiley StatsRef: Statistics Reference Online, Chichester, UK: John Wiley & Sons, Ltd, 2014.
[18] H. Haddad Pajouh, R. Javidan, R. Khayami, D. Ali, and K.-K. R. Choo, “A Two-layer Dimension Reduction and Two-tier Classification Model for Anomaly-Based Intrusion Detection in IoT Backbone Networks,” IEEE Trans. Emerg. Top. Comput., pp. 1–1, 2016.
[19] A. Frank and A. Asuncion, “UCI Machine Learning Repository,” Irvine, CA: Univ. California, Irvine, 2010. [Online]. Available: http://archive.ics.
[20] A. K. Jain, “Data clustering: 50 years beyond K-means,” Pattern Recognit. Lett., vol. 31, no. 8, pp. 651–666, Jun. 2010.
[21] J. Wang and X. Su, “An improved K-Means clustering algorithm,” in 2011 IEEE 3rd International Conference on Communication Software and Networks, 2011, pp. 44–46.
[22] R. T. Ng and J. Han, “CLARANS: a method for clustering objects for spatial data mining,” IEEE Trans. Knowl. Data Eng., vol. 14, no. 5, pp. 1003–1016, Sep. 2002.
[23] S. Guha, R. Rastogi, and K. Shim, “Cure: an efficient clustering algorithm for large databases,” Inf. Syst., vol. 26, no. 1, pp. 35–58, Mar. 2001.
[24] G. Karypis, E.-H. Han, and V. Kumar, “Chameleon: hierarchical clustering using dynamic modeling,” Computer (Long. Beach. Calif)., vol. 32, no. 8, pp. 68–75, 1999.
[25] M. Ester, H.-P. Kriegel, J. Sander, and X. Xu, “A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise,” in Proc. Int. Conf. Knowledge Discovery and Data Mining (KDD), 1996, pp. 226–231.
[26] Z. Wang, Y. Hao, Z. Xiong, and F. Sun, “SNN clustering kernel technique for content-based scene matching,” in 2008 7th IEEE International Conference on Cybernetic Intelligent Systems, 2008, pp. 1–6.
[27] E. Achtert, C. Böhm, H.-P. Kriegel, P. Kröger, I. Müller-Gorman, and A. Zimek, “Detection and Visualization of Subspace Cluster Hierarchies,” in Advances in Databases: Concepts, Systems and Applications, Berlin, Heidelberg: Springer Berlin Heidelberg, pp. 152–163.
[28] Y. Cheng, “Mean shift, mode seeking, and clustering,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 17, no. 8, pp. 790–799, 1995.
[29] K. Fukunaga and L. Hostetler, “The estimation of the gradient of a density function, with applications in pattern recognition,” IEEE Trans. Inf. Theory, vol. 21, no. 1, pp. 32–40, Jan. 1975.
[30] M. R. Parsaei, R. Taheri, and R. Javidan, “Perusing the Effect of Discretization of Data on Accuracy of Predicting Naïve Bayes Algorithm,” Curr. Res. Sci., pp. 457–462, 2016.
[31] A. Gionis, H. Mannila, and P. Tsaparas, “Clustering aggregation,” Proc. - Int. Conf. Data Eng., vol. 1, no. 1, pp. 341–352, 2005.
[32] H. Chang and D.-Y. Yeung, “Robust path-based spectral clustering,” Pattern Recognit., vol. 41, no. 1, pp. 191–203, Jan. 2008.
[33] A. K. Jain and M. H. C. Law, “Data Clustering: A User’s Dilemma,” in Lecture Notes in Computer Science, 2005, pp. 1–10.
[34] L. Fu and E. Medico, “FLAME, a novel fuzzy clustering method for the analysis of DNA microarray data,” BMC Bioinformatics, vol. 8, no. 1, p. 3, 2007.
[35] C. J. Veenman, M. J. T. Reinders, and E. Backer, “A maximum variance cluster algorithm,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 24, no. 9, pp. 1273–1280, Sep. 2002.
[36] M. R. Parsaei, S. M. Rostami, and R. Javidan, “A Hybrid Data Mining Approach for Intrusion Detection on Imbalanced NSL-KDD Dataset,” Int. J. Adv. Comput. Sci. Appl., vol. 7, no. 6, pp. 20–25, 2016.
Published
2017-06-30
How to Cite
Faroughi, A., & Javidan, R. (2017, June 30). A Novel Density based Clustering Method using Nearest and Farthest Neighbor with PCA. International Journal of Information & Communication Technology Research, 9(2), 27-34. Retrieved from http://journal.itrc.ac.ir/index.php/ijictr/article/view/8
Section
Information Technology