Record-level peculiarity-based data analysis and classifications

Yang, Jian; Zhong, Ning; Yao, Yiyu; Wang, Jue

doi:10.1007/s10115-010-0315-y

Regular paper
Published: 01 July 2010

Volume 28, pages 149–173 (2011)

Knowledge and Information Systems

Jian Yang^1,2, Ning Zhong^1,3, Yiyu Yao^1,4 & Jue Wang^2

Abstract

Peculiarity-oriented mining is a data mining method consisting of peculiar data identification and peculiar data analysis. Peculiarity factor and local peculiarity factor (LPF) are important concepts employed to describe the peculiarity of a data point in the identification step. One can study these notions at both the attribute and record levels. In this paper, a new record LPF called distance-based record LPF (D-record LPF) is proposed, which is defined as the sum of distances between a point and its nearest neighbors. The authors prove that D-record LPF can characterize the probability density of a continuous m-dimensional distribution accurately. This provides a theoretical basis for some existing distance-based anomaly detection techniques. More importantly, it also provides an effective method for describing the class-conditional probabilities in a Bayesian classifier. The result enables us to apply D-record LPF to solve classification problems. A novel algorithm called the LPF-Bayes classifier and its kernelized implementation are proposed, which have some connection to the Bayesian classifier. Experimental results on several benchmark datasets demonstrate that the proposed classifiers are competitive with strong classifiers such as AdaBoost, support vector machines and kernel Fisher discriminant.
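
As a rough illustration of the definition quoted in the abstract (the sum of distances between a point and its nearest neighbors), the following Python sketch scores each record by that sum. It is only a sketch under stated assumptions, not the authors' implementation: the function name `d_record_lpf`, the choice of Euclidean distance, the neighbor count `k`, and the `exclude_self` flag are all hypothetical, since the abstract does not specify them. The LPF-Bayes classifier itself is not reproduced here.

```python
import numpy as np

def d_record_lpf(points, queries, k, exclude_self=False):
    """Sum of distances from each query to its k nearest points.

    Assumptions (not from the source): Euclidean distance, brute-force
    search, and a user-chosen k. Larger scores suggest sparser regions.
    """
    points = np.asarray(points, dtype=float)
    queries = np.asarray(queries, dtype=float)
    # When queries are the points themselves, skip the zero self-distance.
    start = 1 if exclude_self else 0
    if k < 1 or start + k > len(points):
        raise ValueError("k must be between 1 and the number of usable neighbors")
    # Pairwise distances, shape (n_queries, n_points).
    dists = np.linalg.norm(queries[:, None, :] - points[None, :, :], axis=2)
    dists.sort(axis=1)
    return dists[:, start:start + k].sum(axis=1)

# Toy data: four points on a unit square plus one far-away record.
data = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [10, 10]])
scores = d_record_lpf(data, data, k=2, exclude_self=True)
print(scores)
```

In this toy run, the four square corners receive the same small score and the distant record receives the largest one, which matches the intuition that a larger neighbor-distance sum indicates a lower local density.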

References

Abe N, Zadrozny B (2006) Outlier detection by active learning. In: Proceedings of the 12th ACM SIGKDD international conference on knowledge discovery and data mining, pp 504–509
Angiulli F, Pizzuti C (2002) Fast outlier detection in high dimensional spaces. In: Proceedings of the 6th European conference on principles of data mining and knowledge discovery, pp 15–26
Aouad LM, Le-Khac N-A, Kechadi TM (2010) Performance study of distributed apriori-like frequent itemsets mining. Knowl Inf Syst 23(1): 55–72
Bhamidipati NL, Pal SK (2006) Comparing rank-inducing scoring systems. In: Proceedings of the 18th international conference on pattern recognition, pp 300–303
Bishop C (1995) Neural networks for pattern recognition. Oxford University Press, Oxford
Blumenstock A, Schweiggert F, Müller M, Lanquillon M (2009) Rule cubes for causal investigations. Knowl Inf Syst 18(1): 109–132
Boley M, Grosskreutz H (2009) Approximating the number of frequent sets in dense data. Knowl Inf Syst 21(1): 65–89
Breunig MM, Kriegel HP, Ng RT, Sander J (2000) LOF: identifying density-based local outliers. In: Proceedings of the 6th ACM SIGMOD international conference on management of data, pp 93–104
Chandola V, Banerjee A, Kumar V (2009) Anomaly detection: a survey. ACM Comput Surv 14(13): 1501–1558
Duda R, Hart P, Stork D (2000) Pattern classification, 2nd edn. Wiley, New York
Eskin E, Arnold A, Prerau M, Portnoy L, Stolfo S (2002) A geometric framework for unsupervised anomaly detection: detecting intrusions in unlabeled data. In: Applications of data mining in computer security
Farago A, Linder T, Lugosi G (1993) Fast nearest-neighbor search in dissimilarity spaces. IEEE Trans Pattern Anal Mach Intell 15: 957–962
He QP, Wang J (2007) Fault detection using the k-nearest neighbor rule for semiconductor manufacturing processes. IEEE Trans Semicond Manuf 24: 345–354
He ZY, Xu XF, Huang ZX, Deng SC (2004) A frequent pattern discovery method for outlier detection. In: Proceedings of the 5th international conference on web-age information management, LNCS 3129, pp 726–732
Freund Y, Schapire R (1997) A decision-theoretic generalization of on-line learning and an application to boosting. J Comput Syst Sci 55(1): 119–139
Hald A (1999) On the history of maximum likelihood in relation to inverse probability and least squares. Stat Sci 14(2): 214–222
Karmarkar N (1984) A new polynomial-time algorithm for linear programming. Combinatorica 4: 373–395
Lazarevic A, Kumar V (2005) Feature bagging for outlier detection. In: Proceedings of the 11th ACM SIGKDD international conference on knowledge discovery and data mining, pp 157–166
McCallum A, Nigam K, Ungar LH (2000) Efficient clustering of high-dimensional data sets with application to reference matching. In: Proceedings of the 6th ACM SIGKDD international conference on knowledge discovery and data mining, pp 169–178
McGarry K (2005) A survey of interestingness measures for knowledge discovery. Knowl Eng Rev 20: 39–61
Mika S, Rätsch G, Weston J, Schölkopf B, Müller KR (1999) Fisher discriminant analysis with kernels. In: Neural networks for signal processing IX, pp 41–48
Ohshima M, Zhong N, Yao YY, Liu C (2007) Relational peculiarity oriented mining. Data Min Knowl Discov 15: 249–273
Ohshima M, Zhong N, Yao YY, Murata S (2004) Peculiarity oriented analysis in multi-people tracking images. In: Advances in knowledge discovery and data mining, pp 508–518
Ramaswamy S, Rastogi R, Shim K (2000) Efficient algorithms for mining outliers from large data sets. In: Proceedings of the 6th ACM SIGMOD international conference on management of data, pp 427–438
Rätsch G (2001) Robust boosting via convex optimization. PhD thesis, University of Potsdam
Rätsch G, Onoda T, Müller KR (2001) Soft margins for AdaBoost. Mach Learn 42: 283–320
Schölkopf B, Smola A, Müller KR (1998) Nonlinear component analysis as a kernel eigenvalue problem. Neural Comput 10: 1299–1319
Shen B, Yao M, Wu ZH, Gao YJ (2010) Mining dynamic association rules with comments. Knowl Inf Syst 23(1): 73–98
Silberschatz A, Tuzhilin A (1996) What makes patterns interesting in knowledge discovery systems. IEEE Trans Knowl Data Eng 8(6): 970–974
Vapnik V (1998) Statistical learning theory. Wiley, New York
Yao YY, Zhong N (2002) An analysis of peculiarity oriented data mining. In: Proceedings of the 2002 IEEE international conference on data mining workshop on the foundation of data mining and discovery, pp 185–188
Yang J, Zhong N, Yao YY, Wang J (2008) Local peculiarity factor and its application in outlier detection. In: Proceedings of the 14th ACM SIGKDD international conference on knowledge discovery and data mining, pp 776–784
Yang J, Zhong N, Yao YY, Wang J (2009) Peculiarity analysis for classifications. In: Proceedings of the 2009 IEEE international conference on data mining, pp 607–616
Zhang Y (1998) Solving large-scale linear programs by interior-point methods under the MATLAB environment. Optim Methods Softw 10: 1–31
Zhang B, Srihari SN (2004) Fast k-nearest neighbor classification using cluster-based trees. IEEE Trans Pattern Anal Mach Intell 26(4): 525–528
Zhang J, Wang H (2006) Detecting outlying subspaces for high-dimensional data: the new task, algorithms, and performance. Knowl Inf Syst 10: 333–355
Zhong N, Liu C, Yao YY, Ohshima M, Huang MX, Huang JJ (2004) Relational peculiarity oriented data mining. In: Proceedings of the 2004 IEEE international conference on data mining, pp 575–578
Zhong N, Yao YY, Ohshima M (2003) Peculiarity oriented multi-database mining. IEEE Trans Knowl Data Eng 15: 952–960
Zhong N, Ohshima M, Ohsuga S (2001a) Peculiarity oriented mining and its application for knowledge discovery in amino-acid data. In: Advances in knowledge discovery and data mining, pp 260–269
Zhong N, Yao YY, Ohshima M, Ohsuga S (2001b) Interestingness, peculiarity, and multi-database mining. In: Proceedings of the 2001 IEEE international conference on data mining, pp 566–573

Author information

Authors and Affiliations

International WIC Institute, Beijing University of Technology, 100124, Beijing, China
Jian Yang, Ning Zhong & Yiyu Yao
The Key Laboratory of Complex Systems and Intelligence Science, Institute of Automation, Chinese Academy of Sciences, Beijing, China
Jian Yang & Jue Wang
Department of Life Science and Informatics, Maebashi Institute of Technology, Maebashi, Japan
Ning Zhong
Department of Computer Science, University of Regina, Regina, SK, Canada
Yiyu Yao

Corresponding author

Correspondence to Jian Yang.

Additional information

This paper extends and improves our previous work published in the Proceedings of the 9th IEEE International Conference on Data Mining [33].

About this article

Cite this article

Yang, J., Zhong, N., Yao, Y. et al. Record-level peculiarity-based data analysis and classifications. Knowl Inf Syst 28, 149–173 (2011). https://doi.org/10.1007/s10115-010-0315-y

Received: 07 January 2010
Revised: 18 April 2010
Accepted: 11 June 2010
Published: 01 July 2010
Issue Date: July 2011
DOI: https://doi.org/10.1007/s10115-010-0315-y
