ABSTRACT
This paper introduces mass estimation, a base modelling mechanism in data mining. We provide the theoretical basis of mass and an efficient method to estimate it. We show that mass estimation solves problems effectively in three tasks: information retrieval, regression and anomaly detection. In these tasks, the mass-based models perform at least as well as, and often better than, a total of eight state-of-the-art methods on task-specific performance measures. In addition, mass estimation has constant time and space complexities.
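The abstract does not define mass, so the sketch below is only an illustration of the idea in one dimension. It assumes that a single binary split point is drawn uniformly over the data range, so each split is weighted by the width of the gap it falls in. Under that assumption, the mass of a point is the expected number of data points on its side of the split. The function name `level1_mass` is hypothetical. This naive version costs O(n) per query and is not the paper's constant-time, constant-space estimator.

```python
def level1_mass(x, data):
    """Hypothetical sketch: one-dimensional mass of point x.

    Assumption (not taken from the paper): one split point is drawn
    uniformly over [min(data), max(data)], and mass is the expected
    number of data points on the same side of the split as x.
    """
    xs = sorted(data)
    n = len(xs)
    span = xs[-1] - xs[0] if n else 0
    if n < 2 or span == 0:
        raise ValueError("need at least two distinct values")
    total = 0.0
    # A split inside the gap (xs[i-1], xs[i]) leaves i points on its left
    # and n - i points on its right. Under the uniform assumption it is
    # chosen with probability gap / span.
    for i in range(1, n):
        gap = xs[i] - xs[i - 1]
        if gap == 0:
            continue  # duplicate values: no split falls between them
        # For data points, any split position inside the gap gives the same side.
        split = (xs[i - 1] + xs[i]) / 2
        m = n - i if x > split else i
        total += m * gap / span
    return total


if __name__ == "__main__":
    data = [0, 1, 2, 10]
    for v in data:
        # Expected rounded output: 0 2.7, 1 2.9, 2 2.9, 10 1.3
        print(v, round(level1_mass(v, data), 2))
```

In the usage example, the isolated value 10 receives the smallest mass (about 1.3, compared with 2.7 to 2.9 for the clustered values). This fits the intuition that low mass can flag anomalies, although the paper's actual scoring procedure may differ.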
Index Terms
- Mass estimation and its applications