Abstract
This paper proposes UNN, a new neural network method for classifying uncertain data. Uncertainty is widespread in real-world data. Numerous factors lead to data uncertainty, including data acquisition device error, approximate measurement, sampling fault, transmission latency, and data integration error. The performance and quality of data mining results depend largely on whether data uncertainty is properly modeled and processed. In this paper, we focus on one commonly encountered type of data uncertainty: the exact data value is unavailable and only the probability distribution of the data is known. An intuitive way to handle this type of uncertainty is to represent the uncertain range by its expected value and then process it as certain data. This method, although simple and straightforward, may lose valuable information. We extend the conventional neural network classifier so that it can take not only certain data but also uncertain probability distributions as input. We start by designing an uncertain perceptron for linear classification and analyze how neurons use the new activation function to process data distributions as inputs. We then illustrate how the perceptron derives classification principles from the knowledge learned from uncertain training data. We also construct a multilayer neural network as a general classifier and propose an optimization technique to accelerate the training process. Experiments show that UNN performs well even for highly uncertain data and significantly outperforms the naïve neural network algorithm. Furthermore, the proposed optimization approach greatly improves training efficiency.
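To make the contrast between the expectation-based method and distribution-aware input concrete, the following is a minimal sketch and not the authors' implementation. It assumes each attribute follows an independent Gaussian distribution, which the abstract does not specify, and it uses the probability that the weighted sum is positive as a stand-in for the paper's activation function. The names `prob_positive` and `expectation_baseline` are hypothetical.

```python
import math


def prob_positive(weights, bias, means, stds):
    """P(w.x + b > 0) when each x_i ~ N(means[i], stds[i]**2), independent.

    Assumption for illustration: a weighted sum of independent Gaussians is
    Gaussian with mean w.mu + b and variance sum((w_i * sigma_i)**2).
    """
    mu = sum(w * m for w, m in zip(weights, means)) + bias
    var = sum((w * s) ** 2 for w, s in zip(weights, stds))
    if var == 0.0:
        # Certain data: fall back to the ordinary hard threshold.
        return 1.0 if mu > 0 else 0.0
    # Standard normal CDF evaluated at mu / sigma.
    return 0.5 * (1.0 + math.erf(mu / math.sqrt(2.0 * var)))


def expectation_baseline(weights, bias, means):
    """Naive approach: replace each distribution by its mean, then threshold."""
    total = sum(w * m for w, m in zip(weights, means)) + bias
    return 1.0 if total > 0 else 0.0


weights, bias = [1.0, -1.0], 0.0
means = [0.6, 0.5]

# Same means, very different spreads.
narrow = prob_positive(weights, bias, means, stds=[0.01, 0.01])
wide = prob_positive(weights, bias, means, stds=[1.0, 1.0])
baseline = expectation_baseline(weights, bias, means)

print(f"narrow spread: {narrow:.3f}")
print(f"wide spread:   {wide:.3f}")
print(f"expectation baseline (identical for both): {baseline:.1f}")
```

Under these assumptions, the two inputs share the same means, so the expectation-based baseline cannot tell them apart, while the distribution-aware output is close to 1 for the narrow input and only slightly above 0.5 for the wide one. This is the kind of information the abstract says is lost when an uncertain value is collapsed to its expectation.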
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ge, J., Xia, Y., Nadungodage, C. (2010). UNN: A Neural Network for Uncertain Data Classification. In: Zaki, M.J., Yu, J.X., Ravindran, B., Pudi, V. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2010. Lecture Notes in Computer Science, vol 6118. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13657-3_48
DOI: https://doi.org/10.1007/978-3-642-13657-3_48
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-13656-6
Online ISBN: 978-3-642-13657-3
eBook Packages: Computer Science, Computer Science (R0)