ABSTRACT
Uncertainty is often inherent in data, yet only a few data mining algorithms handle it. In this paper we focus on how to account for uncertainty in classification algorithms, in particular when data attributes should not be considered completely truthful for classifying a given sample. Our starting point is that each piece of data comes from a potentially different context; by estimating the context probabilities of an unknown sample, we can derive weights that quantify the influence of the data from each context. We propose a lazy classification strategy that incorporates this uncertainty into both the training and the usage of classifiers. We also propose uK-NN, an extension of the traditional K-NN algorithm that implements our approach. Finally, we illustrate uK-NN, which is currently being evaluated experimentally, using a toy document classification example.
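The general idea of weighting neighbors by context can be illustrated with a minimal sketch. It is not the uK-NN definition from the paper, which the abstract does not spell out. It assumes that each training sample carries a context label, that the context probabilities of the query have already been estimated, and that each neighbor's vote is weighted by the probability of its context. The function name `context_weighted_knn`, the tuple layout, and the toy data are all hypothetical.

```python
import math
from collections import defaultdict


def context_weighted_knn(train, query, context_probs, k=3):
    """Classify `query` by a context-weighted vote among its k nearest neighbors.

    train         : list of (features, label, context) tuples (assumed layout)
    query         : feature tuple of the unknown sample
    context_probs : dict mapping context -> estimated probability for the query
    """
    # Lazy strategy: nothing is built in advance; neighbors are found per query.
    neighbors = sorted(train, key=lambda s: math.dist(s[0], query))[:k]

    # Assumption: a neighbor's vote counts as much as the probability that
    # the query belongs to that neighbor's context.
    votes = defaultdict(float)
    for _, label, context in neighbors:
        votes[label] += context_probs.get(context, 0.0)

    return max(votes, key=votes.get)


# Hypothetical toy data: two-dimensional document features, a class label,
# and the context (here, a year) each document comes from.
train = [
    ((0.0, 0.0), "sports", "2005"),
    ((0.1, 0.0), "sports", "2005"),
    ((0.0, 0.1), "politics", "2008"),
    ((5.0, 5.0), "politics", "2008"),
]

likely_2008 = {"2005": 0.2, "2008": 0.8}
uniform = {"2005": 0.5, "2008": 0.5}

print(context_weighted_knn(train, (0.0, 0.0), likely_2008))
print(context_weighted_knn(train, (0.0, 0.0), uniform))
```

With uniform context probabilities the sketch reduces to a plain majority vote. When the query is judged more likely to come from the 2008 context, the single 2008 neighbor can outweigh the two 2005 neighbors.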
Index Terms
- Exploiting contexts to deal with uncertainty in classification