DOI: 10.1145/1610555.1610558
research-article

Exploiting contexts to deal with uncertainty in classification

Published: 28 June 2009

ABSTRACT

Uncertainty is often inherent in data, yet few data mining algorithms handle it. In this paper we focus on how to account for uncertainty in classification algorithms, in particular when data attributes should not be considered completely truthful for classifying a given sample. Our starting point is that each piece of data comes from a potentially different context and that, by estimating the context probabilities of an unknown sample, we may derive a weight that quantifies their influence. We propose a lazy classification strategy that incorporates uncertainty into both the training and the use of classifiers. We also propose uK-NN, an extension of the traditional K-NN that implements our approach. Finally, we illustrate uK-NN, which is currently being evaluated experimentally, using a toy document classification example.
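
The abstract does not spell out how uK-NN computes or applies its weights, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes that each training sample carries a context tag, that the context probabilities of the unknown sample have already been estimated by some other means, and that a neighbor's vote is weighted by the probability of its context. The function name `context_weighted_knn`, the tuple layout, and the toy data are all hypothetical.

```python
import math
from collections import defaultdict


def context_weighted_knn(train, query, context_probs, k=3):
    """Classify `query` with a context-weighted K-NN vote (illustrative only).

    train         -- list of (features, label, context) tuples (assumed layout)
    query         -- feature tuple of the unknown sample
    context_probs -- dict: context -> estimated probability that the query
                     belongs to that context (assumed to be given)
    """
    # Lazy strategy: no model is built; neighbors are found at query time.
    neighbors = sorted(train, key=lambda s: math.dist(s[0], query))[:k]

    # Each neighbor votes with a weight equal to the probability of its
    # context. This weighting rule is an assumption, not taken from the paper.
    votes = defaultdict(float)
    for _features, label, context in neighbors:
        votes[label] += context_probs.get(context, 0.0)

    return max(votes, key=votes.get)


# Toy document data: 2-D feature vectors, a class label, and a context tag.
toy_train = [
    ((0.0, 0.0), "sports", "ctx_a"),
    ((0.1, 0.0), "sports", "ctx_a"),
    ((0.0, 0.1), "politics", "ctx_b"),
    ((1.0, 1.0), "politics", "ctx_b"),
]

# With uniform context probabilities the vote reduces to plain majority K-NN.
uniform = context_weighted_knn(toy_train, (0.0, 0.0), {"ctx_a": 0.5, "ctx_b": 0.5})

# If the query most likely comes from ctx_b, the single ctx_b neighbor
# outweighs the two ctx_a neighbors.
skewed = context_weighted_knn(toy_train, (0.0, 0.0), {"ctx_a": 0.2, "ctx_b": 0.8})
```

In this toy setting the same three nearest neighbors yield different predictions depending on the estimated context probabilities, which is the kind of effect the abstract attributes to context weighting.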


Published in

U '09: Proceedings of the 1st ACM SIGKDD Workshop on Knowledge Discovery from Uncertain Data
June 2009
66 pages
ISBN: 9781605586755
DOI: 10.1145/1610555

Copyright © 2009 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery
New York, NY, United States
