Exploiting contexts to deal with uncertainty in classification

Authors:
Bianca Zadrozny, Fluminense Fed. Univ., Niterói, Brazil
Gisele L. Pappa, Fed. Univ. of Minas Gerais, Belo Horizonte, Brazil
Wagner Meira, Fed. Univ. of Minas Gerais, Belo Horizonte, Brazil
Marcos André Gonçalves, Fed. Univ. of Minas Gerais, Belo Horizonte, Brazil
Leonardo Rocha, Fed. Univ. São João Del Rei, São João Del Rei, Brazil
Thiago Salles, Fed. Univ. of Minas Gerais, Belo Horizonte, Brazil

U '09: Proceedings of the 1st ACM SIGKDD Workshop on Knowledge Discovery from Uncertain Data
June 2009
Pages 19–22
https://doi.org/10.1145/1610555.1610558

Published: 28 June 2009

ABSTRACT

Uncertainty is often inherent in data, yet only a few data mining algorithms handle it. In this paper we focus on how to account for uncertainty in classification algorithms, in particular when data attributes should not be considered completely truthful for classifying a given sample. Our starting point is that each piece of data comes from a potentially different context; by estimating the context probabilities of an unknown sample, we may derive a weight that quantifies their influence. We propose a lazy classification strategy that incorporates uncertainty into both the training and the use of classifiers. We also propose uK-NN, an extension of traditional K-NN that implements our approach. Finally, we illustrate uK-NN, which is currently being evaluated experimentally, using a toy document classification example.
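
The abstract does not spell out how uK-NN computes its weights, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes that every training sample carries a context label and that a probability for each context has already been estimated for the unknown sample; each of the K nearest neighbours then votes with the probability of its own context instead of a unit vote. The function name `context_weighted_knn`, the tuple layout, the year-like context labels, and the toy data are all hypothetical.

```python
import math
from collections import defaultdict


def context_weighted_knn(train, query, context_probs, k=3):
    """Classify `query` with a context-weighted K-NN vote (illustrative only).

    train: list of (features, label, context) tuples.
    context_probs: dict mapping context -> estimated P(context | query);
        how these probabilities are estimated is outside this sketch.
    """
    # Lazy step: nothing is trained ahead of time; neighbours are
    # selected per query by Euclidean distance.
    neighbours = sorted(train, key=lambda s: math.dist(s[0], query))[:k]

    # Each neighbour votes with the probability of its own context,
    # so samples from unlikely contexts have less influence.
    votes = defaultdict(float)
    for _, label, context in neighbours:
        votes[label] += context_probs.get(context, 0.0)
    return max(votes, key=votes.get)


# Toy data: two-dimensional document vectors with a class and a context.
train = [
    ((1.0, 0.0), "sports", "2008"),
    ((0.9, 0.1), "sports", "2008"),
    ((1.0, 0.1), "politics", "2009"),
    ((0.0, 1.0), "politics", "2009"),
]
query = (1.0, 0.05)

# Equal context probabilities reduce to a plain majority vote.
uniform = context_weighted_knn(train, query, {"2008": 0.5, "2009": 0.5})

# A query that most likely belongs to the "2009" context shifts the vote.
skewed = context_weighted_knn(train, query, {"2008": 0.2, "2009": 0.8})
```

With equal context probabilities the two "sports" neighbours outvote the single "politics" neighbour; when the "2009" context is given probability 0.8, the single "politics" neighbour contributes 0.8 against 0.4 and the prediction changes.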

Index Terms

1. Computing methodologies
  1. Machine learning
2. Information systems
  1. Information systems applications

Published in
U '09: Proceedings of the 1st ACM SIGKDD Workshop on Knowledge Discovery from Uncertain Data
June 2009
66 pages
ISBN:9781605586755
DOI:10.1145/1610555
Editors:
Jian Pei
Simon Fraser University
,
Lise Getoor
University of Maryland College Park
,
Ander de Keijzer
University of Twente
Copyright © 2009 ACM
Publisher
Association for Computing Machinery
New York, NY, United States