research-article

Far Point Algorithm: Active Semi-supervised Clustering for Rare Category Detection

Authors:
Rohan Loveland

Department of Computer Science, Whitman College, Walla Walla, WA, USA

Jonathan Amdahl

Klipsch School of Electrical and Computer Engineering, New Mexico State University, Las Cruces, NM USA


ICVISP 2019: Proceedings of the 3rd International Conference on Vision, Image and Signal Processing, August 2019, Article No. 68, Pages 1–5. https://doi.org/10.1145/3387168.3389117

Published: 25 May 2020

ABSTRACT

In some data sets, the number of categories (i.e., classes) represented is not known in advance. Discovering these categories can be difficult, particularly when a data set is skewed, so that the number of data points in some classes greatly exceeds that in others. Rare category detection algorithms address this problem by trying to present a user with at least one data point from each category while minimizing the total number of data points presented. We present an algorithm based on active and semi-supervised learning that finds category clusters using a query selection strategy that maximizes the distance from a set of already labeled data points to a query data point. We evaluate the algorithm's performance as a rare category detection algorithm on artificially skewed versions of the MNIST data set, investigating how performance varies with both the relative frequency of classes and inherent differences in class structure in feature space.
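
The query selection idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes Euclidean distance, assumes that the distance from a point to the labeled set means the distance to its nearest labeled point (a farthest-first traversal), and omits the semi-supervised clustering step entirely. The function name `far_point_queries`, its parameters, and the toy data are hypothetical.

```python
import numpy as np


def far_point_queries(X, first_index, n_queries):
    """Return the indices to show the user, in query order.

    Assumption: the distance from a candidate to the labeled set is the
    Euclidean distance to its nearest labeled point.
    """
    X = np.asarray(X, dtype=float)
    queried = [first_index]
    # Distance from every point to the labeled set (only the seed so far).
    dist_to_labeled = np.linalg.norm(X - X[first_index], axis=1)
    while len(queried) < n_queries:
        # Query the point farthest from everything labeled so far.
        nxt = int(np.argmax(dist_to_labeled))
        queried.append(nxt)
        # Labeling nxt can only shrink each point's distance to the set.
        dist_to_nxt = np.linalg.norm(X - X[nxt], axis=1)
        dist_to_labeled = np.minimum(dist_to_labeled, dist_to_nxt)
    return queried


# Hypothetical skewed toy data: 100 majority points on a small grid,
# plus two rare classes with 2 points and 1 point respectively.
majority = np.array([[i / 10, j / 10] for i in range(10) for j in range(10)])
rare_a = np.array([[10.0, 10.0], [10.1, 10.0]])
rare_b = np.array([[-10.0, 10.0]])
X = np.vstack([majority, rare_a, rare_b])
labels = np.array([0] * 100 + [1] * 2 + [2])

queries = far_point_queries(X, first_index=0, n_queries=3)
print(queries, labels[queries])
```

On this toy set the three queries cover all three classes, because each rare point lies far from the majority grid. On real data such as skewed MNIST, how quickly rare classes surface would depend on the feature space and on class structure, which is what the evaluation described above investigates.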

Index Terms

Computing methodologies
  Machine learning
    Learning paradigms
      Unsupervised learning
        Anomaly detection

Published in

ICVISP 2019: Proceedings of the 3rd International Conference on Vision, Image and Signal Processing
August 2019
584 pages
ISBN:9781450376259
DOI:10.1145/3387168

Copyright © 2019 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
detection algorithms
machine learning algorithms
partitioning algorithms
Qualifiers
- research-article
- Research
- Refereed limited
Acceptance Rates
ICVISP 2019 paper acceptance rate: 126 of 277 submissions, 45%. Overall acceptance rate: 186 of 424 submissions, 44%.