DOI: 10.1145/1015330.1015399
Article

A needle in a haystack: local one-class optimization

Published: 04 July 2004

Abstract

This paper addresses the problem of finding a small and coherent subset of points in a given dataset. This problem, sometimes referred to as one-class or set covering, requires finding a small-radius ball that covers as many data points as possible. It arises naturally in a wide range of applications, from finding gene modules to extracting document topics, where many data points are irrelevant to the task at hand, or in applications where only positive examples are available. Most previous approaches to this problem focus on identifying and discarding a possible set of outliers. In this paper we adopt the opposite approach, which directly aims to find a small set of coherently structured regions, by using a loss function that focuses on local properties of the data. We formalize the learning task as an optimization problem using the Information Bottleneck principle. An algorithm to solve this optimization problem is then derived and analyzed. Experiments on gene expression data and a text document corpus demonstrate the merits of our approach.
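To make the one-class objective concrete, a minimal sketch in Python follows. It is not the paper's Information Bottleneck algorithm; it is a toy soft-assignment heuristic, and the names (soft_one_class_center, radius, beta) are illustrative assumptions. It demonstrates the same "needle in a haystack" setting: most points are irrelevant background, and the goal is a small-radius ball covering a dense, coherent subset rather than a model of all the data.

    import numpy as np

    def soft_one_class_center(X, radius, n_iters=50, beta=5.0):
        """Toy sketch of the local one-class idea (NOT the paper's IB-based
        algorithm): points far from the current center are exponentially
        down-weighted, so the center drifts toward the densest local region
        instead of the global mean."""
        center = X.mean(axis=0)                     # start from the global mean
        for _ in range(n_iters):
            d2 = ((X - center) ** 2).sum(axis=1)    # squared distances to current center
            w = np.exp(-beta * d2 / radius ** 2)    # soft "inside the ball" weights
            w /= w.sum()                            # normalize weights over points
            center = (w[:, None] * X).sum(axis=0)   # weighted-mean update of the center
        d2 = ((X - center) ** 2).sum(axis=1)        # distances to the final center
        return center, d2 <= radius ** 2            # center and hard "covered" mask

    # Synthetic example: 200 background points plus a tight cluster of 30 relevant ones.
    rng = np.random.default_rng(0)
    haystack = rng.normal(0.0, 3.0, size=(200, 2))
    needle = rng.normal([4.0, 4.0], 0.3, size=(30, 2))
    X = np.vstack([haystack, needle])

    center, covered = soft_one_class_center(X, radius=1.0)
    print("estimated center:", center)
    print("points covered by the ball:", covered.sum())

In the paper, the analogous soft assignment of points to the relevant region comes out of the Information Bottleneck trade-off between compression and preserved relevance; the exponential weighting above is only a stand-in for that mechanism.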




    Information

    Published In

    ICML '04: Proceedings of the twenty-first international conference on Machine learning
    July 2004
    934 pages
    ISBN: 1581138385
    DOI: 10.1145/1015330
    • Conference Chair: Carla Brodley

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 04 July 2004


    Qualifiers

    • Article

    Acceptance Rates

    Overall acceptance rate: 140 of 548 submissions (26%)

    Bibliometrics & Citations

    Article Metrics

    • Downloads (last 12 months): 7
    • Downloads (last 6 weeks): 2
    Reflects downloads up to 08 Mar 2025

    Citations

    Cited By

    • (2024) Anomaly Detection Based on Compressed Data: An Information Theoretic Characterization. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 54(1), 23-38. DOI: 10.1109/TSMC.2023.3299169. Online publication date: Jan-2024.
    • (2024) CoMadOut—a robust outlier detection algorithm based on CoMAD. Machine Learning, 113(10), 8061-8135. DOI: 10.1007/s10994-024-06521-2. Online publication date: 7-May-2024.
    • (2023) Towards Understanding Alerts raised by Unsupervised Network Intrusion Detection Systems. Proceedings of the 26th International Symposium on Research in Attacks, Intrusions and Defenses, 135-150. DOI: 10.1145/3607199.3607247. Online publication date: 16-Oct-2023.
    • (2023) Fast Bayesian optimization of Needle-in-a-Haystack problems using zooming memory-based initialization (ZoMBI). npj Computational Materials, 9(1). DOI: 10.1038/s41524-023-01048-x. Online publication date: 26-May-2023.
    • (2021) MultiKOC: Multi-One-Class Classifier Based K-Means Clustering. Algorithms, 14(5), 134. DOI: 10.3390/a14050134. Online publication date: 23-Apr-2021.
    • (2020) One-Class Classification for Selecting Synthetic Datasets in Meta-Learning. 2020 International Joint Conference on Neural Networks (IJCNN), 1-8. DOI: 10.1109/IJCNN48605.2020.9206899. Online publication date: Jul-2020.
    • (2020) One-Class Novelty Detection via Sparse Representation with Contrastive Deep Features. 2020 International Computer Symposium (ICS), 61-66. DOI: 10.1109/ICS51289.2020.00022. Online publication date: Dec-2020.
    • (2019) K-Means Based One-Class SVM Classifier. Database and Expert Systems Applications, 45-53. DOI: 10.1007/978-3-030-27684-3_7. Online publication date: 1-Aug-2019.
    • (2018) Non-Parametric Message Importance Measure: Storage Code Design and Transmission Planning for Big Data. IEEE Transactions on Communications, 66(11), 5181-5196. DOI: 10.1109/TCOMM.2018.2847666. Online publication date: Nov-2018.
    • (2017) Non-parametric message important measure: Compressed storage design for big data in wireless communication systems. 2017 23rd Asia-Pacific Conference on Communications (APCC), 1-6. DOI: 10.23919/APCC.2017.8304050. Online publication date: Dec-2017.
