
Fast Rare Category Detection Using Nearest Centroid Neighborhood

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9931)

Abstract

Rare category detection is an open challenge in data mining. Existing approaches to this problem often suffer from flaws such as inappropriate investigation scopes, high time complexity, and restrictive applicability conditions, which degrade their performance and reduce their usability. In this paper, we present FRANC, an effective and efficient solution for rare category detection. FRANC adopts an investigation scope based on k-nearest centroid neighbors with an automatically selected k, which helps the algorithm capture the real changes in local density and data distribution caused by the presence of rare categories. With our proposed pruning method, identifying the k-nearest centroid neighbors of each data example, the most computationally expensive step in FRANC, becomes much faster. Extensive experimental results on real data sets demonstrate the effectiveness and efficiency of FRANC.
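To make the notion of a "k-nearest centroid neighbor" investigation scope concrete, here is a minimal brute-force sketch of the greedy nearest centroid neighbor (NCN) rule as it is commonly defined in the k-NCN literature (e.g., the k-NCN classifier of Gou et al.). It is an illustration under stated assumptions, not FRANC itself: the function name `k_nearest_centroid_neighbors` is hypothetical, Euclidean distance is assumed, k is passed in by hand rather than selected automatically, and the paper's pruning method is not reproduced. The first neighbor is the ordinary nearest neighbor; each later neighbor is the candidate that moves the centroid of the selected set closest to the query, so the neighborhood tends to surround the query rather than cluster on one side of it.

```python
import numpy as np


def k_nearest_centroid_neighbors(X, q_idx, k):
    """Hypothetical brute-force k-NCN search (not the paper's pruned version).

    Returns the indices of the k nearest centroid neighbors of X[q_idx]
    under Euclidean distance, using the greedy NCN selection rule.
    """
    X = np.asarray(X, dtype=float)
    q = X[q_idx]
    candidates = [i for i in range(len(X)) if i != q_idx]
    if k > len(candidates):
        raise ValueError("k exceeds the number of other points")

    selected = []
    running_sum = np.zeros_like(q)  # sum of the points selected so far
    for step in range(1, k + 1):
        cand = np.array(candidates)
        # Centroid of (selected points + each candidate), computed for all candidates at once.
        centroids = (running_sum + X[cand]) / step
        dists = np.linalg.norm(centroids - q, axis=1)
        best = candidates.pop(int(np.argmin(dists)))
        selected.append(best)
        running_sum += X[best]
    return selected


# Points on a line; the query is the origin (index 0).
X = [[0.0, 0.0], [1.0, 0.0], [1.1, 0.0], [-1.5, 0.0]]
print(k_nearest_centroid_neighbors(X, 0, 2))  # [1, 3]
```

In this example, ordinary 2-nearest neighbors would return points 1 and 2, both on the right of the query, while the NCN rule picks point 1 and then point 3 on the left, because that choice pulls the centroid to -0.25, close to the query. This sketch costs O(nk) distance evaluations per query, which illustrates why the abstract singles out k-NCN identification as the most expensive step and why pruning matters.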



Acknowledgements

This work was supported in part by NSFC Grants (61502347, 61522208, 61572376, 61303025, 61379033, and 61232002), the Fundamental Research Funds for the Central Universities (2015XZZX005-07, 2015XZZX004-18, and 2042015kf0038), the Research Funds for Introduced Talents of Wuhan University, and the International Academic Cooperation Training Program of Wuhan University.

Author information

Corresponding authors

Correspondence to Hao Huang or Zhiyong Peng.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Wang, S., Huang, H., Gao, Y., Qian, T., Hong, L., Peng, Z. (2016). Fast Rare Category Detection Using Nearest Centroid Neighborhood. In: Li, F., Shim, K., Zheng, K., Liu, G. (eds) Web Technologies and Applications. APWeb 2016. Lecture Notes in Computer Science, vol 9931. Springer, Cham. https://doi.org/10.1007/978-3-319-45814-4_31

  • DOI: https://doi.org/10.1007/978-3-319-45814-4_31

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-45813-7

  • Online ISBN: 978-3-319-45814-4

  • eBook Packages: Computer Science, Computer Science (R0)
