Efficient Sampling: Application to Image Data

  • Conference paper
Advances in Knowledge Discovery and Data Mining (PAKDD 2005)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3518)

Abstract

Sampling is an important preprocessing step for mining large data efficiently. Although a simple random sample often works well at a reasonable sample size, accuracy falls sharply as the sample size is reduced. In KDD'03 we proposed EASE, which outputs a sample based on its 'closeness' to the original sample. Reported results show that EASE outperforms simple random sampling (SRS). In this paper we propose EASIER, which extends EASE in two ways. 1) EASE is a halving algorithm, i.e., to achieve the required sample ratio it starts from a suitably large initial sample and iteratively halves it. EASIER, on the other hand, does away with the repeated halving by directly obtaining the required sample ratio in one iteration. 2) EASE was shown to work on the IBM QUEST dataset, which is a categorical count dataset. EASIER, in addition, is shown to work on continuous data such as the Color Structure Descriptor of images. Two mining tasks, classification and association rule mining, are used to validate the efficacy of EASIER samples vis-a-vis EASE and SRS samples.
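As a rough illustration of the terms used in the abstract, the sketch below draws a simple random sample (the SRS baseline) and scores how close a sample is to the full data. It is not the EASE or EASIER algorithm, neither of which the abstract specifies. The closeness measure (L2 distance between item-frequency vectors) and all function names are assumptions made for illustration only. The last helper only counts how many halvings a halving scheme would need to get from an initial sample ratio to a target ratio.

```python
import random
from collections import Counter


def item_frequencies(transactions):
    """Fraction of transactions that contain each item."""
    counts = Counter(item for t in transactions for item in set(t))
    n = len(transactions)
    return {item: c / n for item, c in counts.items()}


def frequency_distance(sample, data):
    """Hypothetical closeness measure (an assumption, not taken from the paper):
    L2 distance between the item-frequency vectors of sample and data."""
    fs, fd = item_frequencies(sample), item_frequencies(data)
    items = set(fs) | set(fd)
    return sum((fs.get(i, 0.0) - fd.get(i, 0.0)) ** 2 for i in items) ** 0.5


def simple_random_sample(data, ratio, rng):
    """SRS baseline: draw without replacement at the given sample ratio."""
    k = max(1, round(len(data) * ratio))
    return rng.sample(data, k)


def halving_rounds(initial_ratio, target_ratio):
    """Count the halvings needed to bring initial_ratio down to target_ratio or below."""
    rounds = 0
    ratio = initial_ratio
    while ratio > target_ratio:
        ratio /= 2
        rounds += 1
    return rounds
```

For example, a halving scheme that starts from the full data (ratio 1.0) and targets a ratio of 0.125 needs `halving_rounds(1.0, 0.125)` = 3 passes, whereas a one-shot method would aim to reach the target ratio in a single pass.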

References

  1. Brönnimann, H., Chen, B., Dash, M., Haas, P., Scheuermann, P.: Efficient data reduction with EASE. In: Proc. 9th Int. Conf. on KDD, pp. 59–68 (2003)

  2. Chen, B., Haas, P., Scheuermann, P.: A new two-phase sampling based algorithm for discovering association rules. In: Proc. Int. Conf. on ACM SIGKDD (2002)

  3. Chapelle, O., Haffner, P., Vapnik, V.N.: Support vector machines for histogram-based image classification. IEEE Trans. on Neural Networks 10 (1999)

  4. Agrawal, R., Srikant, R.: Fast algorithms for mining association rules. In: Proc. Int. Conf. on VLDB (1994)

  5. ISO/IEC15938-8/FDIS3: Information Technology - Multimedia Content Description Interface - Part 8 (Extraction and use of MPEG-7 descriptions)

  6. Ojala, T., Aittola, M., Matinmikko, E.: Empirical evaluation of MPEG-7 XM color descriptors in content-based retrieval of semantic image categories. In: Proc. 16th Int. Conf. on Pattern Recognition, pp. 1021–1024 (2002)

  7. Han, J., Pei, J., Yin, Y.: Mining frequent patterns without candidate generation. In: Proc. Int. Conf. on ACM SIGMOD (2000)

  8. Jin, R., Yan, R., Hauptmann, A.: Image classification using a bigram model. In: AAAI Spring Symposium on Intelligent Multimedia Knowledge Management (2003)

  9. Vitter, J.: Random sampling with a reservoir. ACM Trans. Math. Software (1985)


Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Wang, S., Dash, M., Chia, LT. (2005). Efficient Sampling: Application to Image Data. In: Ho, T.B., Cheung, D., Liu, H. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2005. Lecture Notes in Computer Science, vol 3518. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11430919_53

  • DOI: https://doi.org/10.1007/11430919_53

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-26076-9

  • Online ISBN: 978-3-540-31935-1

  • eBook Packages: Computer Science, Computer Science (R0)
