Efficient Sampling: Application to Image Data

Wang, Surong; Dash, Manoranjan; Chia, Liang-Tien

doi:10.1007/11430919_53

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3518)

Included in the following conference series:

Pacific-Asia Conference on Knowledge Discovery and Data Mining

Abstract

Sampling is an important preprocessing step for mining large data efficiently. Although a simple random sample often works well at a reasonable sample size, accuracy falls sharply as the sample size is reduced. At KDD'03 we proposed EASE, which outputs a sample based on its 'closeness' to the original sample. Reported results show that EASE outperforms simple random sampling (SRS). In this paper we propose EASIER, which extends EASE in two ways. (1) EASE is a halving algorithm: to achieve the required sample ratio, it starts from a suitably large initial sample and halves it iteratively. EASIER, on the other hand, does away with the repeated halving and obtains the required sample ratio directly in one iteration. (2) EASE was shown to work on the IBM QUEST dataset, which is a categorical count dataset. EASIER, in addition, is shown to work on continuous data such as the Color Structure Descriptor of images. Two mining tasks, classification and association rule mining, are used to validate the efficacy of EASIER samples vis-a-vis EASE and SRS samples.
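
The sample-ratio arithmetic behind the halving versus one-iteration contrast can be illustrated with a small sketch. This is not EASE or EASIER: the abstract states that those methods select a sample by its closeness to the sample they start from, whereas the code below uses plain random selection throughout. The function names (`srs`, `halve_to_ratio`, `frequency_distance`) and the maximum item-frequency difference used as a closeness measure are assumptions made for illustration only.

```python
import random


def srs(items, ratio, rng):
    """Simple random sample without replacement, taken in one step."""
    k = max(1, round(len(items) * ratio))
    return rng.sample(items, k)


def halve_to_ratio(items, halvings, rng):
    """Reach a ratio of 2**-halvings by repeatedly keeping a random half.

    Only the halving schedule is illustrated here; a halving algorithm
    such as EASE chooses which half to keep, which this sketch does not.
    """
    current = list(items)
    for _ in range(halvings):
        current = rng.sample(current, len(current) // 2)
    return current


def item_frequencies(transactions):
    """Fraction of transactions that contain each item."""
    counts = {}
    for transaction in transactions:
        for item in set(transaction):
            counts[item] = counts.get(item, 0) + 1
    n = len(transactions)
    return {item: c / n for item, c in counts.items()}


def frequency_distance(full, sample):
    """Largest absolute gap in item frequency (assumed closeness measure)."""
    f_full = item_frequencies(full)
    f_sample = item_frequencies(sample)
    return max(abs(f_full[i] - f_sample.get(i, 0.0)) for i in f_full)


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy transactions: two categorical items per record.
    data = [[i % 5, 5 + i % 3] for i in range(1024)]
    direct = srs(data, 1 / 8, rng)          # one step to ratio 1/8
    halved = halve_to_ratio(data, 3, rng)   # three halvings to ratio 1/8
    print(len(direct), frequency_distance(data, direct))
    print(len(halved), frequency_distance(data, halved))
```

Both routes end at the same sample size; a halving scheme can only reach ratios of the form 1/2^k from its starting sample, while a one-step scheme can target any ratio directly.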


References

Brönnimann, H., Chen, B., Dash, M., Haas, P., Scheuermann, P.: Efficient data reduction with EASE. In: Proc. 9th Int. Conf. on KDD, pp. 59–68 (2003)
Chen, B., Haas, P., Scheuermann, P.: A new two-phase sampling based algorithm for discovering association rules. In: Proc. Int. Conf. on ACM SIGKDD (2002)
Chapelle, O., Haffner, P., Vapnik, V.N.: Support vector machines for histogram-based image classification. IEEE Trans. on Neural Networks 10 (1999)
Agrawal, R., Srikant, R.: Fast algorithms for mining association rules. In: Proc. Int. Conf. on VLDB (1994)
ISO/IEC15938-8/FDIS3: Information Technology - Multimedia Content Description Interface - Part 8 (Extraction and use of MPEG-7 descriptions)
Ojala, T., Aittola, M., Matinmikko, E.: Empirical evaluation of MPEG-7 XM color descriptors in content-based retrieval of semantic image categories. In: Proc. 16th Int. Conf. on Pattern Recognition, pp. 1021–1024 (2002)
Han, J., Pei, J., Yin, Y.: Mining frequent patterns without candidate generation. In: Proc. Int. Conf. on ACM SIGMOD (2000)
Jin, R., Yan, R., Hauptmann, A.: Image classification using a bigram model. In: AAAI Spring Symposium on Intelligent Multimedia Knowledge Management (2003)
Vitter, J.: Random sampling with a reservoir. ACM Trans. Math. Software (1985)


Author information

Authors and Affiliations

Center for Multimedia and Network Technology, School of Computer Engineering, Nanyang Technological University, 639798, Singapore
Surong Wang, Manoranjan Dash & Liang-Tien Chia

Editor information

Editors and Affiliations

Japan Advanced Institute of Science and Technology, Asahidai 1-1, 923-12292, Nomi, Japan
Tu Bao Ho
University of Hong Kong, Pokfulam Road, Hong Kong, China
David Cheung
Department of Computer Science and Engineering, Arizona State University, Tempe, Arizona, USA
Huan Liu

About this paper

Cite this paper

Wang, S., Dash, M., Chia, L.-T. (2005). Efficient Sampling: Application to Image Data. In: Ho, T.B., Cheung, D., Liu, H. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2005. Lecture Notes in Computer Science, vol 3518. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11430919_53

DOI: https://doi.org/10.1007/11430919_53
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-26076-9
Online ISBN: 978-3-540-31935-1
eBook Packages: Computer Science, Computer Science (R0)
