An Automated Search Space Reduction Methodology for Large Databases

  • Conference paper
Advances in Data Mining. Applications and Theoretical Aspects (ICDM 2013)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7987)

Abstract

Given the present need for Customer Relationship Management and the rapid growth in the size of databases, many new approaches to clustering and processing large databases have been attempted. In this work we propose a methodology based on the idea that statistically proven search space reduction is possible in practice. Following a previous methodology, two clustering models are generated: one corresponding to the full data set and another pertaining to the sampled data set. The resulting empirical distributions are then tested mathematically by applying an algorithmic verification.
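
The comparison described in the abstract can be illustrated with a minimal sketch. This is not the authors' procedure: the abstract does not name the clustering algorithm or the statistical test, so everything here is an assumption made for illustration. The data are synthetic, the clustering is plain one-dimensional k-means, and the check is a simple comparison of centroids and cluster proportions. The names `kmeans_1d` and `cluster_proportions` are hypothetical helpers.

```python
import random


def kmeans_1d(values, k, iters=50):
    """Plain Lloyd's k-means on 1-D data; returns the centroids in sorted order."""
    ordered = sorted(values)
    # Deterministic start: evenly spaced quantiles of the data (an assumption,
    # chosen only to keep the sketch reproducible).
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centroids[j]))
            groups[nearest].append(v)
        # Keep the old centroid if a cluster happens to be empty.
        updated = [sum(g) / len(g) if g else centroids[j] for j, g in enumerate(groups)]
        if updated == centroids:
            break
        centroids = updated
    return sorted(centroids)


def cluster_proportions(values, centroids):
    """Fraction of `values` assigned to each centroid (nearest-centroid rule)."""
    counts = [0] * len(centroids)
    for v in values:
        nearest = min(range(len(centroids)), key=lambda j: abs(v - centroids[j]))
        counts[nearest] += 1
    return [c / len(values) for c in counts]


# Synthetic "full" data set: three well-separated groups (illustrative only).
rng = random.Random(0)
full = [rng.gauss(mu, 1.0) for mu in (0.0, 10.0, 20.0) for _ in range(2000)]

# Reduced search space: a 10% simple random sample of the full data.
sample = rng.sample(full, 600)

# One clustering model per data set.
full_model = kmeans_1d(full, k=3)
sample_model = kmeans_1d(sample, k=3)

# Compare how each model partitions the full data set.
full_props = cluster_proportions(full, full_model)
sample_props = cluster_proportions(full, sample_model)
max_gap = max(abs(a - b) for a, b in zip(full_props, sample_props))

print("centroids (full):  ", full_model)
print("centroids (sample):", sample_model)
print("largest difference in cluster proportions:", max_gap)
```

If the sample is representative, the two models should place their centroids close together and split the full data in nearly the same proportions. A formal goodness-of-fit test on the two empirical distributions would replace the informal `max_gap` check in a real analysis.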

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kuri-Morales, A. (2013). An Automated Search Space Reduction Methodology for Large Databases. In: Perner, P. (ed.) Advances in Data Mining. Applications and Theoretical Aspects. ICDM 2013. Lecture Notes in Computer Science, vol 7987. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39736-3_2

  • DOI: https://doi.org/10.1007/978-3-642-39736-3_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-39735-6

  • Online ISBN: 978-3-642-39736-3

  • eBook Packages: Computer Science, Computer Science (R0)
