An Automated Search Space Reduction Methodology for Large Databases

Kuri-Morales, Angel

doi:10.1007/978-3-642-39736-3_2


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7987)

Included in the following conference series:

Industrial Conference on Data Mining


Abstract

Given the present need for Customer Relationship Management and the growing size of databases, many new approaches to clustering and processing large databases have been attempted. In this work we propose a methodology based on the idea that statistically proven search space reduction is possible in practice. Following a previously published methodology, two clustering models are generated: one corresponding to the full data set and another pertaining to the sampled data set. The resulting empirical distributions are then tested mathematically by applying an algorithmic verification.
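The abstract only outlines the procedure: cluster the full data set and a sample, then test whether the resulting distributions agree. The sketch below illustrates that idea under explicit assumptions that are not taken from the paper: a single numeric attribute, plain k-means with quantile initialisation as the clustering model, clusters matched by the order of their centroids, and a chi-square goodness-of-fit test at the 5% level as the verification step. The names `kmeans_1d`, `cluster_shares`, and `compare_models` are hypothetical.

```python
import random

# Upper 5% critical values of the chi-square distribution for df = 1..5
# (so this sketch supports k = 2..6 clusters).
CHI2_CRIT_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}


def nearest(v, centroids):
    """Index of the centroid closest to v (ties go to the lower index)."""
    return min(range(len(centroids)), key=lambda j: abs(v - centroids[j]))


def kmeans_1d(values, k, iters=25):
    """Lloyd's k-means on one numeric attribute; returns centroids in ascending order."""
    ordered = sorted(values)
    # Deterministic start at evenly spaced quantiles (an illustrative assumption).
    centroids = [ordered[int((i + 0.5) * len(ordered) / k)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in values:
            groups[nearest(v, centroids)].append(v)
        # An empty cluster keeps its previous centroid.
        centroids = [sum(g) / len(g) if g else c for g, c in zip(groups, centroids)]
    return sorted(centroids)


def cluster_shares(values, centroids):
    """Fraction of the values assigned to each centroid."""
    counts = [0] * len(centroids)
    for v in values:
        counts[nearest(v, centroids)] += 1
    return [c / len(values) for c in counts]


def compare_models(full, sample, k=3):
    """Chi-square goodness-of-fit of the sample model's cluster shares against
    the full-data model's shares; clusters are matched by centroid order."""
    expected = cluster_shares(full, kmeans_1d(full, k))
    observed = cluster_shares(sample, kmeans_1d(sample, k))
    n = len(sample)
    chi2 = sum((o * n - e * n) ** 2 / (e * n)
               for o, e in zip(observed, expected) if e > 0)
    return chi2, chi2 < CHI2_CRIT_05[k - 1]


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic data: three well-separated groups of 1000 values each.
    full = [rng.gauss(mu, 1.0) for mu in (0.0, 10.0, 20.0) for _ in range(1000)]
    sample = rng.sample(full, 300)
    chi2, consistent = compare_models(full, sample)
    print(f"chi2 = {chi2:.3f}, sample consistent with full data: {consistent}")
```

The script prints the chi-square statistic and whether the sample's cluster shares are consistent with those of the full data at the 5% level. A simple random sample should pass in most runs, whereas a sample concentrated in one region of the data is rejected. Matching clusters by sorted centroids only works for a single attribute; with several attributes an explicit cluster-matching step would be needed.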


Similar content being viewed by others

Entropy-Based Approach to Efficient Cleaning of Big Data in Hierarchical Databases

Statistical Data Generation Using Sample Data

Mutual information algorithms for optimal attribute selection in data driven partitions of databases (Article, 04 June 2018)

References

Palpanas, T.: Knowledge Discovery in Data Warehouses. ACM SIGMOD Record 29(3), 88–100 (2000)
Silva, D.R., Pires, M.T.: Using Data Warehouse and Data Mining Resources for Ongoing Assessment of Distance Learning. In: IEEE ICALT Proceedings (2002)
Jain, A.K., Murty, M.N., Flynn, P.J.: Data Clustering: A Review. ACM Computing Surveys 31(3), 264–323 (1999)
Berkhin, P.: Survey of Clustering Data Mining Techniques. Accrue Software Inc. (2002)
Kleinberg, J., Papadimitriou, C., Raghavan, P.: Segmentation Problems. Journal of the ACM 51(2), 263–280 (2004)
Guha, S., Rastogi, R., Shim, K.: CURE: An Efficient Clustering Algorithm for Large Databases. In: ACM SIGMOD Proceedings, pp. 73–84 (1998)
Peter, W., Chiochetti, J., Giardina, C.: New Unsupervised Clustering Algorithm for Large Datasets. In: ACM SIGKDD Proceedings, pp. 643–648 (2003)
Ng, R.T., Han, J.: Efficient and Effective Clustering Methods for Spatial Data Mining. In: 20th International Conference on Very Large Data Bases, pp. 144–155 (1994)
Cheng, D., Kannan, R., Vempala, S., Wang, G.: A Divide-and-Merge Methodology for Clustering. In: ACM SIGMOD Proceedings, pp. 196–205 (2005)
Jagadish, H.V., Lakshmanan, L.V., Srivastava, D.: Snakes and Sandwiches: Optimal Clustering Strategies for a Data Warehouse. In: ACM SIGMOD Proceedings, pp. 37–48 (1999)
Palmer, C.R., Faloutsos, C.: Density Biased Sampling: An Improved Method for Data Mining and Clustering. In: ACM SIGMOD Record, pp. 82–92 (2000)
Liu, H., Motoda, H.: On Issues of Instance Selection. Data Mining and Knowledge Discovery 6(2), 115–130 (2002)
Zhu, X., Wu, X.: Scalable Representative Instance Selection and Ranking. In: Proceedings of the 18th IEEE International Conference on Pattern Recognition, pp. 352–355 (2006)
Brighton, H., Mellish, C.: Advances in Instance Selection for Instance-Based Learning Algorithms. Data Mining and Knowledge Discovery 6, 153–172 (2002)
Vu, K., Hua, K.A., Cheng, H., Lang, S.: A Non-Linear Dimensionality-Reduction Technique for Fast Similarity Search in Large Databases. In: ACM SIGMOD Proceedings, pp. 527–538 (2006)
Zhang, D., Zhou, Z., Chen, S.: Semi-Supervised Dimensionality Reduction. In: Proceedings of the SIAM International Conference on Data Mining (2007)
Fodor, I.K.: A Survey of Dimension Reduction Techniques. U.S. Department of Energy, Lawrence Livermore National Laboratory (2002)
Hair, J.F., Anderson, R.E., Tatham, R.L., Black, W.C.: Análisis Multivariante, 5th edn., pp. 11–15. Pearson Prentice Hall, Madrid (1999)
Delmater, R., Hancock, M.: Data Mining Explained: A Manager’s Guide to Customer-Centric Business Intelligence, ch. 6. Digital Press (2001)
Bezdek, J.C.: Cluster Validity with Fuzzy Sets. Journal of Cybernetics (3), 58–72 (1974)
Kuri-Morales, A., Erazo-Rodríguez, F.: A Search Space Reduction Methodology for Data Mining in Large Databases. Engineering Applications of Artificial Intelligence, 57–65 (February 1, 2009). ISSN: 0952-1976


Author information

Authors and Affiliations

Departamento de Computación, Instituto Tecnológico Autónomo de México, Mexico
Angel Kuri-Morales


Editor information

Editors and Affiliations

Institute of Computer Vision and Applied Computer Sciences, IBaI, Leipzig, Germany
Petra Perner


About this paper

Cite this paper

Kuri-Morales, A. (2013). An Automated Search Space Reduction Methodology for Large Databases. In: Perner, P. (eds) Advances in Data Mining. Applications and Theoretical Aspects. ICDM 2013. Lecture Notes in Computer Science, vol 7987. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39736-3_2


DOI: https://doi.org/10.1007/978-3-642-39736-3_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-39735-6
Online ISBN: 978-3-642-39736-3
eBook Packages: Computer Science (R0)
