Cluster-By: An Efficient Clustering Operator in Emergency Management Database Systems

  • Conference paper
Web-Age Information Management (WAIM 2013)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7901)

Abstract

Database management systems (DBMSs) are widely used to store, manage, and analyze large volumes of emergency management data efficiently. Despite the popularity of clustering as a general data mining method, current emergency management database systems lack a unified and convenient way to support in-database clustering. In this paper, we argue for integrating clustering into databases and propose a new Cluster-by SQL extension. We formally define the syntax and semantics of the Cluster-by clause, illustrate its query plan node in the database engine, and present two data preprocessing rules. We then explore query optimization opportunities, present a novel framework for multi-query optimization, and define the cost model for multi-query scheduling. We also introduce DBSCAN-based Shrink and Expand algorithms that reuse historical clustering results, and present a heuristic cost model. To demonstrate that the extension integrates with existing DBMSs, we implemented the Cluster-by extension in PostgreSQL and ran experiments on real data sets. The results show that the Cluster-by extension is useful and that the proposed multi-query optimization techniques are efficient.
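To make the idea of density-based grouping over table rows concrete, the following is a minimal sketch of plain DBSCAN in Python applied to a small list of incident rows. It is not the paper's PostgreSQL implementation and does not reproduce the Cluster-by syntax or the Shrink and Expand algorithms. The function names, the `incidents` rows, and the `eps` and `min_pts` values are all assumptions chosen for illustration.

```python
from math import hypot

NOISE = -1  # label for rows that belong to no cluster


def region_query(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    xi, yi = points[i]
    return [j for j, (xj, yj) in enumerate(points)
            if hypot(xi - xj, yi - yj) <= eps]


def dbscan(points, eps, min_pts):
    """Return one cluster label per point; NOISE marks outliers."""
    labels = [None] * len(points)
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already visited
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        labels[i] = cluster_id
        seeds = [j for j in neighbors if j != i]
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point joins the cluster
            if labels[j] is not None:
                continue  # already assigned to a cluster
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is a core point: keep expanding
        cluster_id += 1
    return labels


# Hypothetical rows (incident_id, x, y), standing in for a table of emergency reports.
incidents = [
    (101, 0.0, 0.0), (102, 0.0, 1.0), (103, 1.0, 0.0),
    (201, 10.0, 10.0), (202, 10.0, 11.0), (203, 11.0, 10.0),
    (301, 50.0, 50.0),
]
coords = [(x, y) for _, x, y in incidents]
labels = dbscan(coords, eps=1.5, min_pts=3)

# Group row ids by cluster label, much as GROUP BY groups rows by a key.
groups = {}
for (incident_id, _, _), label in zip(incidents, labels):
    groups.setdefault(label, []).append(incident_id)
print(groups)
```

In this sketch the cluster label plays the role that a grouping key plays in GROUP BY, which is the intuition behind treating clustering as a query clause; how the actual operator is planned and optimized inside the engine is the subject of the paper itself.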

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Sun, P., Huang, Y., Zhang, C. (2013). Cluster-By: An Efficient Clustering Operator in Emergency Management Database Systems. In: Gao, Y., et al. Web-Age Information Management. WAIM 2013. Lecture Notes in Computer Science, vol 7901. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39527-7_17

  • DOI: https://doi.org/10.1007/978-3-642-39527-7_17

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-39526-0

  • Online ISBN: 978-3-642-39527-7

  • eBook Packages: Computer Science, Computer Science (R0)
