Cluster-By: An Efficient Clustering Operator in Emergency Management Database Systems

Sun, Peng; Huang, Yan; Zhang, Chengyang

doi:10.1007/978-3-642-39527-7_17

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7901)

Included in the following conference series:

International Conference on Web-Age Information Management

Abstract

Database management systems (DBMSs) have been widely used to efficiently store, manage, and analyze large volumes of emergency management data. Despite the popularity of clustering as a general data mining method, current emergency management database systems lack a unified and convenient way to support in-database clustering. In this paper we highlight the advantages of integrating clustering into databases and propose a new Cluster-by SQL extension. We formally define the syntax and semantics of the Cluster-by clause, illustrate its query plan node in the database engine, and present two data preprocessing rules. We then explore query optimization opportunities, present a novel framework for multi-query optimization, and define the cost model for multi-query scheduling. We also introduce DBSCAN-based Shrink and Expand algorithms that reuse historical clustering results, and present a heuristic cost model. To demonstrate the integration of the extension with existing DBMSs, we implemented the Cluster-by extension in PostgreSQL and performed experiments on real data sets. Results show that the Cluster-by extension is useful and that the proposed multi-query optimization techniques are efficient.
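
The Shrink and Expand algorithms mentioned in the abstract build on DBSCAN (Ester et al., 1996). As background only, the following is a minimal sketch of plain DBSCAN in Python, assuming 2-D points, Euclidean distance, and a brute-force neighborhood search. It is not the authors' implementation and does not reproduce how Shrink and Expand reuse historical clustering results; the function names `region_query` and `dbscan` are illustrative.

```python
import math
from collections import deque

NOISE = -1       # label for points that belong to no cluster
UNVISITED = None  # label for points not yet processed


def region_query(points, i, eps):
    """Return indices of all points within distance eps of points[i], including i."""
    px, py = points[i]
    return [j for j, (qx, qy) in enumerate(points)
            if math.hypot(px - qx, py - qy) <= eps]


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE (-1).

    Brute-force O(n^2) neighborhood search; an in-database version would
    typically answer region queries through a spatial index instead.
    """
    labels = [UNVISITED] * len(points)
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        # i is a core point: start a new cluster and expand it breadth-first.
        labels[i] = cluster_id
        queue = deque(neighbors)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core point: keep expanding
        cluster_id += 1
    return labels


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (1, 1),
           (10, 10), (10, 11), (11, 10), (11, 11),
           (50, 50)]
    print(dbscan(pts, eps=1.5, min_pts=3))  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

Here `eps` and `min_pts` play the role of DBSCAN's Eps and MinPts parameters. Changing either one changes the clustering, which is the situation in which reusing earlier results, rather than reclustering from scratch, can pay off.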

Author information

Authors and Affiliations

Institute of Software, Chinese Academy of Sciences, Beijing, China
Peng Sun
University of North Texas, Denton, TX, U.S.A.
Yan Huang
Teradata Inc., El Segundo, CA, U.S.A.
Chengyang Zhang

Editor information

Editors and Affiliations

College of Computer Science, Zhejiang University, Hangzhou, China
Yunjun Gao
Seoul National University, Seoul, Korea
Kyuseok Shim
Institute of Software, Chinese Academy of Sciences, South-Fourth-Street 4, Zhong-Guan-Cun, 100190, Beijing, P.R. China
Zhiming Ding
School of Computer Science and Technology, University of Science and Technology of China, 230027, Hefei, China
Peiquan Jin
School of Computer Science and Technology, Hangzhou Dianzi University, 310018, Hangzhou, China
Zujie Ren
Key Laboratory of Intelligence Computing and Novel Software Technology, Tianjin Key Laboratory of Computer Vision and System, Ministry of Education, Tianjin University of Technology, 300384, Tianjin, China
Yingyuan Xiao
CityU-USTC Advanced Research Institute, Suzhou, China
An Liu
School of Information Science and Technology, Southwest Jiaotong University, 610031, Chengdu, China
Shaojie Qiao

About this paper

Cite this paper

Sun, P., Huang, Y., Zhang, C. (2013). Cluster-By: An Efficient Clustering Operator in Emergency Management Database Systems. In: Gao, Y., et al. Web-Age Information Management. WAIM 2013. Lecture Notes in Computer Science, vol 7901. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39527-7_17

DOI: https://doi.org/10.1007/978-3-642-39527-7_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-39526-0
Online ISBN: 978-3-642-39527-7
eBook Packages: Computer Science (R0)