Categorical Data Skyline Using Classification Tree

Lee, Wookey; Song, Justin JongSu; Leung, Carson K. -S.

doi:10.1007/978-3-642-20291-9_19

Wookey Lee,
Justin JongSu Song &
Carson K. -S. Leung

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6612)

Abstract

A skyline query is an effective way to process large multi-dimensional data sets: it pinpoints the target data so that dominated data (say, 95% of the data) can be efficiently excluded as unnecessary. However, most conventional skyline algorithms were developed to handle numerical data, so most text data could not be processed by them. In this paper, we pioneer an entirely new domain for skyline queries, namely categorical data, and develop the corresponding ranking measures. We tested our proposed algorithm using the ACM Computing Classification System.
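
To make the idea of dominance over categorical data concrete, the following is a minimal sketch and not the algorithm proposed in the paper. It assumes a toy classification tree (the `PARENT` map is hypothetical and is not the real ACM Computing Classification System), and it assumes that the ranking measure is the number of tree edges between an item's category and the query category. All function names and the sample items are illustrative.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy classification tree, stored as child -> parent.
PARENT: Dict[str, str] = {
    "Software": "Root",
    "Information Systems": "Root",
    "Database Management": "Information Systems",
    "Information Retrieval": "Information Systems",
    "Query Processing": "Database Management",
}


def path_to_root(node: str) -> List[str]:
    """Return the node followed by its ancestors up to the root."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def tree_distance(a: str, b: str) -> int:
    """Count the edges between a and b through their lowest common ancestor.

    Assumption: edge count stands in for a categorical ranking measure.
    """
    pa, pb = path_to_root(a), path_to_root(b)
    index_in_pb = {n: i for i, n in enumerate(pb)}
    for i, n in enumerate(pa):
        if n in index_in_pb:
            return i + index_in_pb[n]
    raise ValueError("nodes are not in the same tree")


def dominates(p: Sequence[int], q: Sequence[int]) -> bool:
    """p dominates q if p is no worse everywhere and strictly better somewhere
    (smaller distance is better)."""
    return all(x <= y for x, y in zip(p, q)) and any(x < y for x, y in zip(p, q))


def skyline(points: Dict[str, Tuple[int, ...]]) -> List[str]:
    """Naive quadratic skyline: keep every item that no other item dominates."""
    return [
        name
        for name, p in points.items()
        if not any(dominates(q, p) for other, q in points.items() if other != name)
    ]


# Illustrative items, each labeled with a (primary, secondary) category.
items = {
    "A": ("Query Processing", "Information Retrieval"),
    "B": ("Database Management", "Software"),
    "C": ("Software", "Information Retrieval"),
    "D": ("Information Retrieval", "Database Management"),
}
query = ("Query Processing", "Software")

# Turn each categorical attribute into a distance from the query category.
vectors = {
    name: tuple(tree_distance(c, q) for c, q in zip(cats, query))
    for name, cats in items.items()
}
print(skyline(vectors))  # ['A', 'B']
```

Here A is closest on the primary category and B on the secondary one, so neither dominates the other, while C and D are dominated by A and drop out. Any other tree-based similarity could be substituted for `tree_distance` without changing the dominance test.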

Author information

Authors and Affiliations

Department of Industrial Engineering, Inha University, South Korea
Wookey Lee & Justin JongSu Song
Department of Computer Science, The University of Manitoba, Canada
Carson K. -S. Leung

Editor information

Editors and Affiliations

School of Information, Renmin University of China, 100872, Beijing, China
Xiaoyong Du
LFCS, School of Informatics, University of Edinburgh, 10 Crichton Street, EH8 9AB, Edinburgh, Scotland, UK
Wenfei Fan
School of Software, Tsinghua University, Room 819, Main Building, 100084, Beijing, China
Jianmin Wang
Computer School, Wuhan University, Luojiashan Road, 430072, Wuhan, Hubei, China
Zhiyong Peng
School of Information Technology and Electrical Engineering, The University of Queensland, QLD 4072, St. Lucia, Australia
Mohamed A. Sharaf

About this paper

Cite this paper

Lee, W., Song, J.J., Leung, C.K.S. (2011). Categorical Data Skyline Using Classification Tree. In: Du, X., Fan, W., Wang, J., Peng, Z., Sharaf, M.A. (eds) Web Technologies and Applications. APWeb 2011. Lecture Notes in Computer Science, vol 6612. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20291-9_19

DOI: https://doi.org/10.1007/978-3-642-20291-9_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-20290-2
Online ISBN: 978-3-642-20291-9
eBook Packages: Computer Science, Computer Science (R0)
