Abstract
A skyline query is an effective way to process large multi-dimensional data sets: it pinpoints the target data so that dominated objects (say, 95% of the data) can be efficiently excluded as unnecessary. However, most conventional skyline algorithms were developed to handle numerical data, so most text data could not be processed by them. In this paper, we pioneer a new domain for skyline queries, namely categorical data, and develop the corresponding ranking measures. We tested our proposed algorithm using the ACM Computing Classification System.
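To make the notion of dominance concrete, the following is a minimal sketch of a conventional skyline over numerical data, the setting the abstract contrasts with. It is not the algorithm proposed in this paper: the function names `dominates` and `skyline`, the sample `hotels` data, and the convention that smaller values are better in every dimension are all assumptions chosen for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a dominates b.

    Assumption: smaller is better in every dimension. a dominates b when
    a is no worse than b everywhere and strictly better somewhere.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def skyline(points: List[Point]) -> List[Point]:
    """Naive quadratic skyline: keep every point no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical data: (price, distance to the beach) for five hotels.
hotels = [(50.0, 8.0), (80.0, 2.0), (60.0, 9.0), (90.0, 3.0), (55.0, 5.0)]
result = skyline(hotels)
print(result)
```

Here (60.0, 9.0) is dominated by (50.0, 8.0) and (90.0, 3.0) by (80.0, 2.0), so both are excluded. The comparison relies on the numeric order of each attribute, which is why categorical values need a separate ranking measure before a skyline can be computed over them.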
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lee, W., Song, J.J., Leung, C.K.S. (2011). Categorical Data Skyline Using Classification Tree. In: Du, X., Fan, W., Wang, J., Peng, Z., Sharaf, M.A. (eds) Web Technologies and Applications. APWeb 2011. Lecture Notes in Computer Science, vol 6612. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20291-9_19
DOI: https://doi.org/10.1007/978-3-642-20291-9_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-20290-2
Online ISBN: 978-3-642-20291-9
eBook Packages: Computer Science (R0)