
Categorical Data Skyline Using Classification Tree

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6612)

Abstract

The skyline query is an effective method for processing large multi-dimensional data sets: it pinpoints the target data so that dominated data objects (say, 95% of the data) can be efficiently excluded as unnecessary. However, most conventional skyline algorithms were developed to handle numerical data, so text data have largely been excluded from processing by these algorithms. In this paper, we pioneer an entirely new domain for skyline queries, namely categorical data, and develop the corresponding ranking measures for skyline queries over that domain. We tested our proposed algorithm using the ACM Computing Classification System.
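The abstract does not spell out the paper's ranking measures, so the following is only a hypothetical sketch of how a categorical skyline over a classification tree could work, not the authors' method. It assumes each categorical value is converted into a number by its Wu-Palmer similarity to the query's category in a tree (2 × depth of the lowest common ancestor, divided by the sum of both depths, with the root at depth 1). A standard dominance test is then applied with higher similarity treated as better. The toy tree, the object data, and every name (`PARENT`, `wu_palmer`, `dominates`, `categorical_skyline`) are illustrative assumptions; the skyline step is a naive O(n²) pairwise check rather than an optimized algorithm.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical toy classification tree (child -> parent). The codes loosely
# imitate the ACM CCS style; this is NOT the tree used in the paper.
PARENT: Dict[str, Optional[str]] = {
    "ROOT": None,
    "H": "ROOT",
    "H.2": "H",
    "H.2.4": "H.2",
    "H.2.8": "H.2",
    "H.3": "H",
    "I": "ROOT",
    "I.2": "I",
}


def path_to_root(node: str) -> List[str]:
    """Return [node, parent, ..., ROOT]."""
    path: List[str] = []
    current: Optional[str] = node
    while current is not None:
        path.append(current)
        current = PARENT[current]
    return path


def depth(node: str) -> int:
    """Depth with the root counted as 1 (the usual Wu-Palmer convention)."""
    return len(path_to_root(node))


def wu_palmer(a: str, b: str) -> float:
    """Wu-Palmer similarity: 2 * depth(LCA) / (depth(a) + depth(b))."""
    ancestors_of_a = set(path_to_root(a))
    # The lowest common ancestor is the first node on b's root path that is
    # also an ancestor (or self) of a.
    lca = next(n for n in path_to_root(b) if n in ancestors_of_a)
    return 2 * depth(lca) / (depth(a) + depth(b))


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """p dominates q if it is no worse in every dimension and strictly better
    in at least one (higher is better here)."""
    return all(x >= y for x, y in zip(p, q)) and any(x > y for x, y in zip(p, q))


def categorical_skyline(
    objects: Dict[str, Tuple[str, ...]], query: Tuple[str, ...]
) -> List[str]:
    """Map each categorical attribute to its similarity with the query's
    category, then keep the objects no other object dominates (naive O(n^2))."""
    scores = {
        name: tuple(wu_palmer(c, q) for c, q in zip(cats, query))
        for name, cats in objects.items()
    }
    return [
        name
        for name, s in scores.items()
        if not any(dominates(t, s) for other, t in scores.items() if other != name)
    ]


# Each hypothetical object carries two category attributes.
OBJECTS = {
    "p1": ("H.2.4", "H.3"),
    "p2": ("H.2.8", "I.2"),
    "p3": ("H.3", "H.3"),
    "p4": ("I.2", "I.2"),
}
QUERY = ("H.2.4", "I.2")

print(categorical_skyline(OBJECTS, QUERY))  # ['p1', 'p2']
```

With this query the score vectors are roughly p1 = (1.0, 0.33), p2 = (0.75, 1.0), p3 = (0.57, 0.33) and p4 = (0.29, 1.0). Here p1 dominates p3, p2 dominates both p3 and p4, and p1 and p2 are incomparable, so only they remain in the skyline.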




Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Lee, W., Song, J.J., Leung, C.K.S. (2011). Categorical Data Skyline Using Classification Tree. In: Du, X., Fan, W., Wang, J., Peng, Z., Sharaf, M.A. (eds) Web Technologies and Applications. APWeb 2011. Lecture Notes in Computer Science, vol 6612. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20291-9_19


  • DOI: https://doi.org/10.1007/978-3-642-20291-9_19

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-20290-2

  • Online ISBN: 978-3-642-20291-9

  • eBook Packages: Computer Science, Computer Science (R0)
