Skip to main content
Log in

BJR-tree: fast skyline computation algorithm using dominance relation-based tree structure

  • Regular Paper
  • Published:
International Journal of Data Science and Analytics Aims and scope Submit manuscript

Abstract

High-throughput label-free single-cell screening technology has been studied for the noninvasive analysis of various kinds of cells. Selecting the prominent cells with extreme features from a large number of cells is an important and interesting problem, which we call the serendipitous searching problem (SSP). In the SSP, it is important to find entries located near the rind of the population in a multi-dimensional feature space. We tackle the SSP as a continuous skyline computation. Originally, the skyline computation was designed to extract interesting entries from a database with multi-attributes. The skyline points are continuously updated as the existing entries disappear and new entries arrive. In this paper, we propose a balanced jointed rooted tree (BJR-tree) algorithm and a non-dominated relation cache (ND-cache) for continuous skyline computation. The BJR-tree expresses the dominance relation as an arc and stores the “dominated” relations. The ND-cache complements the BJR-tree by reducing the recalculation of the dominance relations. The execution times of the BJR-tree and existing continuous skyline computation algorithms are compared on randomly constructed synthetic datasets with multiple temporal and spatial features. The BJR-tree is then evaluated on actually measured information of blood cells. On the two- and eight-dimensional synthetic datasets, the BJR-tree computed the continuous skylines approximately 3 and 70 times faster than LookOut, respectively. On real-world datasets, BJR-tree was approximately 2.4–3.2 times faster than LookOut.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11
Fig. 12
Fig. 13
Fig. 14
Fig. 15
Fig. 16
Fig. 17
Fig. 18
Fig. 19
Fig. 20
Fig. 21
Fig. 22
Fig. 23

Similar content being viewed by others

References

  1. Bartolini, I., Ciaccia, P., Patella, M.: Efficient sort-based skyline evaluation. ACM Trans. Database Syst. 33(4), 31:1–31:49 (2008)

    Article  Google Scholar 

  2. Bayer, R., McCreight, E.M.: Organization and maintenance of large ordered indexes. Acta Inf. 1(3), 173–189 (1972)

    Article  MATH  Google Scholar 

  3. BD Biosciences: Cell Sorters. http://www.bdbiosciences.com/us/instruments/research/cell-sorters/c/744762

  4. Beckmann, N., Kriegel, H.P., Schneider, R., Seeger, B.: The R*-tree: an efficient and robust access method for points and rectangles. In: Proceedings of the 1990 ACM SIGMOD International Conference on Management of Data, SIGMOD ’90, pp. 322–331. ACM, New York, NY, USA (1990)

  5. Berchtold, S., Keim, D.A., Kriegel, H.P.: The X-tree: an index structure for high-dimensional data. In: Proceedings of the 22th International Conference on Very Large Data Bases, VLDB ’96, pp. 28–39. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA (1996)

  6. Bøgh, K.S., Assent, I., Magnani, M.: Efficient GPU-based skyline computation. In: Proceedings of the Ninth International Workshop on Data Management on New Hardware, DaMoN ’13, pp. 5:1–5:6. ACM, New York, NY, USA (2013)

  7. Bøgh, K.S., Chester, S., Assent, I.: Work-efficient parallel skyline computation for the GPU. Proc. VLDB Endow. 8(9), 962–973 (2015)

    Article  Google Scholar 

  8. Böhm, C., Kriegel, H.P.: Determining the convex hull in large multidimensional databases. In: Data Warehousing and Knowledge Discovery, pp. 294–306. Springer, Berlin (2001)

  9. Börzsönyi, S., Kossmann, D., Stocker, K.: The Skyline Operator. In: Proceedings 17th International Conference on Data Engineering, pp. 421–430 (2001)

  10. Breunig, M.M., Kriegel, H.P., Ng, R.T., Sander, J.: LOF: identifying density-based local outliers. In: Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, SIGMOD ’00, pp. 93–104. ACM, New York, NY, USA (2000)

  11. Buchta, C.: On the average number of maxima in a set of vectors. Inf. Process. Lett. 33(2), 63–65 (1989)

    Article  MathSciNet  MATH  Google Scholar 

  12. Carpenter, A.E., Jones, T.R., Lamprecht, M.R., Clarke, C., Kang, I.H., Friman, O., Guertin, D.A., Chang, J.H., Lindquist, R.A., Moffat, J., Golland, P., Sabatini, D.M.: Cell Profiler: image analysis software for identifying and quantifying cell phenotypes. Genome Biol. 7(10), R100 (2006)

    Article  Google Scholar 

  13. Chan, C.Y., Jagadish, H., Tan, K.L., Tung, A.K., Zhang, Z.: Finding k-dominant skylines in high dimensional space. In: Proceedings of the 2006 ACM SIGMOD International Conference on Management of Data, pp. 503–514. ACM (2006)

  14. Chan, C.Y., Jagadish, H.V., Tan, K.L., Tung, A.K.H., Zhang, Z.: Finding K-dominant skylines in high dimensional space. In: Proceedings of the 2006 ACM SIGMOD International Conference on Management of Data, SIGMOD ’06, pp. 503–514. ACM, New York, NY, USA (2006)

  15. Choi, W., Liu, L., Yu, B.: Multi-criteria decision making with skyline computation. In: 2012 IEEE 13th International Conference on Information Reuse and Integration (IRI), pp. 316–323. IEEE (2012)

  16. Chomicki, J., Godfrey, P., Gryz, J., Liang, D.: Skyline with presorting. In: Proceedings of 19th International Conference on Data Engineering, pp. 717–719. IEEE (2003)

  17. CYTO: CYTO2017 Image Analysis Challenge. http://cytoconference.org/2017/Home.aspx (2017)

  18. Finkel, R.A., Bentley, J.L.: Quad trees a data structure for retrieval on composite keys. Acta Inf. 4(1), 1–9 (1974)

    Article  MATH  Google Scholar 

  19. Fotiadou, K., Pitoura, E.: BITPEER: continuous subspace skyline computation with distributed bitmap indexes. In: Proceedings of the 2008 International Workshop on Data Management in Peer-to-Peer Systems, pp. 35–42. ACM (2008)

  20. Godfrey, P., Shipley, R., Gryz, J.: Algorithms and analyses for maximal vector computation. VLDB J. Int. J. Very Large Data Bases 16(1), 5–28 (2007)

    Article  Google Scholar 

  21. Graham, R.L.: An efficient algorith for determining the convex hull of a finite planar set. Inf. Process. Lett. 1(4), 132–133 (1972)

    Article  MATH  Google Scholar 

  22. Guo, B., Lei, C., Kobayashi, H., Ito, T., Yalikun, Y., Jiang, Y., Tanaka, Y., Ozeki, Y., Goda, K.: High-throughput, label-free, single-cell, microalgal lipid screening by machine-learning-equipped optofluidic time-stretch quantitative phase microscopy. Cytom. A 91(5), 494–502 (2017)

    Article  Google Scholar 

  23. Guttman, A.: R-trees: a dynamic index structure for spatial searching. In: Proceedings of the 1984 ACM SIGMOD International Conference on Management of Data, SIGMOD ’84, pp. 47–57. ACM, New York, NY, USA (1984)

  24. Hiraki, K., Inaba, M., Tezuka, H., Tomari, H., Koizumi, K., Kondo, S.: All-IP-ethernet architecture for real-time sensor-fusion processing. In: Proceedings of the SPIE, High-Speed Biomedical Imaging and Spectroscopy: Toward Big Data Instrumentation and Management, vol. 9720, p. 97200D (2016)

  25. Huang, Z., Lu, H., Ooi, B.C., Tung, A.K.H.: Continuous skyline queries for moving objects. IEEE Trans. Knowl. Data Eng. 18(12), 1645–1658 (2006)

    Article  Google Scholar 

  26. Jiang, Y., Lei, C., Yasumoto, A., Kobayashi, H., Aisaka, Y., Ito, T., Guo, B., Nitta, N., Kutsuna, N., Ozeki, Y., et al.: Label-free detection of aggregated platelets in blood by machine-learning-aided optofluidic time-stretch microscopy. Lab Chip 17(14), 2426–2434 (2017)

    Article  Google Scholar 

  27. Katayama, N., Satoh, S.: The SR-tree: an index structure for high-dimensional nearest neighbor queries. ACM SIGMOD Rec. 26(2), 369–380 (1997)

    Article  Google Scholar 

  28. Kim, Y.J., Patel, J.M.: Rethinking choices for multi-dimensional point indexing: making the case for the often ignored quadtree. In: CIDR, pp. 281–291 (2007)

  29. Koizumi, K., Eades, P., Hiraki, K., Inaba, M.: BJR-tree: fast skyline computation algorithm for serendipitous searching problems. In: 2017 IEEE International Conference on Data Science and Advanced Analytics (DSAA) (2017)

  30. Koizumi, K., Inaba, M., Hiraki, K.: Efficient implementation of continuous skyline computation on a multi-core processor. In: 2015 ACM/IEEE International Conference on Formal Methods and Models for Codesign (MEMOCODE), pp. 52–55 (2015)

  31. Kossmann, D., Ramsak, F., Rost, S.: Shooting stars in the sky: an online algorithm for skyline queries. In: Proceedings of the 28th International Conference on Very Large Data Bases, pp. 275–286. VLDB Endowment (2002)

  32. Kothuri, R.K.V., Ravada, S., Abugov, D.: Quadtree and R-tree indexes in oracle spatial: a comparison using GIS data. In: Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data, pp. 546–557. ACM (2002)

  33. Kriegel, H.P., S hubert, M., Zimek, A.: Angle-based Outlier Detection in High-dimensional Data. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’08, pp. 444–452. ACM, New York, NY, USA (2008)

  34. Kung, H.T., Luccio, F., Preparata, F.P.: On finding the maxima of a set of vectors. JACM 22(4), 469–476 (1975)

    Article  MathSciNet  MATH  Google Scholar 

  35. Lee, J., Hwang, S.W.: BSkyTree: scalable skyline computation using a balanced pivot selection. In: Proceedings of the 13th International Conference on Extending Database Technology, pp. 195–206. ACM (2010)

  36. Lee, M.W., Hwang, S.w.: Continuous Skylining on Volatile Moving Data. In: Proceedings of the 2009 IEEE International Conference on Data Engineering, ICDE ’09, pp. 1568–1575. IEEE Computer Society, Washington, DC, USA (2009)

  37. Liknes, S., Vlachou, A., Doulkeridis, C., Nørvåg, K.: APSkyline: improved skyline computation for multicore architectures. In: Database Systems for Advanced Applications, pp. 312–326. Springer (2014)

  38. Lin, X., Yuan, Y., Wang, W., Lu, H.: Stabbing the sky: efficient skyline computation over sliding windows. In: Proceedings of the 21st International Conference on Data Engineering, ICDE ’05, pp. 502–513. IEEE Computer Society, Washington, DC, USA (2005)

  39. Milder, P.: MEMOCODE 2015 design contest: continuous skyline computation. In: 2015 ACM/IEEE International Conference on Formal Methods and Models for Codesign (MEMOCODE), pp. 48–51. IEEE (2015)

  40. Morse, M., Patel, J.M., Grosky, W.I.: Efficient continuous skyline computation. In: 22nd International Conference on Data Engineering (ICDE’06), pp. 108–108 (2006)

  41. Oikawa, M., Hiyama, D., Hirayama, R., Hasegawa, S., Endo, Y., Sugie, T., Tsumura, N., Kuroshima, M., Maki, M., Okada, G., Lei, C., Ozeki, Y., Goda, K., Shimobaba, T.: A computational approach to real-time image processing for serial time-encoded amplified microscopy. In: Proceedings of the SPIE, High-Speed Biomedical Imaging and Spectroscopy: Toward Big Data Instrumentation and Management, vol. 9720, p. 97200E (2016)

  42. Papadias, D., Tao, Y., Fu, G., Seeger, B.: An optimal and progressive algorithm for skyline queries. In: Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data, pp. 467–478. ACM (2003)

  43. Papadias, D., Tao, Y., Fu, G., Seeger, B.: Progressive skyline computation in database systems. ACM Trans. Database Syst. 30(1), 41–82 (2005)

    Article  Google Scholar 

  44. Raj, P., Raman, A., Nagaraj, D., Duggirala, S.: High-Performance Big-Data Analytics: Computing Systems and Approaches, 1st edn. Springer, Berlin (2015)

    Book  Google Scholar 

  45. Roussopoulos, N., Kelley, S., Vincent, F.: Nearest neighbor queries. ACM Sigmod Rec. 24(2), 71–79 (1995)

    Article  Google Scholar 

  46. Schölkopf, B., Platt, J.C., Shawe-Taylor, J., Smola, A.J., Williamson, R.C.: Estimating the support of a high-dimensional distribution. Neural Comput. 13(7), 1443–1471 (2001)

    Article  MATH  Google Scholar 

  47. Selke, J., Lofi, C., Balke, W.-T.: Highly scalable multiprocessing algorithms for preference-based database retrieval. In: Database Systems for Advanced Applications, pp. 246–260. Springer, Berlin (2010)

  48. Shang, H., Kitsuregawa, M.: Skyline operator on anti-correlated distributions. Proc. VLDB Endow. 6(9), 649–660 (2013)

    Article  Google Scholar 

  49. Su, L., Zou, P., Jia, Y.: Adaptive Mining the Approximate Skyline Over Data Stream, pp. 742–745. Springer, Berlin (2007)

    Google Scholar 

  50. Tan, K.L., Eng, P.K., Ooi, B.C., et al.: Efficient progressive skyline computation. In: Proceedings of the 27th International Conference on Very Large Data Bases, vol. 1, pp. 301–310 (2001)

  51. Tao, Y., Papadias, D.: Maintaining sliding window skylines on data streams. IEEE Trans. Knowl. Data Eng. 18(3), 377–391 (2006)

    Article  Google Scholar 

  52. Tian, L., Wang, L., Zou, P., Jia, Y., Li, A.: Continuous monitoring of skyline query over highly dynamic moving objects. In: Proceedings of the 6th ACM International Workshop on Data Engineering for Wireless and Mobile Access, pp. 59–66. ACM (2007)

  53. White, D.A., Jain, R.: Similarity indexing with the SS-tree. In: Proceedings of the Twelfth International Conference on Data Engineering, ICDE ’96, pp. 516–523. IEEE Computer Society, Washington, DC, USA (1996)

  54. Woods, L., Alonso, G., Teubner, J.: Parallel computation of skyline queries. In: Proceedings of the 2013 IEEE 21st Annual International Symposium on Field-Programmable Custom Computing Machines, FCCM ’13, pp. 1–8. IEEE Computer Society, Washington, DC, USA (2013)

  55. Woods, L., Alonso, G., Teubner, J.: Parallelizing data processing on FPGAs with shifter lists. TRETS 8(2), 7:1–7:22 (2015)

    Article  Google Scholar 

  56. Zhang, S., Mamoulis, N., Cheung, D.W.: Scalable skyline computation using object-based space partitioning. In: Proceedings of the 2009 ACM SIGMOD International Conference on Management of data, pp. 483–494. ACM (2009)

Download references

Acknowledgements

This work was partially funded by ImPACT Program of Council for Science, Technology and Innovation (Cabinet Office, Government of Japan). We would like to acknowledge Dr. Lei, Dr. Ozeki, Dr. Sugimura, and Dr. Goda for providing measurement results of blood cells. We thank H. Tezuka for constructive comments.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Kenichi Koizumi.

Additional information

This paper is an extension version of the DSAA’2017 Research Track paper titled “BJR-tree: Fast Skyline Computation Algorithm for Serendipitous Searching Problems”.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Koizumi, K., Eades, P., Hiraki, K. et al. BJR-tree: fast skyline computation algorithm using dominance relation-based tree structure. Int J Data Sci Anal 7, 17–34 (2019). https://doi.org/10.1007/s41060-018-0098-x

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s41060-018-0098-x

Keywords

Navigation