Abstract
High-throughput, label-free single-cell screening has been studied as a means of noninvasively analyzing various kinds of cells. Selecting the prominent cells with extreme features from a large population is an important and interesting problem, which we call the serendipitous searching problem (SSP). In the SSP, the goal is to find entries located near the outer boundary (rind) of the population in a multi-dimensional feature space. We tackle the SSP as a continuous skyline computation. The skyline computation was originally designed to extract interesting entries from a database with multiple attributes; in the continuous setting, the skyline points are updated as existing entries disappear and new entries arrive. In this paper, we propose a balanced jointed rooted tree (BJR-tree) algorithm and a non-dominated relation cache (ND-cache) for continuous skyline computation. The BJR-tree expresses each dominance relation as an arc and stores the “dominated” relations. The ND-cache complements the BJR-tree by reducing the recalculation of dominance relations. We compare the execution times of the BJR-tree and existing continuous skyline computation algorithms on randomly constructed synthetic datasets with multiple temporal and spatial features, and then evaluate the BJR-tree on measured blood-cell data. On the two- and eight-dimensional synthetic datasets, the BJR-tree computed the continuous skylines approximately 3 and 70 times faster than LookOut, respectively. On the real-world datasets, the BJR-tree was approximately 2.4–3.2 times faster than LookOut.
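For readers unfamiliar with the terminology, the dominance relation and the continuous skyline can be illustrated with a deliberately naive sketch. This is not the BJR-tree or the ND-cache; it recomputes the skyline from scratch at every step, which is exactly the cost those structures aim to avoid. The function names, the count-based sliding window, and the convention that smaller values are preferred are all assumptions made for illustration.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, List, Sequence

Point = Sequence[float]


def dominates(p: Point, q: Point) -> bool:
    """Return True if p dominates q.

    Assumption: smaller is better in every dimension. p dominates q when
    p is no worse than q everywhere and strictly better somewhere.
    """
    no_worse = all(a <= b for a, b in zip(p, q))
    strictly_better = any(a < b for a, b in zip(p, q))
    return no_worse and strictly_better


def naive_skyline(points: Iterable[Point]) -> List[Point]:
    """Return the points that no other point dominates (quadratic time)."""
    pts = list(points)
    return [p for p in pts if not any(dominates(q, p) for q in pts)]


def sliding_window_skylines(stream: Iterable[Point], window: int) -> Iterator[List[Point]]:
    """Yield the skyline after each arrival, keeping only the newest `window` entries.

    Hypothetical model of the continuous setting: the oldest entry
    disappears whenever a new entry pushes the window past its capacity.
    """
    live: Deque[Point] = deque()
    for p in stream:
        live.append(p)
        if len(live) > window:
            live.popleft()  # the oldest entry expires
        yield naive_skyline(live)


if __name__ == "__main__":
    data = [(1, 5), (2, 2), (5, 1), (3, 3), (4, 4)]
    print(naive_skyline(data))
    for sky in sliding_window_skylines([(3, 3), (2, 2), (4, 4), (1, 5)], window=2):
        print(sky)
```

In the second call, the skyline changes when (2, 2) expires from the window, since the points it previously dominated may become skyline points again. Handling such expirations without recomputing every dominance test is the part of the problem that a continuous skyline algorithm has to address.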
Acknowledgements
This work was partially funded by the ImPACT Program of the Council for Science, Technology and Innovation (Cabinet Office, Government of Japan). We would like to thank Dr. Lei, Dr. Ozeki, Dr. Sugimura, and Dr. Goda for providing the blood-cell measurement results. We also thank H. Tezuka for constructive comments.
Additional information
This paper is an extended version of the DSAA’2017 Research Track paper titled “BJR-tree: Fast Skyline Computation Algorithm for Serendipitous Searching Problems”.
About this article
Cite this article
Koizumi, K., Eades, P., Hiraki, K. et al. BJR-tree: fast skyline computation algorithm using dominance relation-based tree structure. Int J Data Sci Anal 7, 17–34 (2019). https://doi.org/10.1007/s41060-018-0098-x
DOI: https://doi.org/10.1007/s41060-018-0098-x