Abstract
We consider the problem of selecting representative, distinctive objects from a numerical database, an important task in the early stages of the knowledge discovery process. The skyline query and its variants are functions for finding such representative objects. A skyline query selects the objects that are not dominated by any other object in the dataset. Although the skyline query is a useful function, it cannot control the number of selected objects. To solve this problem, the “top-k dominating query” and the “K-skyband query” have been introduced. However, conventional algorithms for computing these functions are not well suited to parallel distributed environments. In this paper, we consider a method for computing both queries in MapReduce, a popular parallel distributed framework for handling “big data”.
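To make the three queries named in the abstract concrete, the following is a minimal Python sketch, not the algorithm of the paper. It rests on assumptions that the abstract does not state: smaller values are taken to be better in every dimension; the K-skyband is taken to be the set of objects dominated by fewer than K other objects (so K = 1 gives the skyline; some papers shift this by one); and the top-k dominating query is taken to return the k objects that dominate the most other objects. All function names and the sample data are hypothetical. The last function imitates a partition-then-merge flow in plain Python, without any MapReduce API, only to illustrate why a local K-skyband is a safe filter: an object dominated by at least K objects inside its own partition is also dominated by at least K objects globally.

```python
from typing import List, Tuple

Point = Tuple[float, ...]


def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b in every dimension and strictly
    better in at least one (assumption: smaller is better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def dominator_count(p: Point, data: List[Point]) -> int:
    """Number of objects in data that dominate p."""
    return sum(dominates(q, p) for q in data)


def dominated_count(p: Point, data: List[Point]) -> int:
    """Number of objects in data that p dominates."""
    return sum(dominates(p, q) for q in data)


def k_skyband(data: List[Point], k: int) -> List[Point]:
    """Objects dominated by fewer than k others (assumed convention)."""
    return [p for p in data if dominator_count(p, data) < k]


def skyline(data: List[Point]) -> List[Point]:
    """Objects not dominated by any other object (the 1-skyband)."""
    return k_skyband(data, 1)


def top_k_dominating(data: List[Point], k: int) -> List[Point]:
    """The k objects that dominate the most others; ties keep input order
    because Python's sort is stable."""
    return sorted(data, key=lambda p: dominated_count(p, data), reverse=True)[:k]


def k_skyband_partitioned(data: List[Point], k: int, num_parts: int) -> List[Point]:
    """Illustrative partition-then-merge flow (not the paper's method):
    compute a local k-skyband per partition, then recompute the
    k-skyband over the union of the surviving candidates."""
    parts = [data[i::num_parts] for i in range(num_parts)]              # "map" side
    candidates = [p for part in parts for p in k_skyband(part, k)]      # local filter
    return k_skyband(candidates, k)                                     # "reduce" side


# Hypothetical sample: (price, distance) pairs, smaller is better in both.
hotels: List[Point] = [(1, 9), (2, 4), (3, 6), (4, 2), (5, 5), (6, 3), (8, 1), (7, 7)]
```

On this sample, `skyline(hotels)` returns the four undominated objects `(1, 9)`, `(2, 4)`, `(4, 2)` and `(8, 1)`, whose size the user cannot choose. By contrast, `k_skyband(hotels, 2)` and `top_k_dominating(hotels, 2)` let K or k control how many objects come back. The brute-force counting is quadratic in the number of objects and is meant only to pin down the definitions.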
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Siddique, M.A., Tian, H., Morimoto, Y. (2014). Selecting Representative Objects from Large Database by Using K-Skyband and Top-k Dominating Queries in MapReduce Environment. In: Luo, X., Yu, J.X., Li, Z. (eds) Advanced Data Mining and Applications. ADMA 2014. Lecture Notes in Computer Science, vol 8933. Springer, Cham. https://doi.org/10.1007/978-3-319-14717-8_44
DOI: https://doi.org/10.1007/978-3-319-14717-8_44
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-14716-1
Online ISBN: 978-3-319-14717-8
eBook Packages: Computer Science, Computer Science (R0)