Abstract
The increasing popularity of location-based social networks encourages more and more users to share their experiences, which strongly influences customers' decisions when shopping, traveling, and so on. This paper studies the problem of top-K valuable document queries over geo-textual data streams. Although many researchers have studied this problem, existing approaches do not consider the reliability of documents, so unreliable documents may mislead customers into making improper decisions. In addition, they lack the ability to prune documents with low representativeness. To increase user satisfaction in recommendation systems, we propose a novel framework named PDS. It first employs an efficient machine learning technique, the extreme learning machine (ELM), to prune unreliable documents, and then uses a novel index named \(\mathcal {GH}\) to maintain the documents. On the one hand, this index maintains a group of pruning values to filter out low-quality documents. On the other hand, it exploits the unique properties of the sliding window to further improve the performance of PDS. Theoretical analysis and extensive experimental results demonstrate the effectiveness of the proposed algorithms.






Notes
In this paper, we use the tuple 〈N,s〉 to denote a sliding window, where N is the window length and s is the number of objects that arrive at the window at the same moment.
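As an illustration of the 〈N,s〉 notation, the following is a minimal Python sketch, not the paper's implementation. It assumes a count-based window: N is taken to be the maximum number of objects held, and each slide admits exactly s new objects and expires the oldest ones once the window is full. The class name `SlidingWindow` and its methods are hypothetical.

```python
from collections import deque


class SlidingWindow:
    """Count-based sliding window <N, s> (assumed semantics, for illustration).

    N: maximum number of objects kept in the window (assumption).
    s: number of objects that arrive together at each slide.
    """

    def __init__(self, N, s):
        if N <= 0 or s <= 0 or s > N:
            raise ValueError("require 0 < s <= N")
        self.N = N
        self.s = s
        self._items = deque()  # oldest object on the left, newest on the right

    def slide(self, batch):
        """Admit a batch of exactly s objects and return the expired objects."""
        batch = list(batch)
        if len(batch) != self.s:
            raise ValueError("each slide must carry exactly s objects")
        self._items.extend(batch)
        expired = []
        # Evict the oldest objects until the window holds at most N objects.
        while len(self._items) > self.N:
            expired.append(self._items.popleft())
        return expired

    def contents(self):
        """Return the objects currently in the window, oldest first."""
        return list(self._items)


window = SlidingWindow(N=4, s=2)
window.slide(["d1", "d2"])
window.slide(["d3", "d4"])
expired = window.slide(["d5", "d6"])  # the window is full, so d1 and d2 expire
```

Under these assumptions, an index over the window only has to process the s arriving and the expiring objects at each slide, rather than rescanning all N objects.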
Acknowledgments
This work is partially supported by the NSF of China for Outstanding Young Scholars under grant No. 61322208, the NSF of China under grant Nos. 61572122, 61272178, 61502317, U1401256, and the NSF of China for Key Program under grant No. 61532021. Bin Wang is the corresponding author.
Cite this article
Wang, B., Zhu, R., Yang, X. et al. Top-K representative documents query over geo-textual data stream. World Wide Web 21, 537–555 (2018). https://doi.org/10.1007/s11280-017-0470-0