Top-k temporal keyword search over social media data

World Wide Web

Abstract

Social media services have become major sources for monitoring emerging topics and sensing real-life events. A social media platform manages a social stream consisting of a huge volume of timestamped user-generated data, including original data and repost data. However, previous research on keyword search over social media data mainly emphasizes the recency of information. In this paper, we first propose the problem of the top-k most significant temporal keyword query, which enables more complex query analysis. It returns the top-k most popular social items that contain the query keywords within the given query time window. Then, we design a temporal inverted index with a two-tier posting list to index social time series, together with a segment store to compute the exact social significance of social items. Next, we implement a basic query algorithm on top of the proposed index structure and give a detailed performance analysis of it. Based on the analysis results, we further refine the query algorithm with a piecewise maximum approximation (PMA) sketch. Finally, extensive empirical studies on a real-life microblog dataset demonstrate that the combination of the two-tier posting list and the PMA sketch achieves a remarkable performance improvement under different query settings.
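The query described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' index or algorithm: the class `TemporalIndex`, its methods, the fixed-length segmentation, and the choice of "total activity in the window" as the significance score are all assumptions made for illustration. The per-segment maxima are only one plausible reading of a "piecewise maximum" summary; the paper's PMA sketch and two-tier posting list may be defined differently.

```python
import heapq
from collections import defaultdict


class TemporalIndex:
    """Toy in-memory index (hypothetical, for illustration only)."""

    def __init__(self, segment_len=4):
        self.postings = defaultdict(set)  # keyword -> ids of items containing it
        self.series = {}                  # item id -> activity count per time bucket
        self.seg_max = {}                 # item id -> maximum count per fixed-length segment
        self.segment_len = segment_len

    def add_item(self, item_id, keywords, counts):
        counts = list(counts)
        for kw in keywords:
            self.postings[kw].add(item_id)
        self.series[item_id] = counts
        n = self.segment_len
        self.seg_max[item_id] = [max(counts[i:i + n]) for i in range(0, len(counts), n)]

    def exact_score(self, item_id, start, end):
        # Assumed significance: total activity in buckets [start, end).
        return sum(self.series[item_id][start:end])

    def upper_bound(self, item_id, start, end):
        # Every bucket is bounded by the maximum of the segment it falls in.
        n = self.segment_len
        bound = 0
        for seg, peak in enumerate(self.seg_max[item_id]):
            lo, hi = max(start, seg * n), min(end, (seg + 1) * n)
            if hi > lo:
                bound += peak * (hi - lo)
        return bound

    def top_k(self, keywords, start, end, k):
        """Return ([(score, item_id), ...] best first, number of exact evaluations)."""
        if k <= 0 or not keywords:
            return [], 0
        candidates = set.intersection(*(self.postings.get(kw, set()) for kw in keywords))
        bounds = {i: self.upper_bound(i, start, end) for i in candidates}
        heap = []      # min-heap holding the current top-k (score, item_id) pairs
        evaluated = 0
        # Visit candidates by decreasing upper bound so the scan can stop early.
        for item_id in sorted(candidates, key=lambda i: (-bounds[i], i)):
            if len(heap) == k and bounds[item_id] <= heap[0][0]:
                break  # no remaining candidate can beat the current k-th score
            score = self.exact_score(item_id, start, end)
            evaluated += 1
            if len(heap) < k:
                heapq.heappush(heap, (score, item_id))
            elif score > heap[0][0]:
                heapq.heapreplace(heap, (score, item_id))
        return sorted(heap, reverse=True), evaluated
```

In this sketch the exact time series plays the role of a store consulted only for promising candidates, while the per-segment maxima give a cheap upper bound that lets the scan stop before scoring every item that matches the keywords.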



Acknowledgments

This work is partially supported by the National High-tech R&D Program (863 Program) under grant number 2015AA015307, and by the National Science Foundation of China under grant numbers 61432006 and 61672232.

Author information

Corresponding author

Correspondence to Fan Xia.

About this article

Cite this article

Xia, F., Yu, C., Xu, L. et al. Top-k temporal keyword search over social media data. World Wide Web 20, 1049–1069 (2017). https://doi.org/10.1007/s11280-016-0430-0

  • DOI: https://doi.org/10.1007/s11280-016-0430-0
