A Real-Time Distributed Index Based on Topic for Microblogging System

  • Conference paper
Internet Multimedia Computing and Service (ICIMCS 2017)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 819)

Abstract

With the development of Internet technology and the widespread use of mobile devices, microblogging systems such as Twitter and Sina Weibo in China have become the most important platforms for people to retrieve information and communicate with each other. Real-time search has become a major challenge for microblogging systems because of the volume of data and the number of users. Existing approaches place all microblogs in a single index, which increases the cost of index updates and queries, and the search results cannot satisfy users’ requirements for timeliness and quality. In this paper, we propose a new real-time distributed index based on topic (RDIBT), which builds a separate index for each topic. These topical indices are distributed across many sites, which improves query concurrency. Extensive experiments demonstrate the effectiveness and efficiency of RDIBT on a real dataset.
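The general idea of per-topic indices spread across sites can be illustrated with a minimal sketch. This is not the RDIBT implementation from the paper: the class names `TopicIndex` and `TopicRouter`, the hash-based placement of topics on sites, and the whitespace tokenizer are all assumptions made for illustration, and the topic label of each microblog is assumed to be supplied by the caller rather than inferred.

```python
import zlib
from collections import defaultdict


class TopicIndex:
    """Inverted index over the microblogs of a single topic (illustrative)."""

    def __init__(self):
        # term -> list of post ids in arrival order (oldest first)
        self.postings = defaultdict(list)

    def add(self, post_id, text):
        # Naive whitespace tokenization; each term is indexed once per post.
        for term in set(text.lower().split()):
            self.postings[term].append(post_id)

    def search(self, term):
        # Return the most recently added posts first.
        return list(reversed(self.postings.get(term.lower(), [])))


class TopicRouter:
    """Places each topic's index on one of several sites (hypothetical scheme)."""

    def __init__(self, num_sites):
        # Each site is modelled as a dict: topic -> TopicIndex.
        self.sites = [dict() for _ in range(num_sites)]

    def site_for(self, topic):
        # Assumption: a stable hash of the topic name picks the site.
        return zlib.crc32(topic.encode("utf-8")) % len(self.sites)

    def add(self, topic, post_id, text):
        site = self.sites[self.site_for(topic)]
        site.setdefault(topic, TopicIndex()).add(post_id, text)

    def search(self, topic, term):
        # Only the site holding this topic is touched by the query.
        index = self.sites[self.site_for(topic)].get(topic)
        return index.search(term) if index else []


router = TopicRouter(num_sites=3)
router.add("sports", 1, "Great match today")
router.add("sports", 2, "match postponed")
router.add("tech", 3, "new phone launch today")
print(router.search("sports", "match"))
```

In this sketch an update or query touches only the small index of one topic on one site, so requests for different topics can be served by different sites at the same time.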

Acknowledgment

Thanks to the anonymous reviewers for their insightful comments.

Author information

Corresponding author

Correspondence to Zhikun Chen.

Copyright information

© 2018 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Chen, Z., Wang, L., Yang, S. (2018). A Real-Time Distributed Index Based on Topic for Microblogging System. In: Huet, B., Nie, L., Hong, R. (eds) Internet Multimedia Computing and Service. ICIMCS 2017. Communications in Computer and Information Science, vol 819. Springer, Singapore. https://doi.org/10.1007/978-981-10-8530-7_22

  • DOI: https://doi.org/10.1007/978-981-10-8530-7_22

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-8529-1

  • Online ISBN: 978-981-10-8530-7

  • eBook Packages: Computer Science, Computer Science (R0)