Efficient multi-event monitoring using built-in search engines

  • Research Article
  • Frontiers of Computer Science

Abstract

Internet users often wish to follow certain news events, and their interests often overlap. General search engines (GSEs) are unsuitable for this task because of their incomplete coverage and lack of freshness. Instead, a broker regularly queries the built-in search engines (BSEs) of news and social media sites. Each user defines an event profile consisting of a set of query rules called event rules (ERs). To ensure that queries match the semantics of BSEs, ERs are transformed into disjunctive normal form and separated into conjunctive clauses (atomic event rules, AERs). Processing all AERs on BSEs is slow and can violate query submission rate limits. Accordingly, the set of AERs is reduced by eliminating AERs that are duplicates of, or logically contained by, other AERs. Five types of events are selected for experimental comparison and analysis: natural disasters, accident disasters, public health events, social security events, and negative events of public servants. Using 12 BSEs, five users define 85 ERs for the five types of events. The experimental comparison covers three aspects: event rule reduction ratio, number of collected events, and number of related events. The experimental results show that event rule reduction effectively enhances the efficiency of crawling.
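
The transformation and reduction steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that ERs contain only AND/OR over keywords (no negation), that an ER is written as a nested tuple, and that an AER is "logically contained" by another when its keyword set is a strict superset of the other's, so that its results are a subset of the other's results. The function names, the tuple encoding, and the example keywords are all hypothetical.

```python
from itertools import product


def to_dnf(rule):
    """Expand an event rule into a set of conjunctive clauses (AERs).

    A rule is either a keyword string or a tuple ("and" | "or", operand, ...).
    Each AER is returned as a frozenset of keywords that must all match.
    """
    if isinstance(rule, str):
        return {frozenset([rule])}
    op, *operands = rule
    parts = [to_dnf(operand) for operand in operands]
    if op == "or":
        # OR: the clauses of every operand are alternatives.
        return set().union(*parts)
    if op == "and":
        # AND: distribute over OR by merging one clause from each operand.
        return {frozenset().union(*combo) for combo in product(*parts)}
    raise ValueError(f"unknown operator: {op!r}")


def reduce_aers(aers):
    """Drop duplicate AERs and AERs contained by a more general AER.

    Assumption: an AER whose keywords strictly include those of another AER
    can only return a subset of that AER's results, so it is redundant.
    """
    unique = set(aers)  # removes exact duplicates
    return {a for a in unique if not any(b < a for b in unique)}


# Hypothetical event rules from three users with overlapping interests.
user_a = ("and", "earthquake", ("or", "Sichuan", "Yunnan"))
user_b = ("and", "earthquake", "Sichuan", "rescue")
user_c = ("and", ("or", "Sichuan", "Yunnan"), "earthquake")

all_aers = [aer for er in (user_a, user_b, user_c) for aer in to_dnf(er)]
reduced = reduce_aers(all_aers)
```

Here the three ERs expand to five AERs, of which two are duplicates and one is contained by a more general AER, so only two queries per BSE remain. Under the containment assumption, the broker would presumably still need to filter the returned results locally against each user's original rules.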

Author information

Corresponding author

Correspondence to Zhaoman Zhong.

Additional information

Zhaoman Zhong received his PhD in Computer Science from Shanghai University, China in 2011. His research interests are information retrieval and artificial intelligence.

Zongtian Liu received his MS from Beijing University of Aeronautics and Astronautics, China in 1982. His research interests are artificial intelligence and software engineering.

Yun Hu received her PhD in Computer Science from Nanjing University, China in 2013. Her research interests are social network analysis and artificial intelligence.

Cunhua Li received his PhD from Southeast University, China in 2007. His research interests are data mining and knowledge engineering.

About this article

Cite this article

Zhong, Z., Liu, Z., Hu, Y. et al. Efficient multi-event monitoring using built-in search engines. Front. Comput. Sci. 10, 281–291 (2016). https://doi.org/10.1007/s11704-015-4432-3

  • DOI: https://doi.org/10.1007/s11704-015-4432-3
