A Context Aware Notification Architecture Based on Distributed Focused Crawling in the Big Data Era

  • Conference paper

Part of the book series: Lecture Notes in Business Information Processing (LNBIP, volume 299)

Abstract

The amount of data created by various sources on the Web is increasing tremendously, and keeping track of relevant sources is an increasingly time-consuming task. The traditional way of accessing information on the Web is pull-based: users must query data sources at certain time intervals, so an important piece of information may be noticed late or missed completely. Technologies such as RSS help users get push-based notifications from websites, but discovering relevant information without notification overload is still not possible with existing technologies. Despite some promising efforts to solve this problem, push-based architectures fall short of meeting the requirements of the big data era. In this study, we leverage the latest advancements in distributed computing and big data analytics technologies and use a focused crawling approach to propose a context-aware notification architecture that helps people find desired information at its most valuable state.

Corresponding author

Correspondence to Mehmet Ali Akyol.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Akyol, M.A., Gökalp, M.O., Kayabay, K., Eren, P.E., Koçyiğit, A. (2017). A Context Aware Notification Architecture Based on Distributed Focused Crawling in the Big Data Era. In: Themistocleous, M., Morabito, V. (eds) Information Systems. EMCIS 2017. Lecture Notes in Business Information Processing, vol 299. Springer, Cham. https://doi.org/10.1007/978-3-319-65930-5_3

  • DOI: https://doi.org/10.1007/978-3-319-65930-5_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-65929-9

  • Online ISBN: 978-3-319-65930-5

  • eBook Packages: Computer Science, Computer Science (R0)
