Abstract
The amount of data created by various sources on the Web is increasing tremendously, and keeping track of relevant sources is an increasingly time-consuming task. The traditional way of accessing information on the Web is pull-based: users must query data sources at certain time intervals, so an important piece of information may be noticed late or missed completely. Technologies such as RSS help users receive push-based notifications from websites. However, discovering relevant information without notification overload is still not possible with existing technologies. Despite some promising efforts to solve this problem with push-based architectures, they fall short of meeting the requirements of the big data era. In this study, by leveraging the latest advancements in distributed computing and big data analytics technologies, we use a focused crawling approach to propose a context-aware notification architecture that helps people find desired information while it is at its most valuable.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Akyol, M.A., Gökalp, M.O., Kayabay, K., Eren, P.E., Koçyiğit, A. (2017). A Context Aware Notification Architecture Based on Distributed Focused Crawling in the Big Data Era. In: Themistocleous, M., Morabito, V. (eds) Information Systems. EMCIS 2017. Lecture Notes in Business Information Processing, vol 299. Springer, Cham. https://doi.org/10.1007/978-3-319-65930-5_3
DOI: https://doi.org/10.1007/978-3-319-65930-5_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-65929-9
Online ISBN: 978-3-319-65930-5
eBook Packages: Computer Science, Computer Science (R0)