Topic Crawler for Social Networks Monitoring

Yakushev, Andrei V.; Boukhanovsky, Alexander V.; Sloot, Peter M. A.

doi:10.1007/978-3-642-41360-5_17

Andrei V. Yakushev³,
Alexander V. Boukhanovsky³ &
Peter M. A. Sloot³,⁴

Part of the book series: Communications in Computer and Information Science (CCIS, volume 394)

Included in the following conference series:

International Conference on Knowledge Engineering and the Semantic Web

Abstract

The paper describes a focused crawler for monitoring social networks, used for information extraction and content analysis. The crawler implements the MapReduce model for distributed computation and is oriented toward big text data. The focused crawler searches for pages classified as relevant to a specified topic. The classifier is built using a knowledge database that defines words, their classes, and the rules for joining words into phrases. From the weights of words and phrases, a text weight is obtained that indicates relevance to the topic. The system was used to detect a drug community in the Russian segment of the LiveJournal social network. Official and slang drug terminology was used to develop the knowledge database. Different aspects of the knowledge database and the classifier are studied. A non-homogeneous Poisson process was used to model blog changes, since it permits building a monitoring policy that accounts for blog update frequency and the time-of-day effect. Evaluation on real data shows a 25% increase in the detection of new posts.
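
The abstract does not give the scoring formula, so the following is only a minimal sketch of how a text weight might be computed from word and phrase weights. It assumes that the text weight is a plain sum, that a phrase rule is a bonus for two adjacent words of given classes, and that relevance is decided by a threshold. The names `KnowledgeBase`, `text_weight`, and `is_relevant`, and all weights, are hypothetical and not taken from the paper.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class KnowledgeBase:
    # word -> (class, weight); the entries used below are illustrative only
    words: dict[str, tuple[str, float]]
    # (class of previous word, class of current word) -> phrase bonus (assumed rule form)
    phrase_rules: dict[tuple[str, str], float] = field(default_factory=dict)


def text_weight(text: str, kb: KnowledgeBase) -> float:
    """Sum the weights of known words plus bonuses for adjacent-class phrases."""
    total = 0.0
    prev_class = None
    for token in text.lower().split():
        entry = kb.words.get(token)
        if entry is None:
            # An unknown word breaks any phrase in progress.
            prev_class = None
            continue
        word_class, weight = entry
        total += weight
        if prev_class is not None:
            total += kb.phrase_rules.get((prev_class, word_class), 0.0)
        prev_class = word_class
    return total


def is_relevant(text: str, kb: KnowledgeBase, threshold: float) -> bool:
    """Classify a text as on-topic when its weight reaches the threshold."""
    return text_weight(text, kb) >= threshold


kb = KnowledgeBase(
    words={"buy": ("action", 1.0), "substance": ("drug", 2.0)},
    phrase_rules={("action", "drug"): 3.0},
)
print(text_weight("Buy substance today", kb))
print(is_relevant("hello world", kb, threshold=2.5))
```

In this sketch the word order matters: "buy substance" triggers the phrase bonus, while "substance buy" contributes only the two word weights.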

Author information

Authors and Affiliations

Saint-Petersburg National University of Information Technologies, Mechanics and Optics, Saint-Petersburg, Russia
Andrei V. Yakushev, Alexander V. Boukhanovsky & Peter M. A. Sloot
School of Computer Engineering (SCE), Nanyang Technological University (NTU), Singapore
Peter M. A. Sloot

Editor information

Editors and Affiliations

Institute of Artificial Intelligence, University of Ulm, 89069, Ulm, Germany
Pavel Klinov
Intelligence Systems Laboratory, Saint Petersburg National Research University of Information Technologies, Mechanics and Optics, Kronverksky prospekt 49, office 380, 197101, St. Petersburg, Russia
Dmitry Mouromtsev

About this paper

Cite this paper

Yakushev, A.V., Boukhanovsky, A.V., Sloot, P.M.A. (2013). Topic Crawler for Social Networks Monitoring. In: Klinov, P., Mouromtsev, D. (eds) Knowledge Engineering and the Semantic Web. KESW 2013. Communications in Computer and Information Science, vol 394. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41360-5_17

DOI: https://doi.org/10.1007/978-3-642-41360-5_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-41359-9
Online ISBN: 978-3-642-41360-5
eBook Packages: Computer Science, Computer Science (R0)
