Abstract
We describe a search robot (crawler) intended to collect information regarding outgoing hyperlinks from a given set of web sites related to a certain topic. The crawler’s adaptive behavior is formulated in terms of a multi-armed bandit problem. Our experiments show that the choice of an adaptive algorithm for the crawler’s rational behavior depends on the actual topic of the underlying set of web sites.
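To make the bandit formulation concrete, the following is a minimal sketch, not the authors' implementation. It assumes each arm is one web site from the target set and that the reward for a page fetch is 1 if the page yields a previously unseen external hyperlink and 0 otherwise. It uses UCB1, a standard index policy for the multi-armed bandit problem, purely as an illustration, since the abstract does not say which adaptive algorithms were compared. The names `UCB1`, `simulate`, and `link_rates` are hypothetical.

```python
import math
import random


class UCB1:
    """UCB1 index policy over a fixed set of arms (here: hypothetical site IDs)."""

    def __init__(self, arms):
        self.arms = list(arms)
        self.pulls = {a: 0 for a in self.arms}    # times each arm was chosen
        self.mean = {a: 0.0 for a in self.arms}   # running mean reward per arm
        self.total = 0                            # total number of pulls so far

    def select(self):
        # Play every arm once before the index is defined.
        for a in self.arms:
            if self.pulls[a] == 0:
                return a
        # Index = empirical mean + exploration bonus that shrinks with pulls.
        return max(
            self.arms,
            key=lambda a: self.mean[a]
            + math.sqrt(2 * math.log(self.total) / self.pulls[a]),
        )

    def update(self, arm, reward):
        # Assumption: reward lies in [0, 1].
        self.pulls[arm] += 1
        self.total += 1
        self.mean[arm] += (reward - self.mean[arm]) / self.pulls[arm]


def simulate(link_rates, steps, seed=0):
    """Toy run: a fetch from site s finds a new external link
    with probability link_rates[s] (an assumed, simulated quantity)."""
    rng = random.Random(seed)
    policy = UCB1(link_rates)
    for _ in range(steps):
        site = policy.select()
        reward = 1.0 if rng.random() < link_rates[site] else 0.0
        policy.update(site, reward)
    return policy
```

In a real crawler, the simulated reward would be replaced by the outcome of an actual page fetch, and other policies could be swapped in behind the same `select`/`update` interface to compare their behavior on different thematic sets of sites.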
Additional information
Original Russian Text © A.A. Pechnikov, D.I. Chernobrovkin, 2012, published in Upravlenie Bol’shimi Sistemami, 2012, No. 36, pp. 301–318.
About this article
Cite this article
Pechnikov, A.A., Chernobrovkin, D.I. Adaptive crawler for external hyperlinks search and acquisition. Autom Remote Control 75, 587–593 (2014). https://doi.org/10.1134/S0005117914030151
DOI: https://doi.org/10.1134/S0005117914030151