Abstract
This paper presents a new algorithm for hypertext graph crawling. Using an ant as an agent in a hypertext graph significantly reduces the number of irrelevant hypertext documents that must be downloaded in order to obtain a given number of relevant documents. Moreover, throughout the crawl, the artificial ants do not need a queue for central control of the crawling process. The proposed algorithm, called the Focused Ant Crawling Algorithm, performs better than the Shark-Search crawling algorithm and than a best-first search strategy that uses a queue for central control of the crawling process.
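The idea of crawling without a central queue can be illustrated with a minimal sketch. It is not the Focused Ant Crawling Algorithm itself, whose details the abstract does not give; it is a generic pheromone-guided walk under stated assumptions. The toy graph `LINKS`, the scores in `RELEVANCE`, the function `ant_crawl`, and all parameter values are hypothetical. Each ant chooses its next link using only the pheromone on the outgoing links of its current page, and the only shared state is the pheromone table.

```python
import random

# Hypothetical toy hypertext graph: page -> pages it links to.
LINKS = {
    "start": ["a", "b"],
    "a": ["a1", "a2"],
    "b": ["b1"],
    "a1": ["a2"],
    "a2": ["a1"],
    "b1": [],
}
# Hypothetical relevance scores in [0, 1]; a real crawler would use a classifier.
RELEVANCE = {"start": 0.5, "a": 0.9, "b": 0.1, "a1": 0.8, "a2": 0.7, "b1": 0.0}


def ant_crawl(links, relevance, start, n_ants=5, steps=4, evaporation=0.1, seed=0):
    """Pheromone-guided walk (illustrative assumption, not the paper's method)."""
    rng = random.Random(seed)
    # One pheromone value per link; this table replaces a central URL queue.
    pheromone = {(u, v): 1.0 for u, outs in links.items() for v in outs}
    downloaded = {start}
    for _ in range(n_ants):
        page = start
        for _ in range(steps):
            outs = links[page]
            if not outs:
                break  # dead end: this ant stops
            # Choose the next link with probability proportional to its pheromone.
            weights = [pheromone[(page, v)] for v in outs]
            nxt = rng.choices(outs, weights=weights, k=1)[0]
            downloaded.add(nxt)
            # Relevance is known only after download; reinforce the link used.
            pheromone[(page, nxt)] += relevance[nxt]
            page = nxt
        # Evaporation lets weakly reinforced links fade over time.
        for edge in pheromone:
            pheromone[edge] *= 1.0 - evaporation
    return downloaded, pheromone


downloaded, pheromone = ant_crawl(LINKS, RELEVANCE, "start")
print(sorted(downloaded))
```

In this sketch, links that lead to relevant pages receive larger deposits, so later ants tend to follow them more often. A best-first crawler would instead keep every discovered URL in one global priority queue.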
This work was partly supported by the Foundation for Polish Science (Professorial Grant 2005-2008) and the Polish State Committee for Scientific Research (Grant N516 020 31/1977), Special Research Project 2006-2009, Polish-Singapore Research Project 2008-2010, Research Project 2008-2010.
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Dziwiński, P., Rutkowska, D. (2008). Ant Focused Crawling Algorithm. In: Rutkowski, L., Tadeusiewicz, R., Zadeh, L.A., Zurada, J.M. (eds) Artificial Intelligence and Soft Computing – ICAISC 2008. Lecture Notes in Computer Science, vol 5097. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69731-2_96
DOI: https://doi.org/10.1007/978-3-540-69731-2_96
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-69572-1
Online ISBN: 978-3-540-69731-2
eBook Packages: Computer Science, Computer Science (R0)