Formal Theory of Connectionist Web Retrieval

Soft Computing in Web Information Retrieval

Part of the book series: Studies in Fuzziness and Soft Computing ((STUDFUZZ,volume 197))

Summary

The term soft computing refers to a family of techniques based on fuzzy logic, evolutionary computing, artificial neural networks, probabilistic reasoning, rough sets, and chaotic computing. With the discovery that the Web is structured like a social network exhibiting the small-world property, the idea of using taxonomy principles has appeared as a complementary alternative to traditional keyword searching. One technique that has emerged from this principle is the “web-as-brain” metaphor, which is yielding new associative retrieval techniques based on artificial neural networks (ANNs). The present paper proposes a unified formal framework for three major methods used for Web retrieval tasks: PageRank, HITS, and I2R. It shows that these three techniques, although they stem from different paradigms, can be integrated into one unified formal view. The conceptual and notational framework is given by ANNs and the generic network equation. PageRank, HITS, and I2R can each be formally obtained from the generic equation as particular cases by making assumptions that reflect the corresponding underlying paradigm. The unified formal view sheds new light on these methods: they are only seemingly different from each other. All three are particular ANNs stemming from the same equation, and they differ in whether they are dynamic (a page’s importance varies in time) or static (a page’s importance is constant in time) and in the way they connect pages to each other. The paper also gives a detailed mathematical analysis of the computational complexity of winner-take-all (WTA) based IR techniques, using the I2R method for illustration. This analysis is important because (i) it shows that intuition may be misleading (contrary to intuition, a WTA-based algorithm yielding circles is not always “hard”), and (ii) it can serve as a model for the analysis of other methods.
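
To make the idea of link-based page importance concrete, the following is a minimal sketch of the commonly taught iterative forms of PageRank and HITS. It is not the chapter's derivation from the generic network equation, which is not reproduced in this summary, and I2R is not sketched here. The function names, the `links` dictionary layout, the damping value 0.85, the even redistribution for pages without out-links, and the fixed iteration count are all assumptions made for illustration.

```python
import math


def pagerank(links, damping=0.85, iterations=50):
    """Iterative PageRank on a small link graph (illustrative sketch).

    links: dict mapping each page to the list of pages it links to.
    Assumption: every linked page also appears as a key of `links`.
    """
    pages = sorted(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives a uniform share from the damping term.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links[p]
            if targets:
                share = damping * rank[p] / len(targets)
                for q in targets:
                    new_rank[q] += share
            else:
                # Assumption: a page with no out-links spreads its rank evenly.
                share = damping * rank[p] / n
                for q in pages:
                    new_rank[q] += share
        rank = new_rank
    return rank


def hits(links, iterations=50):
    """Iterative HITS hub and authority scores (illustrative sketch)."""
    pages = sorted(links)
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority of a page: sum of hub scores of the pages linking to it.
        auth = {p: 0.0 for p in pages}
        for p in pages:
            for q in links[p]:
                auth[q] += hub[p]
        norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {p: v / norm for p, v in auth.items()}
        # Hub score of a page: sum of authority scores of the pages it links to.
        hub = {p: sum(auth[q] for q in links[p]) for p in pages}
        norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {p: v / norm for p, v in hub.items()}
    return hub, auth


# Hypothetical three-page graph: a links to b and c, b links to c, c links to a.
example_links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(example_links)
hubs, authorities = hits(example_links)
```

In both functions a page's score is recomputed from the scores of the pages connected to it, which is the sense in which such methods can be read as networks of interacting nodes; how the chapter maps them onto the generic equation is left to the full text.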

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Dominich, S., Skrop, A., Tuza, Z. (2006). Formal Theory of Connectionist Web Retrieval. In: Herrera-Viedma, E., Pasi, G., Crestani, F. (eds) Soft Computing in Web Information Retrieval. Studies in Fuzziness and Soft Computing, vol 197. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-31590-X_9

  • DOI: https://doi.org/10.1007/3-540-31590-X_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-31588-9

  • Online ISBN: 978-3-540-31590-2

  • eBook Packages: Engineering, Engineering (R0)
