Formal Theory of Connectionist Web Retrieval

Dominich, Sándor; Skrop, Adrienn; Tuza, Zsolt

doi:10.1007/3-540-31590-X_9

Part of the book series: Studies in Fuzziness and Soft Computing (STUDFUZZ, volume 197)

Summary

The term soft computing refers to a family of techniques based on fuzzy logic, evolutionary computing, artificial neural networks, probabilistic reasoning, rough sets, and chaotic computing. With the discovery that the Web is structured according to social networks exhibiting the small world property, the idea of using taxonomy principles has appeared as a complementary alternative to traditional keyword searching. One technique that has emerged from this principle is the “web-as-brain” metaphor, which is yielding new, associative retrieval techniques based on artificial neural networks (ANNs). The present paper proposes a unified formal framework for three major methods used for Web retrieval tasks: PageRank, HITS, and I²R. The paper shows that these three techniques, although they stem from different paradigms, can be integrated into one unified formal view. The conceptual and notational framework is given by ANNs and the generic network equation. It is shown that the PageRank, HITS and I²R methods can be formally obtained from the generic equation as different particular cases by making certain assumptions reflecting the corresponding underlying paradigm. The unified formal view sheds new light on these methods: they are only seemingly different from each other. They are particular ANNs stemming from the same equation, differing in whether they are dynamic (a page’s importance varies in time) or static (a page’s importance is constant in time), and in the way they connect the pages to each other. The paper also gives a detailed mathematical analysis of the computational complexity of winner-take-all (WTA) based IR techniques, using the I²R method for illustration. This analysis is important because (i) it shows that intuition may be misleading (contrary to intuition, a WTA-based algorithm yielding circles is not always “hard”), and (ii) it can serve as a model for the analysis of other methods.
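
To make the three methods named in the summary more concrete, the sketch below runs each of them on a toy graph. It is an illustration under stated assumptions, not the chapter's derivation from the generic network equation: PageRank and HITS are written in their common textbook iterative forms, and the WTA walk is a simplified stand-in for the I²R idea, assumed here to follow the strongest connection from a query node until a circle closes. The graph, the weights, the damping value 0.85, and all function names are hypothetical.

```python
import math

# Hypothetical toy link graph: page -> pages it links to.
LINKS = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}

# Hypothetical association strengths for the WTA walk; "q" plays the query.
WEIGHTS = {
    "q": {"a": 0.9, "b": 0.4},
    "a": {"b": 0.7, "q": 0.2},
    "b": {"a": 0.8, "q": 0.1},
}


def pagerank(links, damping=0.85, iters=50):
    """Textbook PageRank iteration; the scores always sum to 1."""
    nodes = sorted(links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            # A page without out-links spreads its score over all pages.
            targets = links[v] or nodes
            share = damping * rank[v] / len(targets)
            for w in targets:
                new[w] += share
        rank = new
    return rank


def _normalise(scores):
    """Scale a score dictionary to unit Euclidean length."""
    norm = math.sqrt(sum(s * s for s in scores.values())) or 1.0
    return {v: s / norm for v, s in scores.items()}


def hits(links, iters=50):
    """Textbook HITS iteration; returns (authority, hub) scores."""
    nodes = sorted(links)
    hub = {v: 1.0 for v in nodes}
    auth = {v: 1.0 for v in nodes}
    for _ in range(iters):
        # Authority of v: sum of hub scores of the pages linking to v.
        auth = _normalise(
            {v: sum(hub[u] for u in nodes if v in links[u]) for v in nodes}
        )
        # Hub score of v: sum of authority scores of the pages v links to.
        hub = _normalise({v: sum(auth[w] for w in links[v]) for v in nodes})
    return auth, hub


def wta_circle(weights, start):
    """Follow the strongest connection from `start` until a node repeats.

    Returns the nodes on the closed circle, in visiting order.
    Assumes every node has at least one outgoing connection.
    """
    path = [start]
    position = {start: 0}
    current = start
    while True:
        winner = max(weights[current], key=weights[current].get)
        if winner in position:
            return path[position[winner]:]
        position[winner] = len(path)
        path.append(winner)
        current = winner


if __name__ == "__main__":
    print(pagerank(LINKS))
    print(hits(LINKS))
    print(wta_circle(WEIGHTS, "q"))
```

On this toy data, page "c" receives the most links and comes out on top for both PageRank and HITS authority, while the WTA walk started at "q" closes the circle formed by "a" and "b". The contrast mirrors the distinction drawn in the summary: the first two assign every page a score, whereas the WTA walk returns only the pages on the circle it reaches.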

Author information

Authors and Affiliations

Department of Computer Science, University of Veszprém, Egyetem u. 10, 8200, Veszprém, Hungary
Sándor Dominich, Adrienn Skrop & Zsolt Tuza
Computer and Automation Research Institute, Hungarian Academy of Sciences, Budapest, Hungary
Zsolt Tuza

Editor information

Editors and Affiliations

Department of Computer Science and A.I., E.T.S.I. Informatica, University of Granada, C/ Periodista Daniel Saucedo Aranda s/n, Granada, Spain
Enrique Herrera-Viedma
Department of Informatics Systems and Communication (DISCo), Università degli Studi di Milano Bicocca, Via Bicocca degli Arcimboldi, 8 (Edificio U7), 20126, Milano, Italy
Gabriella Pasi
Department of Computer and Information Sciences, University of Strathclyde, Livingstone Tower, 26 Richmond Street, Glasgow, G1 1XH, Scotland, UK
Fabio Crestani

About this chapter

Cite this chapter

Dominich, S., Skrop, A., Tuza, Z. (2006). Formal Theory of Connectionist Web Retrieval. In: Herrera-Viedma, E., Pasi, G., Crestani, F. (eds) Soft Computing in Web Information Retrieval. Studies in Fuzziness and Soft Computing, vol 197. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-31590-X_9

DOI: https://doi.org/10.1007/3-540-31590-X_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-31588-9
Online ISBN: 978-3-540-31590-2
eBook Packages: Engineering, Engineering (R0)
