Abstract
This paper considers the problem of indexing heterogeneous structured documents and of retrieving semi-structured documents. We propose a flexible paradigm for both indexing such documents and formulating user queries that specify soft constraints on document structure and content. At the indexing level, we propose a model that achieves flexibility by constructing personalised document representations based on users’ views of the documents. Users specify their preferences over the document sections they judge to carry the most interesting information, and linguistically quantify the number of sections that determine the overall potential interest of a document. At the query language level, we propose a flexible query language for expressing soft selection conditions on both document structure and content.
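The indexing idea summarised above can be illustrated with a small sketch. The abstract does not state how section preferences and the linguistic quantifier are combined, so the code below assumes a quantifier-guided, importance-weighted ordered weighted averaging (OWA) aggregation. The function names, the section names, the scores and the shape of the quantifier are all hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict

Quantifier = Callable[[float], float]


def at_least_half(x: float) -> float:
    """Hypothetical fuzzy quantifier: non-decreasing on [0, 1], reaching 1
    once half of the preference-weighted sections are covered."""
    return min(1.0, 2.0 * x)


def term_significance(
    section_scores: Dict[str, float],
    preferences: Dict[str, float],
    quantifier: Quantifier,
) -> float:
    """Aggregate per-section scores of a term into one document-level score.

    section_scores: significance of the term in each section, in [0, 1].
    preferences: user-assigned importance of each section (non-negative).
    quantifier: maps a proportion of sections to a degree of satisfaction.
    Assumed aggregation: importance-weighted, quantifier-guided OWA.
    """
    # Visit sections from the highest to the lowest term score.
    ranked = sorted(section_scores, key=lambda s: section_scores[s], reverse=True)
    total = sum(preferences[s] for s in ranked)
    if total == 0:
        return 0.0

    score = 0.0
    cumulative = 0.0
    previous_q = quantifier(0.0)
    for section in ranked:
        cumulative += preferences[section]
        current_q = quantifier(min(1.0, cumulative / total))
        # The weight of this section is the increase in quantifier
        # satisfaction contributed by its share of the total preference.
        score += (current_q - previous_q) * section_scores[section]
        previous_q = current_q
    return score


# Illustrative data only: one term scored in three sections of one document.
scores = {"title": 1.0, "abstract": 0.5, "body": 0.0}
prefs = {"title": 1.0, "abstract": 1.0, "body": 2.0}
print(term_significance(scores, prefs, at_least_half))
```

With the identity quantifier (`lambda x: x`) this aggregation reduces to a preference-weighted mean of the section scores. A stricter or looser quantifier shifts the result toward the worst or best matching sections, which is one way a user's view of a document could shape its representation.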
Cite this article
Bordogna, G., Pasi, G. Personalised Indexing and Retrieval of Heterogeneous Structured Documents. Inf Retrieval 8, 301–318 (2005). https://doi.org/10.1007/s10791-005-5664-x
DOI: https://doi.org/10.1007/s10791-005-5664-x