Mining the meaningful term conjunctions from materialised faceted taxonomies: algorithms and complexity

  • Regular Paper
Knowledge and Information Systems

Abstract

A materialised faceted taxonomy is an information source in which the objects of interest are indexed according to a faceted taxonomy. This paper shows how to mine, from a materialised faceted taxonomy, an expression of the Compound Term Composition Algebra that specifies exactly those compound terms (conjunctions of terms) that have a non-empty interpretation. The mined expressions can be used to encode in a very compact form, and subsequently reuse, the domain knowledge stored in existing materialised faceted taxonomies. A distinctive characteristic of this mining task is its focus on minimising the storage space required for the mined set of compound terms. The paper formulates the problem of expression mining, gives several algorithms for it, analyses their computational complexity, provides optimisation techniques, and discusses several novel applications that become possible as a result.
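
To make the notion of a compound term with non-empty interpretation concrete, the following is a minimal brute-force sketch. It is not one of the paper's mining algorithms and does not implement the Compound Term Composition Algebra. It assumes that the interpretation of a term is the set of objects indexed under that term or under any narrower term, and that the interpretation of a compound term is the intersection of the interpretations of its terms. The toy taxonomy, the object index, and all function names are hypothetical.

```python
from itertools import combinations

# Hypothetical toy taxonomy: narrower term -> broader term.
PARENT = {
    "Crete": "Greece",
    "Pisa": "Italy",
    "SeaSki": "Sports",
    "Windsurfing": "Sports",
}

# Hypothetical materialisation: object -> terms it is directly indexed under.
INDEX = {
    "hotel1": {"Crete", "SeaSki"},
    "hotel2": {"Pisa"},
}


def ancestors_or_self(term):
    """Return the term together with every broader term above it."""
    result = {term}
    while term in PARENT:
        term = PARENT[term]
        result.add(term)
    return result


def interpretation(term):
    """Objects indexed under the term or under any narrower term (assumed semantics)."""
    return {
        obj
        for obj, terms in INDEX.items()
        if any(term in ancestors_or_self(t) for t in terms)
    }


def compound_interpretation(terms):
    """Intersection of the interpretations of the given terms."""
    sets = [interpretation(t) for t in terms]
    return set.intersection(*sets) if sets else set(INDEX)


def valid_pairs():
    """Enumerate all two-term conjunctions with a non-empty interpretation."""
    all_terms = sorted(set(PARENT) | set(PARENT.values()))
    return [
        pair
        for pair in combinations(all_terms, 2)
        if compound_interpretation(pair)
    ]


if __name__ == "__main__":
    for pair in valid_pairs():
        print(pair, sorted(compound_interpretation(pair)))
```

On this toy data, `("Crete", "Sports")` is reported as valid because `hotel1` is indexed under `Crete` and under `SeaSki`, which is narrower than `Sports`, whereas `("Pisa", "Sports")` has an empty interpretation. Exhaustive enumeration of this kind grows quickly with the number of terms, which is the storage concern the abstract says the mined expressions are meant to address.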

Author information

Correspondence to Anastasia Analyti.

Additional information

Yannis Tzitzikas is currently Adjunct Professor in the Computer Science Department at the University of Crete (Greece) and Visiting Researcher in the Information Systems Lab at FORTH-ICS (Greece). Before joining the University of Crete and FORTH-ICS, he was a postdoctoral fellow at the University of Namur (Belgium) and an ERCIM postdoctoral fellow at ISTI-CNR (Pisa, Italy) and at VTT Technical Research Centre of Finland. He conducted his undergraduate and graduate studies (M.Sc., Ph.D.) in the Computer Science Department at the University of Crete. His research interests lie at the intersection of the following areas: knowledge representation and reasoning, information indexing and retrieval, conceptual modeling, and collaborative distributed applications. His current research revolves around faceted metadata and semantics (theory and applications), the P2P paradigm (focusing on query evaluation algorithms and automatic schema integration techniques), and flexible interaction schemes for information bases. The results of his research have been published in more than 30 papers in refereed international journals and conferences.

Anastasia Analyti earned a B.S. degree in Mathematics from the University of Athens, Greece, and M.S. and Ph.D. degrees in Computer Science from Michigan State University, USA. She worked as a visiting professor at the Department of Computer Science, University of Crete, and at the Department of Electronic and Computer Engineering, Technical University of Crete. Since 1995, she has been a researcher at the Information Systems Laboratory of the Institute of Computer Science, Foundation for Research and Technology-Hellas (FORTH-ICS). Her current interests include the semantic Web, conceptual modelling, faceted metadata and semantics, rules for the semantic Web, biomedical ontologies, contextual organisation of information, contextual web-ontology languages, and information integration and retrieval systems for the Web. She has published over 30 papers in refereed journals and conferences.

About this article

Cite this article

Tzitzikas, Y., Analyti, A. Mining the meaningful term conjunctions from materialised faceted taxonomies: algorithms and complexity. Knowl Inf Syst 9, 430–467 (2006). https://doi.org/10.1007/s10115-005-0205-x

  • DOI: https://doi.org/10.1007/s10115-005-0205-x
