Abstract
Query-directed browsing of unstructured Web-texts using Formal Concept Analysis (FCA) confronts two problems. Firstly, online Web-data is sometimes unstructured, so any FCA-system must include additional mechanisms to structure its input sources. Secondly, many online collections are large and dynamic, so a Web-robot must be used to extract data automatically. This paper addresses both issues. We report on the construction of a Web-based FCA system for browsing classified advertisements for real-estate properties¹. Real-estate advertisements were chosen because they are typical of semi-structured textual information sources accessible on the Web. Furthermore, the analysis of real-estate data using FCA is a classic example used in introductory courses on FCA. However, unlike the classic FCA real-estate example, whose input is a structured relational database, we automatically mine Web-based texts for their structure.
¹ In FCA the word property has a meaning similar to attribute. In this paper property is used only with the meaning of real-estate property, e.g. a house or apartment.
© 2001 Springer-Verlag Berlin Heidelberg
Cite this paper
Cole, R., Eklund, P. (2001). Browsing Semi-structured Web texts using Formal Concept Analysis. In: Delugach, H.S., Stumme, G. (eds) Conceptual Structures: Broadening the Base. ICCS 2001. Lecture Notes in Computer Science, vol 2120. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44583-8_23
DOI: https://doi.org/10.1007/3-540-44583-8_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42344-7
Online ISBN: 978-3-540-44583-8