Abstract
The Web is the greatest information source in human history. Unfortunately, mining knowledge out of this source is a laborious and error-prone task. Many researchers believe that a solution to this problem can be based on semantic annotations, inserted into web-based documents, that guide information extraction and knowledge mining. In this paper, we further elaborate a tool-supported process for the semantic annotation of documents, based on techniques and technologies traditionally used in software analysis and reverse engineering of large-scale legacy code bases. The outcomes of the paper include an experimental evaluation framework and empirical results from two case studies drawn from the tourism sector. Our findings suggest that the approach can facilitate the semi-automatic annotation of large document bases.
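The abstract does not describe the annotation mechanics in detail. As a rough illustration only, the sketch below shows the general idea of pattern-based semantic markup: domain concepts are recognized in plain text, wrapped in XML-style tags, and the result is scored against a manually annotated reference. It is not the authors' tool. The concept names (`Price`, `Facility`, `Location`), the regular expressions, the sample sentence, the gold annotations, and the use of precision/recall as the evaluation measure are all hypothetical choices made for this example.

```python
import re
from typing import Dict, List, Set, Tuple

# Hypothetical tourism concepts and recognition patterns (illustrative only;
# a real setup would derive these from a domain model, not hard-code them).
CONCEPT_PATTERNS: Dict[str, List[str]] = {
    "Price": [r"(?:EUR|€|\$)\s?\d+(?:[.,]\d{2})?"],
    "Facility": [r"\b(?:swimming pool|parking|restaurant|sauna)\b"],
    "Location": [r"\b(?:near|close to) (?:the )?[A-Z][a-z]+\b"],
}


def annotate(text: str,
             patterns: Dict[str, List[str]] = CONCEPT_PATTERNS
             ) -> Tuple[str, List[Tuple[str, str]]]:
    """Wrap each recognized fragment in a tag named after its concept.

    Returns the annotated text and the list of (concept, fragment) pairs.
    """
    spans = []  # (start, end, concept) for every raw match
    for concept, regexes in patterns.items():
        for rx in regexes:
            for m in re.finditer(rx, text):
                spans.append((m.start(), m.end(), concept))

    # Resolve overlaps: prefer the earliest match, then the longest one.
    spans.sort(key=lambda s: (s[0], -(s[1] - s[0])))
    kept = []
    last_end = -1
    for start, end, concept in spans:
        if start >= last_end:
            kept.append((start, end, concept))
            last_end = end

    # Rebuild the text left to right, inserting the tags.
    out, pos = [], 0
    for start, end, concept in kept:
        out.append(text[pos:start])
        out.append(f"<{concept}>{text[start:end]}</{concept}>")
        pos = end
    out.append(text[pos:])
    found = [(concept, text[start:end]) for start, end, concept in kept]
    return "".join(out), found


def precision_recall(found: List[Tuple[str, str]],
                     gold: Set[Tuple[str, str]]) -> Tuple[float, float, float]:
    """Compare tool annotations with a manual reference (exact match)."""
    found_set = set(found)
    tp = len(found_set & gold)
    p = tp / len(found_set) if found_set else 0.0
    r = tp / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


# Usage on a made-up hotel description and a made-up manual annotation.
text = ("Hotel Alpina offers a swimming pool and free parking "
        "near Trento, from EUR 85 per night.")
annotated, found = annotate(text)
gold = {("Facility", "swimming pool"), ("Facility", "parking"),
        ("Location", "near Trento"), ("Price", "EUR 85"),
        ("Name", "Hotel Alpina")}
p, r, f = precision_recall(found, gold)
print(annotated)
print(f"precision={p:.2f} recall={r:.2f} f-measure={f:.2f}")
```

Here the hypothetical gold set contains one concept (`Name`) that the patterns do not cover, so every tool annotation is correct but one is missed; a semi-automatic process would route such gaps to a human annotator.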
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kiyavitskaya, N., Zeni, N., Mich, L., Cordy, J.R., Mylopoulos, J. (2006). Text Mining Through Semi Automatic Semantic Annotation. In: Reimer, U., Karagiannis, D. (eds) Practical Aspects of Knowledge Management. PAKM 2006. Lecture Notes in Computer Science(), vol 4333. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11944935_13
DOI: https://doi.org/10.1007/11944935_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49998-5
Online ISBN: 978-3-540-49999-2
eBook Packages: Computer Science, Computer Science (R0)