Text Mining Through Semi Automatic Semantic Annotation

Kiyavitskaya, Nadzeya; Zeni, Nicola; Mich, Luisa; Cordy, James R.; Mylopoulos, John

doi:10.1007/11944935_13

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4333)

Included in the following conference series:

International Conference on Practical Aspects of Knowledge Management

Abstract

The Web is the greatest information source in human history. Unfortunately, mining knowledge out of this source is a laborious and error-prone task. Many researchers believe that a solution to the problem can be founded on semantic annotations that need to be inserted in web-based documents and guide information extraction and knowledge mining. In this paper, we further elaborate a tool-supported process for semantic annotation of documents based on techniques and technologies traditionally used in software analysis and reverse engineering for large-scale legacy code bases. The outcomes of the paper include an experimental evaluation framework and empirical results based on two case studies adopted from the Tourism sector. The conclusions suggest that our approach can facilitate the semi-automatic annotation of large document bases.

Author information

Authors and Affiliations

Dept. of Information and Communication Technology, University of Trento, Italy
Nadzeya Kiyavitskaya & Nicola Zeni
Dept. of Computer and Management Sciences, University of Trento, Italy
Luisa Mich
School of Computing, Queen's University, Kingston, Canada
James R. Cordy
Dept. of Computer Science, University of Toronto, Ontario, Canada
John Mylopoulos

Editor information

Editors and Affiliations

Institute for Information and Process Management, University of Applied Sciences St. Gallen, Teufener Strasse 2, 9000, St. Gallen, Switzerland
Ulrich Reimer
Faculty of Computer Science, Department of Knowledge and Business Engineering, University of Vienna, Bruenner Str. 72, 1210, Vienna, Austria
Dimitris Karagiannis

About this paper

Cite this paper

Kiyavitskaya, N., Zeni, N., Mich, L., Cordy, J.R., Mylopoulos, J. (2006). Text Mining Through Semi Automatic Semantic Annotation. In: Reimer, U., Karagiannis, D. (eds) Practical Aspects of Knowledge Management. PAKM 2006. Lecture Notes in Computer Science, vol 4333. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11944935_13

DOI: https://doi.org/10.1007/11944935_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49998-5
Online ISBN: 978-3-540-49999-2
eBook Packages: Computer Science, Computer Science (R0)
