Applying Software Analysis Technology to Lightweight Semantic Markup of Document Text

Kiyavitskaya, Nadzeya; Zeni, Nicola; Cordy, James R.; Mich, Luisa; Mylopoulos, John

doi:10.1007/11551188_65

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 3686)

Included in the following conference series:

International Conference on Pattern Recognition and Image Analysis

Abstract

Software analysis techniques, and in particular software “design recovery”, have been highly successful at both technical and business-level semantic markup of large-scale software systems written in a wide variety of programming languages. In particular, they have proven efficient and scalable in assisting the resolution of the “year 2000” problem for billions of lines of legacy source code. In this work we describe a first experiment in applying the same technical solutions and tools that have proven so successful in software markup to the more general problem of semantic markup of text documents. In this early report we describe our adaptation of the software analysis techniques, propose a general domain-independent architecture for semantic markup using them, and demonstrate its feasibility in a limited but realistic domain of application by comparison with both raw and tool-assisted human semantic markers.



Author information

Authors and Affiliations

Dept. of Information and Communication Technology, University of Trento, Italy
Nadzeya Kiyavitskaya, Nicola Zeni, Luisa Mich & John Mylopoulos
ITC-IRST, Trento, Italy, and School of Computing, Queen's University, Canada
James R. Cordy
Dept. of Computer Science, University of Toronto, Canada
John Mylopoulos

Editor information

Editors and Affiliations

Research School of Informatics, Loughborough, UK
Sameer Singh
ATR Lab, Research School of Informatics, University of Loughborough, Loughborough, UK
Maneesha Singh
IBM Corporation, 1133 Westchester Avenue, White Plains, 10604, New York, United States
Chid Apte
Institute of Computer Vision and Applied Computer Sciences, IBaI, Germany
Petra Perner

About this paper

Cite this paper

Kiyavitskaya, N., Zeni, N., Cordy, J.R., Mich, L., Mylopoulos, J. (2005). Applying Software Analysis Technology to Lightweight Semantic Markup of Document Text. In: Singh, S., Singh, M., Apte, C., Perner, P. (eds) Pattern Recognition and Data Mining. ICAPR 2005. Lecture Notes in Computer Science, vol 3686. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11551188_65

DOI: https://doi.org/10.1007/11551188_65
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28757-5
Online ISBN: 978-3-540-28758-2
eBook Packages: Computer Science, Computer Science (R0)
