Applying Software Analysis Technology to Lightweight Semantic Markup of Document Text

  • Conference paper
Pattern Recognition and Data Mining (ICAPR 2005)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 3686)

Abstract

Software analysis techniques, and in particular software “design recovery”, have been highly successful at both technical and business-level semantic markup of large-scale software systems written in a wide variety of programming languages, and in particular have proven efficient and scalable in assisting the resolution of the “year 2000” problem for billions of lines of legacy source code. In this work we describe a first experiment in applying the same technical solutions and tools that have proven so successful in software markup to the more general problem of semantic markup of text documents. In this early report we describe our adaptation of the software analysis techniques, propose a general domain-independent architecture for semantic markup using them, and demonstrate its feasibility in a limited but realistic domain of application by comparison with both raw and tool-assisted human semantic markers.

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kiyavitskaya, N., Zeni, N., Cordy, J.R., Mich, L., Mylopoulos, J. (2005). Applying Software Analysis Technology to Lightweight Semantic Markup of Document Text. In: Singh, S., Singh, M., Apte, C., Perner, P. (eds) Pattern Recognition and Data Mining. ICAPR 2005. Lecture Notes in Computer Science, vol 3686. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11551188_65

  • DOI: https://doi.org/10.1007/11551188_65

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-28757-5

  • Online ISBN: 978-3-540-28758-2

  • eBook Packages: Computer Science, Computer Science (R0)
