Document Identification by Shallow Semantic Analysis

  • Conference paper
Natural Language Processing and Information Systems (NLDB 2000)

Abstract

Identifying a matching component is a recurring problem in software engineering, specifically in software reuse. Properly generalized, it can be seen as an information retrieval problem. In the context of defining the architecture of a comprehensive software archive, we are designing a two-level retrieval structure. In this paper we report on the first level, a quick search facility based on analyzing texts written in natural language. Based on textual and structural properties of the documents contained in the repository, the universe is reduced to a moderately sized set of candidates to be further analyzed by more focussed mechanisms.
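The two-level structure described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the actual textual and structural analysis, so this example substitutes plain term overlap for the first-level quick search and leaves the second level as a caller-supplied predicate. All names here (`tokenize`, `first_level`, `two_level_search`, `refine`, `max_candidates`) are hypothetical and are not taken from the paper.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def overlap_score(query_terms, doc_text):
    """Count how many distinct query terms occur in the document."""
    return len(set(query_terms) & set(tokenize(doc_text)))

def first_level(query, repository, max_candidates=10):
    """Quick search (assumed stand-in): reduce the repository to a
    moderately sized candidate set using term overlap only."""
    query_terms = tokenize(query)
    scored = [(overlap_score(query_terms, text), name)
              for name, text in repository.items()]
    # Keep only documents sharing at least one term with the query.
    matching = [(score, name) for score, name in scored if score > 0]
    # Highest score first; ties broken alphabetically for determinism.
    matching.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in matching[:max_candidates]]

def two_level_search(query, repository, refine, max_candidates=10):
    """Run the cheap filter first, then apply a more focused
    (caller-supplied, hypothetical) check to the survivors only."""
    candidates = first_level(query, repository, max_candidates)
    return [name for name in candidates if refine(query, repository[name])]
```

The point of the design is that the expensive `refine` step is applied only to the candidates that survive the cheap filter, not to the whole repository.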

Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Bouchachia, A., Mittermeir, R.T., Pozewaunig, H. (2001). Document Identification by Shallow Semantic Analysis. In: Bouzeghoub, M., Kedad, Z., Métais, E. (eds) Natural Language Processing and Information Systems. NLDB 2000. Lecture Notes in Computer Science, vol 1959. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45399-7_16

  • DOI: https://doi.org/10.1007/3-540-45399-7_16

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-41943-3

  • Online ISBN: 978-3-540-45399-4

  • eBook Packages: Springer Book Archive
