Abstract
Identifying a matching component is a recurring problem in software engineering, specifically in software reuse. Properly generalized, it can be seen as an information retrieval problem. In the context of defining the architecture of a comprehensive software archive, we are designing a two-level retrieval structure. In this paper we report on the first level, a quick search facility based on analyzing texts written in natural language. Based on textual and structural properties of the documents contained in the repository, the universe is reduced to a moderately sized set of candidates to be further analyzed by more focussed mechanisms.
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bouchachia, A., Mittermeir, R.T., Pozewaunig, H. (2001). Document Identification by Shallow Semantic Analysis. In: Bouzeghoub, M., Kedad, Z., Métais, E. (eds) Natural Language Processing and Information Systems. NLDB 2000. Lecture Notes in Computer Science, vol 1959. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45399-7_16
DOI: https://doi.org/10.1007/3-540-45399-7_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-41943-3
Online ISBN: 978-3-540-45399-4
eBook Packages: Springer Book Archive