ABSTRACT
This paper describes the analysis and processing programs for a set of natural language texts in a medical area (x-ray reports on patients with breast cancer). The programs convert the information in the text into a tabular form suitable for further automatic information processing (e.g., editing of records, question answering on the data collected, or statistical summaries of the data). To set up a tabular form appropriate for the data, we first perform a manual linguistic analysis on a sample of the texts. From this we obtain the word classes and the form of the table (called an information format) for this type of material. We then apply a series of processing programs to the sentences of the texts. Each sentence is parsed with the Linguistic String Parser English grammar to obtain its grammatical structure; certain standard English transformations are then applied to regularize the grammatical form of the sentence; and finally a set of "formatting transformations" maps the words of the sentence into the slots of the format or table, in such a way that the sentence is reconstructible (up to paraphrase) from its representation in the table. The results of applying these programs to a corpus are described. This procedure enables us to convert a natural language corpus into a structured data base.
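The final step described above, mapping the words of a sentence into the slots of an information format, can be illustrated with a minimal sketch. It is not the authors' implementation: it replaces the parsing and transformation stages with a plain lexicon lookup, and the word classes, slot names, and the function `format_sentence` are all hypothetical, chosen only to show what a format row might look like.

```python
# Hypothetical word classes. In the paper these come from a manual
# linguistic analysis of sample texts; the entries here are invented.
WORD_CLASSES = {
    "x-ray": "TEST",
    "mammogram": "TEST",
    "chest": "BODY_PART",
    "breast": "BODY_PART",
    "shows": "VERB",
    "reveals": "VERB",
    "no": "NEG",
    "metastasis": "FINDING",
    "calcification": "FINDING",
}

# Hypothetical column order of the information format (the table).
FORMAT_SLOTS = ("TEST", "BODY_PART", "VERB", "NEG", "FINDING")


def format_sentence(sentence):
    """Place each known word of the sentence into the slot of its word class.

    Assumption: a lexicon lookup stands in for the parse and the
    transformations, so this only handles very simple sentences.
    """
    row = {slot: [] for slot in FORMAT_SLOTS}
    for word in sentence.lower().rstrip(".").split():
        word_class = WORD_CLASSES.get(word)
        if word_class is not None:
            row[word_class].append(word)
    return row


if __name__ == "__main__":
    row = format_sentence("Chest x-ray shows no metastasis.")
    for slot in FORMAT_SLOTS:
        print(f"{slot:10} {' '.join(row[slot])}")
```

Each sentence yields one row with the same columns, which is what makes the resulting table usable for record editing, question answering, or statistical summaries. A real system would need the grammatical structure to decide, for example, which finding a negation applies to.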
- From text to structured information: automatic processing of medical reports