Abstract
This article describes the rationale behind PHASAR (Phrase-based Accurate Search And Retrieval), a professional Information Retrieval and Text Mining system under development for collecting information about metabolites from the biological literature. The system is generic in nature and, given suitable linguistic resources and thesauri, applicable to many other forms of professional search. Instead of keywords, the PHASAR search engine uses Dependency Triples as terms. Both documents and queries are parsed, transduced to Dependency Triples and lemmatized. A query consists of a set of Dependency Triples whose elements may be generalized or specialized to achieve the desired precision and recall. To aid interactive exploration, the search process is supported by document frequency information from the index, both for terms from the query and for terms from the thesaurus.
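The idea of searching with Dependency Triples instead of keywords can be illustrated with a minimal sketch. It is not the PHASAR implementation: the `Triple` layout (head, relation, modifier), the relation labels `SUBJ` and `OBJ`, the `TripleIndex` class, the sample documents and the tiny `thesaurus` are all hypothetical, and the triples are written by hand where a real system would obtain them by parsing and lemmatizing. A pattern element may be `None` (generalized to a wildcard), a single lemma, or a set of lemmas (for example, a thesaurus expansion), and the index reports the document frequency of each pattern.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    """A lemmatized dependency triple (hypothetical layout)."""
    head: str
    relation: str
    modifier: str


def fits(value, pattern):
    """True if value matches pattern: None = wildcard, str = exact, set = any of."""
    if pattern is None:
        return True
    if isinstance(pattern, str):
        return value == pattern
    return value in pattern


class TripleIndex:
    """Toy inverted index that maps each triple to the documents containing it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, triples):
        for triple in triples:
            self.postings[triple].add(doc_id)

    def match(self, head=None, relation=None, modifier=None):
        """Return the ids of documents with at least one triple fitting the pattern."""
        docs = set()
        for triple, doc_ids in self.postings.items():
            if (fits(triple.head, head)
                    and fits(triple.relation, relation)
                    and fits(triple.modifier, modifier)):
                docs |= doc_ids
        return docs

    def document_frequency(self, head=None, relation=None, modifier=None):
        """Number of documents matching the pattern."""
        return len(self.match(head, relation, modifier))


# Hand-written triples standing in for parser output (assumption).
index = TripleIndex()
index.add("d1", [Triple("convert", "SUBJ", "enzyme"),
                 Triple("convert", "OBJ", "glucose")])
index.add("d2", [Triple("convert", "OBJ", "fructose")])
index.add("d3", [Triple("measure", "OBJ", "glucose")])

# Hypothetical thesaurus: a broader term mapped to its narrower terms.
thesaurus = {"sugar": {"glucose", "fructose"}}

exact = index.document_frequency("convert", "OBJ", "glucose")
generalized = index.document_frequency("convert", "OBJ", None)
expanded = index.document_frequency("convert", "OBJ", thesaurus["sugar"])
any_verb = index.match(None, "OBJ", "glucose")
```

Comparing the document frequencies of the exact, generalized and thesaurus-expanded patterns is one way a searcher could judge, before reading any documents, whether a query is too narrow or too broad.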
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Koster, C.H.A., Seibert, O., Seutter, M. (2006). The PHASAR Search Engine. In: Kop, C., Fliedl, G., Mayr, H.C., Métais, E. (eds) Natural Language Processing and Information Systems. NLDB 2006. Lecture Notes in Computer Science, vol 3999. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11765448_13
DOI: https://doi.org/10.1007/11765448_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-34616-6
Online ISBN: 978-3-540-34617-3
eBook Packages: Computer Science, Computer Science (R0)