Abstract
A large body of work addresses the syntactic analysis of English text. However, these techniques have not yet been shown to work well on short phrases that lack structural cues such as capitalization and subject-verb-object word order. In this paper we attempt the syntactic analysis of phrases for which such contextual information is not available. We have developed a stemmer, a POS tagger, a chunker, and a named entity tagger for short English phrases such as chats, messages, and queries, using a root dictionary and language-specific rules. We evaluated the technique on English queries and observed that our system outperforms some commonly used NLP tools.
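To make the idea of combining a root dictionary with language-specific rules concrete, the following is a minimal sketch of a dictionary-backed stemmer for lowercase query text. It is an illustration only and not the authors' implementation: the `ROOTS` set, the `SUFFIX_RULES` list, and the function names `stem` and `stem_query` are all hypothetical, and a real system would presumably use a much larger lexicon and rule set.

```python
# Hypothetical root dictionary (assumption): a real system would load a large lexicon.
ROOTS = {"run", "book", "flight", "cheap", "hotel", "study", "make"}

# Hypothetical suffix rules (assumption): (suffix, replacement), tried in order.
SUFFIX_RULES = [
    ("ies", "y"),
    ("ing", ""),
    ("ing", "e"),
    ("es", ""),
    ("ed", ""),
    ("s", ""),
]


def stem(word):
    """Return a root for `word` if a rule maps it into ROOTS, else the lowercased word."""
    token = word.lower()  # queries often carry no reliable capitalization
    if token in ROOTS:
        return token
    for suffix, replacement in SUFFIX_RULES:
        if token.endswith(suffix) and len(token) > len(suffix):
            candidate = token[: -len(suffix)] + replacement
            # Accept a rewrite only when the dictionary confirms the root.
            if candidate in ROOTS:
                return candidate
            # Undo consonant doubling, e.g. "running" -> "runn" -> "run".
            if (
                len(candidate) >= 2
                and candidate[-1] == candidate[-2]
                and candidate[:-1] in ROOTS
            ):
                return candidate[:-1]
    return token  # unknown word: leave it unchanged


def stem_query(query):
    """Stem every whitespace-separated token of a short query."""
    return [stem(token) for token in query.split()]


print(stem_query("cheap flights booking hotels"))
# prints ['cheap', 'flight', 'book', 'hotel']
```

Checking each candidate against the dictionary before accepting it keeps the rules from over-stripping words that merely end in a suffix-like string, which is one plausible reason to pair rules with a root dictionary.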
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Chatterji, S., Sreedhara, G.S., Desarkar, M.S. (2014). An Efficient Tool for Syntactic Processing of English Query Text. In: Prasath, R., O’Reilly, P., Kathirvalavakumar, T. (eds) Mining Intelligence and Knowledge Exploration. Lecture Notes in Computer Science, vol 8891. Springer, Cham. https://doi.org/10.1007/978-3-319-13817-6_27
DOI: https://doi.org/10.1007/978-3-319-13817-6_27
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-13816-9
Online ISBN: 978-3-319-13817-6
eBook Packages: Computer Science, Computer Science (R0)