An Efficient Tool for Syntactic Processing of English Query Text

Chatterji, Sanjay; Sreedhara, G. S.; Desarkar, Maunendra Sankar

doi:10.1007/978-3-319-13817-6_27

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8891)

Abstract

A large amount of work has been done on the syntactic analysis of English text. However, these techniques have not yet been shown to be appropriate for analyzing short phrases that lack structural context such as capitalization and subject-object-verb order. In this paper, we attempt the syntactic analysis of phrases for which such contextual information is not available. We have developed a stemmer, a POS tagger, a chunker, and a named entity tagger for short English phrases such as chats, messages, and queries, using a root dictionary and language-specific rules. We have evaluated the technique on English queries and observed that our system outperforms some commonly used NLP tools.
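
To illustrate the general idea of stemming with a root dictionary and language-specific rules, here is a minimal Python sketch. It is not the authors' implementation, which this page does not describe. The root dictionary `ROOTS`, the rule list `SUFFIX_RULES`, and the functions `stem` and `stem_query` are all hypothetical and chosen only for illustration.

```python
# Hypothetical toy root dictionary; the paper's actual dictionary is not shown here.
ROOTS = {"flight", "cheap", "city", "book", "hotel"}

# Hypothetical suffix rules as (suffix, replacement) pairs, tried in order.
SUFFIX_RULES = [
    ("ies", "y"),  # cities -> city
    ("ing", ""),   # booking -> book
    ("es", ""),
    ("ed", ""),
    ("s", ""),     # flights -> flight
]


def stem(token):
    """Return a dictionary root for the token, or the lowercased token if no rule yields one."""
    word = token.lower()  # queries often lack reliable capitalization
    if word in ROOTS:
        return word
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix):
            candidate = word[: len(word) - len(suffix)] + replacement
            # Accept a rule only when the result is a known root,
            # so that words such as "paris" are not over-stemmed.
            if candidate in ROOTS:
                return candidate
    return word


def stem_query(query):
    """Stem every whitespace-separated token of a short query."""
    return [stem(tok) for tok in query.split()]
```

In this sketch the dictionary lookup acts as a guard on the rules: a suffix is stripped only when the result is a known root, which is one plausible way to combine the two resources when the surrounding context is too short to disambiguate.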

Author information

Authors and Affiliations

Samsung R&D Institute India, Bangalore, India
Sanjay Chatterji, G. S. Sreedhara & Maunendra Sankar Desarkar

Editor information

Editors and Affiliations

University College Cork, 011927, Cork, Ireland
Rajendra Prasath & Philip O’Reilly
V.H.N. Senthikumara Nadar College, 626 001, Tamil Nadu, India
T. Kathirvalavakumar

About this paper

Cite this paper

Chatterji, S., Sreedhara, G.S., Desarkar, M.S. (2014). An Efficient Tool for Syntactic Processing of English Query Text. In: Prasath, R., O’Reilly, P., Kathirvalavakumar, T. (eds) Mining Intelligence and Knowledge Exploration. Lecture Notes in Computer Science, vol 8891. Springer, Cham. https://doi.org/10.1007/978-3-319-13817-6_27

DOI: https://doi.org/10.1007/978-3-319-13817-6_27
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-13816-9
Online ISBN: 978-3-319-13817-6
eBook Packages: Computer Science, Computer Science (R0)
