An Efficient Tool for Syntactic Processing of English Query Text

  • Conference paper
Mining Intelligence and Knowledge Exploration

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8891)

Abstract

A large amount of work has been done on the syntactic analysis of English text. However, these techniques have not yet been shown to be appropriate for analyzing short phrases that lack structured context such as capitalization or subject-object-verb order. In this paper, we attempt the syntactic analysis of phrases for which such contextual information is not available. We have developed a stemmer, a POS tagger, a chunker, and a named entity tagger for short English phrases such as chats, messages, and queries, using a root dictionary and language-specific rules. We evaluated the technique on English queries and observed that our system outperforms some commonly used NLP tools.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Chatterji, S., Sreedhara, G.S., Desarkar, M.S. (2014). An Efficient Tool for Syntactic Processing of English Query Text. In: Prasath, R., O’Reilly, P., Kathirvalavakumar, T. (eds) Mining Intelligence and Knowledge Exploration. Lecture Notes in Computer Science, vol 8891. Springer, Cham. https://doi.org/10.1007/978-3-319-13817-6_27

  • DOI: https://doi.org/10.1007/978-3-319-13817-6_27

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-13816-9

  • Online ISBN: 978-3-319-13817-6

  • eBook Packages: Computer Science, Computer Science (R0)
