Exploring Multidimensional Continuous Feature Space to Extract Relevant Words

  • Conference paper
Statistical Language and Speech Processing (SLSP 2014)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8791)

Abstract

As the amount of text data grows, descriptive metadata become increasingly crucial for processing it efficiently. Keywords are one kind of such metadata; we encounter them, for example, in everyday browsing of web pages. Such metadata can be beneficial in various scenarios, such as web search or content-based recommendation. We study the keyword extraction problem from the perspective of vector spaces and present a novel method for extracting relevant words from an article, in which we represent each word and phrase of the article as a vector of its latent features. We evaluate our method on a text categorisation task using the well-known 20-newsgroups dataset and achieve state-of-the-art results.
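
To make the vector-space view of the abstract concrete, the following is a minimal sketch and not the method of the paper, whose details are not given on this page. It assumes a hypothetical table of 3-dimensional latent feature vectors (TOY_VECTORS, with invented values) and swaps in a simple stand-in technique: each known word of an article is ranked by its cosine similarity to the centroid of all word vectors in that article. The function names are illustrative only.

```python
import math

# Hypothetical 3-dimensional latent feature vectors (values invented for
# illustration). Real word vectors would be learned from a large corpus
# and would have many more dimensions.
TOY_VECTORS = {
    "encryption": [0.9, 0.1, 0.0],
    "password": [0.8, 0.2, 0.1],
    "network": [0.7, 0.3, 0.0],
    "banana": [0.0, 0.1, 0.9],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def rank_words(words, vectors):
    """Rank the known words of an article by similarity to the article centroid.

    Assumption: closeness to the centroid is used here as a stand-in
    relevance score; it is not the scoring used in the paper.
    """
    # Keep each word once, in first-seen order, and skip words without a vector.
    known = [w for w in dict.fromkeys(words) if w in vectors]
    if not known:
        return []
    centre = centroid([vectors[w] for w in known])
    return sorted(known, key=lambda w: cosine(vectors[w], centre), reverse=True)


article = ["encryption", "password", "network", "banana", "unknown"]
ranking = rank_words(article, TOY_VECTORS)
print(ranking)
```

In this toy example the off-topic word ends up last because its vector points away from the other three, and the word without a vector is skipped. With real embeddings, the same ranking step could be applied to phrase vectors as well.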


Notes

  1. Available online at https://drive.google.com/file/d/0B7XkCwpI5KDYNlNUTTlSS21pQmM/edit?usp=sharing (last accessed on May 14, 2014).

  2. Available online at http://www.washingtonpost.com/world/national-security/cybersecurity-poll-americans-divided-over-government-requirements-on-companies/2012/06/06/gJQAmWqnJV_story.html (last accessed on May 14, 2014).

  3. Available online at http://storage.googleapis.com/books/ngrams/books/datasetsv2.html (last accessed on May 14, 2014).


Acknowledgement

This work was partially supported by grants No. VG1/0675/11 and APVV-0208-10, and it is a partial result of the Research and Development Operational Programme project “University Science Park of STU Bratislava”, ITMS 26240220084, co-funded by the European Regional Development Fund.

Author information

Corresponding author

Correspondence to Márius Šajgalík.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Šajgalík, M., Barla, M., Bieliková, M. (2014). Exploring Multidimensional Continuous Feature Space to Extract Relevant Words. In: Besacier, L., Dediu, AH., Martín-Vide, C. (eds) Statistical Language and Speech Processing. SLSP 2014. Lecture Notes in Computer Science, vol 8791. Springer, Cham. https://doi.org/10.1007/978-3-319-11397-5_12

  • DOI: https://doi.org/10.1007/978-3-319-11397-5_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-11396-8

  • Online ISBN: 978-3-319-11397-5

  • eBook Packages: Computer Science, Computer Science (R0)
