Effectively Representing Short Text via the Improved Semantic Feature Space Mapping

  • Conference paper
Trends and Applications in Knowledge Discovery and Data Mining (PAKDD 2019)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11607)

Abstract

Short text representation (STR) has attracted increasing interest with the rapid growth of Web and social media data in short-text form. In this paper, we present a new method that uses an improved semantic feature space mapping to represent short texts effectively. First, terms are semantically clustered based on statistical analysis and word2vec, and the semantic feature space is represented by the cluster centers. Next, the context information of terms is integrated with the semantic feature space, and three improved similarity calculation methods are established on this basis. The text mapping matrix is then constructed for short text representation learning. Experiments on both Chinese and English test collections show that the proposed method reflects the semantic information of short texts well and represents them reasonably and effectively.
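The pipeline outlined in the abstract (cluster term embeddings, treat the cluster centers as the semantic feature space, then map each short text onto those centers) can be illustrated with a minimal sketch. This is not the authors' implementation. The toy 2-D vectors in `EMBEDDINGS` stand in for word2vec embeddings, plain k-means with cosine similarity stands in for whatever clustering the paper uses, and the function names `cluster_centers` and `represent` are hypothetical. The statistical analysis step, the context-integrated similarity measures, and the exact form of the mapping matrix are not reproduced here.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def cluster_centers(vectors, k, max_iters=20):
    """Plain k-means using cosine similarity; assumed stand-in for the paper's clustering."""
    # Deterministic initialisation: the first k vectors serve as initial centers.
    centers = [list(v) for v in vectors[:k]]
    for _ in range(max_iters):
        groups = [[] for _ in range(k)]
        for v in vectors:
            nearest = max(range(k), key=lambda j: cosine(v, centers[j]))
            groups[nearest].append(v)
        # Recompute each center as the mean of its members; keep empty clusters unchanged.
        updated = [
            [sum(col) / len(g) for col in zip(*g)] if g else centers[j]
            for j, g in enumerate(groups)
        ]
        if updated == centers:
            break
        centers = updated
    return centers

def represent(tokens, embeddings, centers):
    """Map a tokenised short text to one feature per cluster center.

    Each feature is the mean cosine similarity between the text's known
    terms and that center (an assumed, simplified mapping).
    """
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return [0.0] * len(centers)
    return [sum(cosine(v, c) for v in known) / len(known) for c in centers]

# Toy 2-D "embeddings" (assumption: real word2vec vectors would be used instead).
EMBEDDINGS = {
    "soccer": [0.9, 0.1],
    "goal": [0.8, 0.2],
    "stock": [0.1, 0.9],
    "market": [0.2, 0.8],
}

CENTERS = cluster_centers(list(EMBEDDINGS.values()), k=2)

if __name__ == "__main__":
    print(represent(["soccer", "goal"], EMBEDDINGS, CENTERS))
    print(represent(["stock", "market"], EMBEDDINGS, CENTERS))
```

In this sketch a short text becomes a dense vector with one dimension per semantic cluster, so two texts that share no surface terms can still receive similar representations when their terms fall near the same centers.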

Acknowledgements

This work is supported by the National Natural Science Foundation of China (No. 61762078, No. 61363058, No. 61663004, No. kx201705) and Guangxi Key Lab of Multi-source Information Mining and Security (No. MIMS18-08).

Author information

Correspondence to Ting Tuo, Huifang Ma, Haijiao Liu or Jiahui Wei.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Tuo, T., Ma, H., Liu, H., Wei, J. (2019). Effectively Representing Short Text via the Improved Semantic Feature Space Mapping. In: U., L., Lauw, H. (eds) Trends and Applications in Knowledge Discovery and Data Mining. PAKDD 2019. Lecture Notes in Computer Science, vol 11607. Springer, Cham. https://doi.org/10.1007/978-3-030-26142-9_27

  • DOI: https://doi.org/10.1007/978-3-030-26142-9_27

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-26141-2

  • Online ISBN: 978-3-030-26142-9

  • eBook Packages: Computer Science (R0)
