Effectively Representing Short Text via the Improved Semantic Feature Space Mapping

Tuo, Ting; Ma, Huifang; Liu, Haijiao; Wei, Jiahui

doi:10.1007/978-3-030-26142-9_27

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11607)

Included in the following conference series:

Pacific-Asia Conference on Knowledge Discovery and Data Mining

Abstract

Short text representation (STR) has attracted increasing interest with the rapid growth of Web and social media data in short text form. In this paper, we present a new method that uses an improved semantic feature space mapping to represent short texts effectively. First, terms are clustered semantically based on statistical analysis and word2vec, and the semantic feature space is then represented by the cluster centers. Next, the context information of terms is integrated with the semantic feature space, on the basis of which three improved similarity calculation methods are established. The text mapping matrix is then constructed for short text representation learning. Experiments on both Chinese and English test collections show that the proposed method reflects the semantic information of short texts well and represents them reasonably and effectively.
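
The pipeline outlined in the abstract (cluster terms, take the cluster centers as the semantic feature space, then map each short text onto that space) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the two-dimensional vectors in `EMB` are hand-made stand-ins for word2vec embeddings, the clustering is a plain k-means-style loop with fixed seed terms, and plain cosine similarity replaces the paper's three improved similarity measures and its context integration, none of which are specified in the abstract. The function names `cluster_terms` and `represent` are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def cluster_terms(embeddings, seeds, iterations=10):
    """K-means-style clustering of terms, seeded with the vectors of `seeds`.

    Returns the cluster centers and a term -> cluster index mapping.
    """
    centers = [embeddings[s] for s in seeds]
    assignment = {}
    for _ in range(iterations):
        # Assign every term to its most similar center.
        assignment = {
            term: max(range(len(centers)), key=lambda j: cosine(vec, centers[j]))
            for term, vec in embeddings.items()
        }
        # Recompute each center as the mean of its members.
        new_centers = []
        for j in range(len(centers)):
            members = [embeddings[t] for t, c in assignment.items() if c == j]
            new_centers.append(mean(members) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assignment

def represent(text_terms, embeddings, centers):
    """Map a short text to one value per cluster center.

    Entry j is the average cosine similarity between the text's known
    terms and center j; unknown terms are ignored.
    """
    known = [embeddings[t] for t in text_terms if t in embeddings]
    if not known:
        return [0.0] * len(centers)
    return [sum(cosine(v, c) for v in known) / len(known) for c in centers]

# Hand-made toy vectors (assumption: stand-ins for word2vec embeddings).
EMB = {
    "apple": [0.9, 0.1],
    "banana": [0.8, 0.2],
    "laptop": [0.1, 0.9],
    "phone": [0.2, 0.8],
}

centers, assignment = cluster_terms(EMB, seeds=["apple", "laptop"])
fruit_text = represent(["apple", "banana"], EMB, centers)
device_text = represent(["laptop", "phone"], EMB, centers)
```

In this toy setting, a text about fruit scores higher on the first cluster's dimension and a text about devices scores higher on the second, which is the sense in which the cluster centers act as a low-dimensional feature space for sparse short texts.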

Acknowledgements

This work is supported by the National Natural Science Foundation of China (No. 61762078, No. 61363058, No. 61663004, No. kx201705) and Guangxi Key Lab of Multi-source Information Mining and Security (No. MIMS18-08).

Author information

Authors and Affiliations

College of Computer Science and Engineering, Northwest Normal University, Lanzhou, China
Ting Tuo, Huifang Ma, Haijiao Liu & Jiahui Wei
Guangxi Key Lab of Multi-source Information Mining and Security, Guangxi Normal University, Guilin, China
Huifang Ma

Corresponding authors

Correspondence to Ting Tuo, Huifang Ma, Haijiao Liu or Jiahui Wei.

Editor information

Editors and Affiliations

University of Macau, Macao, China
Leong Hou U.
Singapore Management University, Singapore, Singapore
Hady W. Lauw

About this paper

Cite this paper

Tuo, T., Ma, H., Liu, H., Wei, J. (2019). Effectively Representing Short Text via the Improved Semantic Feature Space Mapping. In: U., L., Lauw, H. (eds) Trends and Applications in Knowledge Discovery and Data Mining. PAKDD 2019. Lecture Notes in Computer Science, vol 11607. Springer, Cham. https://doi.org/10.1007/978-3-030-26142-9_27

DOI: https://doi.org/10.1007/978-3-030-26142-9_27
Published: 12 September 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-26141-2
Online ISBN: 978-3-030-26142-9
eBook Packages: Computer Science, Computer Science (R0)
