Locality-Sensitive Term Weighting for Short Text Clustering

  • Conference paper
  • First Online:
Neural Information Processing (ICONIP 2017)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10634)

Abstract

To alleviate sparseness in short text clustering, considerable research has investigated external information such as Wikipedia to enrich feature representations; this requires extra work and resources and may introduce inconsistencies. Sparseness leads to weak connections between short texts, which makes their similarity difficult to measure. We introduce a special term-specific document set, the potential locality set, to capture weak similarity. Specifically, for any two short documents within the same potential locality, the Jaccard similarity between them is greater than 0. In other words, the adjacency graph based on these weak connections is a complete graph. Building on the potential locality set, we further propose a locality-sensitive term weighting scheme. Experimental results show that the proposed approach builds more reliable neighborhoods for short text data. Compared with a state-of-the-art algorithm, the proposed approach achieves better clustering performance, which verifies its effectiveness.
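The abstract does not give the formal definition of a potential locality set. The sketch below assumes the simplest reading consistent with the text: the potential locality of a term t is the set of short documents that contain t. Under that assumption, any two members share at least t, so their Jaccard similarity is positive and the adjacency graph over the set is complete. The whitespace tokenizer, the toy corpus, and the helper names (`tokenize`, `potential_locality_sets`, `is_complete_graph`) are hypothetical illustrations, not the authors' implementation, and the locality-sensitive weighting scheme itself is not reproduced here.

```python
from itertools import combinations


def tokenize(text: str) -> set[str]:
    """Hypothetical tokenizer: lower-case whitespace split into a set of terms."""
    return set(text.lower().split())


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |A & B| / |A | B|; defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def potential_locality_sets(docs: list[set[str]]) -> dict[str, set[int]]:
    """Assumed definition: map each term t to the indices of documents containing t."""
    localities: dict[str, set[int]] = {}
    for idx, terms in enumerate(docs):
        for term in terms:
            localities.setdefault(term, set()).add(idx)
    return localities


def is_complete_graph(doc_ids: set[int], docs: list[set[str]]) -> bool:
    """True if every pair of documents in doc_ids has Jaccard similarity > 0."""
    return all(
        jaccard(docs[i], docs[j]) > 0
        for i, j in combinations(sorted(doc_ids), 2)
    )


# Toy corpus of short texts (illustrative only).
corpus = [
    "apple releases new iphone",
    "iphone sales rise",
    "python list comprehension",
    "python iphone app tutorial",
]
docs = [tokenize(text) for text in corpus]
localities = potential_locality_sets(docs)

print(sorted(localities["iphone"]))                   # [0, 1, 3]
print(is_complete_graph(localities["iphone"], docs))  # True
print(jaccard(docs[0], docs[2]))                      # 0.0
```

Documents 0 and 2 share no terms, so the corpus as a whole does not form a complete graph; under this reading, only the term-specific subsets are guaranteed to.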

Notes

  1. 83% of text-based recommender systems in the domain of digital libraries use tf-idf; see https://en.wikipedia.org/wiki/Tf-idf.

  2. idf has a probabilistic interpretation in terms of the odds that term t occurs in document d.

  3. http://jwebpro.sourceforge.net/data-web-snippets.tar.gz.

  4. https://github.com/jacoxu/StackOverflow.

  5. https://github.com/xiaohuiyan/BTM.

Acknowledgement

The work described in this paper was partially supported by a grant from the Research Grants Council of the Hong Kong Special Administrative Region, China [Project No. CityU 11300715], and a grant from City University of Hong Kong [Project No. 7004674].

Author information

Corresponding author

Correspondence to Hau-San Wong.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Zheng, CT., Qian, S., Cao, WM., Wong, HS. (2017). Locality-Sensitive Term Weighting for Short Text Clustering. In: Liu, D., Xie, S., Li, Y., Zhao, D., El-Alfy, ES. (eds) Neural Information Processing. ICONIP 2017. Lecture Notes in Computer Science, vol 10634. Springer, Cham. https://doi.org/10.1007/978-3-319-70087-8_46

  • DOI: https://doi.org/10.1007/978-3-319-70087-8_46

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-70086-1

  • Online ISBN: 978-3-319-70087-8

  • eBook Packages: Computer Science, Computer Science (R0)
