Understanding Short Texts through Semantic Enrichment and Hashing


Abstract:

Clustering short texts (such as news titles) by their meaning is a challenging task. The semantic hashing approach encodes the meaning of a text into a compact binary code. Thus, to tell whether two texts have similar meanings, we only need to check whether they have similar codes. The encoding is created by a deep neural network, which is trained on texts represented by word-count vectors (bag-of-words representation). Unfortunately, for short texts such as search queries, tweets, or news titles, such representations are insufficient to capture the underlying semantics. To cluster short texts by their meanings, we propose to add more semantic signals to short texts. Specifically, for each term in a short text, we obtain its concepts and co-occurring terms from a probabilistic knowledge base to enrich the short text. Furthermore, we introduce a simplified deep learning network consisting of 3-layer stacked auto-encoders for semantic hashing. Comprehensive experiments show that, with more semantic signals, our simplified deep learning model is able to capture the semantics of short texts, which enables a variety of applications including short text retrieval, classification, and general-purpose text processing.
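
The code-comparison step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the binary codes already exist and compares them by Hamming distance, a common choice for semantic hashing; the abstract does not state which similarity measure the paper uses. The 8-bit codes, the example titles, and the function names `hamming_distance` and `nearest_codes` are hypothetical and serve only as illustration; in the approach described, the codes would come from the trained network.

```python
from typing import Dict, List


def hamming_distance(code_a: int, code_b: int) -> int:
    """Count the bit positions in which two binary codes differ."""
    return bin(code_a ^ code_b).count("1")


def nearest_codes(query: int, codes: Dict[str, int], max_distance: int) -> List[str]:
    """Return labels whose codes lie within max_distance bits of the query, closest first."""
    hits = [(hamming_distance(query, code), label) for label, code in codes.items()]
    return [label for distance, label in sorted(hits) if distance <= max_distance]


# Hypothetical 8-bit codes (assumption): a trained encoder would produce these.
codes = {
    "apple unveils new phone": 0b10110010,
    "new iphone announced": 0b10110110,
    "stock markets fall sharply": 0b01001101,
}
query_code = 0b10110011  # hypothetical code for a query text

similar = nearest_codes(query_code, codes, max_distance=2)
```

Because the comparison is a bitwise XOR followed by a bit count, checking a query against many stored codes is cheap, which is the practical appeal of compact binary codes.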
Published in: IEEE Transactions on Knowledge and Data Engineering (Volume: 28, Issue: 2, 01 February 2016)
Page(s): 566 - 579
Date of Publication: 01 October 2015
