Large Scale Sentiment Analysis with Locality Sensitive BitHash

Zhang, Wenhao; Ji, Jianqiu; Zhu, Jun; Xu, Hua; Zhang, Bo

doi:10.1007/978-3-319-28940-3_3

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9460)

Included in the following conference series: AIRS

Abstract

As social media data grows rapidly, sentiment analysis plays an increasingly important role in classifying users' opinions, attitudes and feelings expressed in text. However, most studies have focused on the effectiveness of sentiment analysis while ignoring storage efficiency when processing large-scale, high-dimensional text data. In this paper, we combine machine learning based sentiment analysis with our proposed Locality Sensitive One-Bit Min-Hash (BitHash) method. BitHash compresses each data sample into a compact binary hash code while preserving the pairwise similarity of the original data. The binary code can replace the original data as a compressed yet informative representation for subsequent processing; for example, it can be naturally integrated with a classifier such as an SVM. Using the compact hash code significantly reduces the storage space. Experiments on a popular open benchmark dataset show that, as the hash code length increases, the classification accuracy of our proposed method approaches that of the state-of-the-art method while requiring significantly less storage space.
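The abstract does not spell out the BitHash construction, so the following is only a minimal sketch of generic one-bit min-hashing, not the authors' method. It assumes each document is a non-empty set of integer token ids; the function names, the linear hash family `(a * t + b) mod prime`, and the choice of prime are all hypothetical choices made for illustration. It shows how a document can be reduced to a short binary code, and how the fraction of matching bits between two codes gives a rough estimate of their Jaccard similarity.

```python
import random


def make_hash_params(num_bits, seed=0, prime=2_147_483_647):
    """Draw (a, b) pairs for num_bits hash functions h(t) = (a*t + b) mod prime.

    The hash family and the prime are illustrative assumptions.
    """
    rng = random.Random(seed)
    params = [(rng.randrange(1, prime), rng.randrange(0, prime))
              for _ in range(num_bits)]
    return params, prime


def one_bit_minhash(token_ids, params, prime):
    """Return a binary code: the lowest bit of each min-hash value.

    token_ids must be a non-empty collection of non-negative integers.
    """
    code = []
    for a, b in params:
        min_val = min((a * t + b) % prime for t in token_ids)
        code.append(min_val & 1)  # keep one bit per hash function
    return code


def estimate_jaccard(code_x, code_y):
    """Rough Jaccard estimate from two equal-length one-bit codes.

    For one-bit codes over a large vocabulary, the probability that a bit
    matches is roughly (1 + J) / 2, so J is roughly 2 * match_rate - 1.
    """
    matches = sum(bx == by for bx, by in zip(code_x, code_y))
    return 2.0 * matches / len(code_x) - 1.0


if __name__ == "__main__":
    params, prime = make_hash_params(num_bits=256, seed=1)
    doc_a = {3, 17, 42, 99, 1234}
    doc_b = {3, 17, 42, 99, 5678}
    code_a = one_bit_minhash(doc_a, params, prime)
    code_b = one_bit_minhash(doc_b, params, prime)
    print(len(code_a), estimate_jaccard(code_a, code_b))
```

Each document here occupies `num_bits` bits regardless of vocabulary size, which is the storage saving the abstract refers to. The resulting 0/1 vectors could then be passed as features to a linear classifier such as an SVM; longer codes give less noisy similarity estimates at the cost of more storage.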

Acknowledgments

The work was supported by the National Basic Research Program (973 Program) of China (No. 2013CB329403), National Natural Science Foundation of China (Nos. 61322308, 61332007), and the Tsinghua National Laboratory for Information Science and Technology Big Data Initiative.

Author information

Authors and Affiliations

State Key Lab of Intelligent Technology and Systems, Tsinghua National TNLIST Lab, Department of Computer Science and Technology, Tsinghua University, Beijing, 100084, China
Wenhao Zhang, Jianqiu Ji, Jun Zhu, Hua Xu & Bo Zhang


Corresponding author

Correspondence to Wenhao Zhang.

Editor information

Editors and Affiliations

Science and Engineering Faculty, Queensland University of Technology, Brisbane, Australia
Guido Zuccon
Brisbane, Queensland, Australia
Shlomo Geva
University of Tsukuba, Ibaraki, Japan
Hideo Joho
RMIT University, Melbourne, Australia
Falk Scholer
School of Computer Engineering, Nanyang Technological University, Singapore, Singapore
Aixin Sun
Tianjin University, China
Peng Zhang

About this paper

Cite this paper

Zhang, W., Ji, J., Zhu, J., Xu, H., Zhang, B. (2015). Large Scale Sentiment Analysis with Locality Sensitive BitHash. In: Zuccon, G., Geva, S., Joho, H., Scholer, F., Sun, A., Zhang, P. (eds) Information Retrieval Technology. AIRS 2015. Lecture Notes in Computer Science, vol 9460. Springer, Cham. https://doi.org/10.1007/978-3-319-28940-3_3

DOI: https://doi.org/10.1007/978-3-319-28940-3_3
Published: 22 January 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-28939-7
Online ISBN: 978-3-319-28940-3
eBook Packages: Computer Science, Computer Science (R0)
