Skip to main content

A Document Modeling Method Based on Deep Generative Model and Spectral Hashing

  • Conference paper
  • First Online:
  • 1645 Accesses

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 9983))

Abstract

One of the most critical challenges in document modeling is the efficiency of the extraction of the high level representations. In this paper, a document modeling method based on deep generative model and spectral hashing is proposed. Firstly, dense and low-dimensional features are well learned from a deep generative model with word-count vectors as its input. And then, these features are used for training a spectral hashing model to compress a novel document into compact binary code, and the Hamming distances between these codewords correlate with semantic similarity. Taken together, retrieving similar neighbors is then done simply by retrieving all items with codewords within a small Hamming distance of the codewords for the query, which can be exceedingly fast and shows superior performance compared with conventional methods as well as guarantees accessibility to the large-scale dataset.

This is a preview of subscription content, log in via an institution.

Buying options

Chapter
USD   29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD   39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD   54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

References

  1. Salton, G., Buckley, C.: Term-weighting approaches in automatic text retrieval. Inf. Process. Manag. 24(5), 513–523 (1988)

    Article  Google Scholar 

  2. Deerwester, S.C., Dumais, S.T., Landauer, T.K., Furnas, G.W., Harshman, R.A.: Indexing by latent semantic analysis. J. Am. Soc. Inf. Sci. 41(6), 391–407 (1990)

    Article  Google Scholar 

  3. Hofmann, T.: Probabilistic latent semantic indexing. In: Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 50–57. ACM, New York (1999)

    Google Scholar 

  4. David, M.B., Andrew, Y.N., Michael, I.J.: Latent Dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003)

    MATH  Google Scholar 

  5. Hinton, G.E.: Training products of experts by minimizing contrastive divergence. Neural Comput. 14(8), 1711–1800 (2002)

    Article  MathSciNet  MATH  Google Scholar 

  6. Hinton, G.E., Osindero, S.: A fast learning algorithm for deep belief nets. Neural Comput. 18(7), 1527–1554 (2006)

    Article  MathSciNet  MATH  Google Scholar 

  7. Xu, J., Li, H., Zhou, S.: An overview of deep generative models. IETE Techn. Rev. 32(2), 131–139 (2015)

    Article  Google Scholar 

  8. Li, J., Luong, M.T., Dan, J.: A hierarchical neural autoencoder for paragraphs and documents. In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, pp. 1106–1115. Association for Computational Linguistics, Stroudsburg (2015)

    Google Scholar 

  9. Le, Q.V., Tomas, M.: Distributed representations of sentences and documents. In: Proceedings of the 31st International Conference on Machine Learning, pp. 1188–1196 (2014)

    Google Scholar 

  10. Salakhutdinov, R.R., Hinton, G.E.: Semantic hashing. Int. J. Approximate Reasoning 50(7), 969–978 (2009)

    Article  Google Scholar 

  11. Weiss, Y., Torralba, A., Fergus, R.: Spectral hashing. In: Advances in Neural Information Processing Systems, vol. 21, pp. 1753–1760 (2009)

    Google Scholar 

  12. Yu, G., Sapiro, G., Mallat, S.: Solving inverse problems with piecewise linear estimators: from Gaussian mixture models to structured sparsity. IEEE Trans. Image Process. 21(5), 2481–2499 (2012)

    Article  MathSciNet  Google Scholar 

  13. Shi, J., Malik, J.: Normalized cuts and image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 22, 888–905 (1997)

    Google Scholar 

  14. Kannan, R., Vempala, S., Vetta, A.: On clusterings-good, bad and spectral. J. ACM 51(3), 497–515 (2004)

    Article  MathSciNet  MATH  Google Scholar 

  15. Andrew, Y.N., Michael, I.J., Yair, W.: On spectral clustering: analysis and an algorithm. In: Advances in Neural Information Processing Systems, vol. 14, pp. 849–856 (2002)

    Google Scholar 

  16. Xu, J., Li, H., Zhou, S.: Improving mixing rate with tempered transition for learning restricted Boltzmann machines. Neurocomputing 139, 328–335 (2014)

    Article  Google Scholar 

  17. Bekkerman, R., Yaniv, R.E., Tishby, N., Winter, Y.: On feature distributional clustering for text categorization. In: Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 146–153. ACM, New York (2001)

    Google Scholar 

  18. Li, B., Vogel, C.: Improving multiclass text classification with error-correcting output coding and sub-class partitions. Adv. Artif. Intell. 6085, 4–15 (2010)

    Google Scholar 

  19. Nigam, K., McCallum, A.K., Thrun, S., Mitchell, T.: Text classification from labeled and unlabeled documents using EM. Mach. Learn. 39(2–3), 103–134 (2000)

    Article  MATH  Google Scholar 

Download references

Acknowledgments

This work is supported in part by the Beijing Natural Science Foundation under Grant No. 4162067/4142050 and the National Science Foundation of China under Grant No. 61472391/61372171.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Ben He .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Chen, H., Xu, J., Wang, Q., He, B. (2016). A Document Modeling Method Based on Deep Generative Model and Spectral Hashing. In: Lehner, F., Fteimi, N. (eds) Knowledge Science, Engineering and Management. KSEM 2016. Lecture Notes in Computer Science(), vol 9983. Springer, Cham. https://doi.org/10.1007/978-3-319-47650-6_32

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-47650-6_32

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-47649-0

  • Online ISBN: 978-3-319-47650-6

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics