
Learning Heterogeneous Coupling Relationships Between Non-IID Terms

  • Conference paper

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 8316))

Abstract

With the rapid proliferation of social media and online communities, a vast amount of text data has been generated. As discovering insight in this text data has grown in importance, a variety of text mining and processing algorithms have been developed in recent years, including classification, clustering, and similarity comparison. Most previous research uses a vector-space model for text representation and analysis. However, the vector-space model does not capture the term-to-term relationships, and classic classification methods likewise ignore the relationships between documents. In other words, traditional text mining techniques assume that the relations between terms, and between documents, are independent and identically distributed (IID). In this paper, we introduce a novel term representation that incorporates coupled term-to-term relations. This coupled representation provides much richer information, enabling us to build a coupled similarity metric for measuring document similarity; a k-nearest-centroid classifier based on this coupled document similarity is then applied to the classification task. Experiments verify that the proposed approach outperforms a classic vector-space-based classifier, and show its potential advantages and richness for exploring other text mining tasks.
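The pipeline the abstract describes can be sketched in a minimal form. This is an illustration only, not the paper's actual formulation: it hypothetically estimates term coupling from co-occurrence (cosine similarity between term occurrence profiles), plugs the resulting matrix into a bilinear document similarity, and uses that similarity in a nearest-centroid classifier. All function names and the toy data are invented for this sketch.

```python
import numpy as np

def term_coupling(X):
    """Estimate a term-term coupling matrix from a document-term count
    matrix X: cosine similarity between term (column) occurrence profiles."""
    norms = np.linalg.norm(X, axis=0, keepdims=True)
    norms[norms == 0] = 1.0          # guard against terms that never occur
    Xn = X / norms
    return Xn.T @ Xn                 # C[i, j]: coupling of term i with term j

def coupled_similarity(a, b, C):
    """Normalised bilinear document similarity a^T C b; weight flows
    between coupled terms rather than only between identical terms."""
    num = a @ C @ b
    denom = np.sqrt(a @ C @ a) * np.sqrt(b @ C @ b)
    return num / denom if denom else 0.0

def nearest_centroid_predict(X_train, y_train, X_test, C):
    """Assign each test document to the class whose centroid is most
    similar under the coupled similarity."""
    classes = np.unique(y_train)
    centroids = np.array([X_train[y_train == c].mean(axis=0) for c in classes])
    return np.array([classes[int(np.argmax([coupled_similarity(x, m, C)
                                            for m in centroids]))]
                     for x in X_test])

# Toy corpus: 4 documents over 3 terms, two classes.
X = np.array([[2., 1., 0.],
              [1., 2., 0.],
              [0., 1., 2.],
              [0., 0., 3.]])
y = np.array([0, 0, 1, 1])
C = term_coupling(X)
print(nearest_centroid_predict(X, y, X, C))  # → [0 0 1 1]
```

Note that under a plain (IID) cosine similarity the coupling matrix C would be the identity; the off-diagonal entries of C are exactly what lets similarity propagate between different but related terms.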



Author information


Correspondence to Mu Li.



Copyright information

© 2014 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Li, M. et al. (2014). Learning Heterogeneous Coupling Relationships Between Non-IID Terms. In: Cao, L., Zeng, Y., Symeonidis, A., Gorodetsky, V., Müller, J., Yu, P. (eds) Agents and Data Mining Interaction. ADMI 2013. Lecture Notes in Computer Science, vol 8316. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-55192-5_7


  • DOI: https://doi.org/10.1007/978-3-642-55192-5_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-55191-8

  • Online ISBN: 978-3-642-55192-5

