Abstract
With the rapid proliferation of social media and online communities, a vast amount of text data has been generated. Discovering the value hidden in this text data has grown in importance, and a variety of text mining and processing algorithms, such as classification, clustering, and similarity comparison, have been created in recent years. Most previous research uses a vector-space model for text representation and analysis. However, the vector-space model does not utilise information about term-to-term relationships. Moreover, classic classification methods also ignore the relationships between documents. In other words, traditional text mining techniques assume that terms, and likewise documents, are independent and identically distributed (IID). In this paper, we introduce a novel term representation that incorporates coupled term-to-term relations. This coupled representation carries much richer information, enabling us to define a coupled similarity metric for measuring document similarity; a K-nearest-centroid classifier based on this coupled document similarity is then applied to the classification task. Experiments verify that the proposed approach outperforms the classic vector-space-based classifier, and show its potential advantages and richness for other text mining tasks.
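The classification pipeline the abstract describes can be sketched at a high level: build a term vector per document, average each class's vectors into a centroid, and assign a new document to the class with the most similar centroid. The sketch below is a minimal illustration, not the paper's method — it uses plain bag-of-words vectors and cosine similarity where the paper defines a coupled term representation and coupled similarity metric; the `sim` parameter marks the hook where such a metric would plug in. All names here are illustrative assumptions.

```python
import math
from collections import Counter, defaultdict

def term_vector(doc):
    """Bag-of-words term-frequency vector for a whitespace-tokenised document."""
    return Counter(doc.lower().split())

def cosine(u, v):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def class_centroids(docs, labels):
    """Average the term vectors of each class into one centroid per class."""
    sums, counts = defaultdict(Counter), Counter()
    for doc, label in zip(docs, labels):
        sums[label].update(term_vector(doc))
        counts[label] += 1
    return {label: Counter({t: c / counts[label] for t, c in vec.items()})
            for label, vec in sums.items()}

def nearest_centroid(doc, centroids, sim=cosine):
    """Assign the class whose centroid is most similar under `sim`."""
    v = term_vector(doc)
    return max(centroids, key=lambda label: sim(v, centroids[label]))
```

Replacing `cosine` with a similarity that draws on term-to-term couplings is exactly the substitution the paper argues for: two documents sharing no literal terms can still score as similar when their terms are coupled.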
© 2014 Springer-Verlag Berlin Heidelberg
Cite this paper
Li, M. et al. (2014). Learning Heterogeneous Coupling Relationships Between Non-IID Terms. In: Cao, L., Zeng, Y., Symeonidis, A., Gorodetsky, V., Müller, J., Yu, P. (eds) Agents and Data Mining Interaction. ADMI 2013. Lecture Notes in Computer Science(), vol 8316. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-55192-5_7
DOI: https://doi.org/10.1007/978-3-642-55192-5_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-55191-8
Online ISBN: 978-3-642-55192-5
eBook Packages: Computer Science; Computer Science (R0)