HINMINE: heterogeneous information network mining with information retrieval heuristics

Kralj, Jan; Robnik-Šikonja, Marko; Lavrač, Nada

doi:10.1007/s10844-017-0444-9

Published: 28 January 2017

Volume 50, pages 29–61, (2018)

Journal of Intelligent Information Systems


Abstract

The paper presents an approach to mining heterogeneous information networks by decomposing them into homogeneous networks. The proposed HINMINE methodology is based on previous work that classifies nodes in a heterogeneous network in two steps. In the first step the heterogeneous network is decomposed into one or more homogeneous networks using different connecting nodes. We improve this step by using new methods inspired by weighting of bag-of-words vectors mostly used in information retrieval. The methods assign larger weights to nodes which are more informative and characteristic for a specific class of nodes. In the second step, the resulting homogeneous networks are used to classify data either by network propositionalization or label propagation. We propose an adaptation of the label propagation algorithm to handle imbalanced data and test several classification algorithms in propositionalization. The new methodology is tested on three data sets with different properties. For each data set, we perform a series of experiments and compare different heuristics used in the first step of the methodology. We also use different classifiers which can be used in the second step of the methodology when performing network propositionalization. Our results show that HINMINE, using different network decomposition methods, can significantly improve the performance of the resulting classifiers, and also that using a modified label propagation algorithm is beneficial when the data set is imbalanced.
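The two steps described in the abstract can be illustrated with a minimal sketch. The abstract does not give the formulas, so everything below is an assumption made for illustration: the incidence matrix `B` (base nodes by connecting nodes), the IDF-style weight `log(N / degree)`, the closed-form propagation `F = (I - alpha * S)^-1 Y`, and the `balance` option that divides each class's seed labels by the class size. The function names are hypothetical and are not taken from the authors' code.

```python
import numpy as np

def idf_weights(B):
    """IDF-style weights (assumed form): connecting nodes linked to few base nodes count more."""
    B = np.asarray(B, dtype=float)
    degree = B.sum(axis=0)                      # base nodes attached to each connecting node
    return np.log(B.shape[0] / np.maximum(degree, 1.0))

def decompose(B, weights=None):
    """Project a base-node x connecting-node incidence matrix onto the base nodes."""
    B = np.asarray(B, dtype=float)
    if weights is None:
        weights = np.ones(B.shape[1])           # plain count of shared connecting nodes
    A = (B * weights) @ B.T                     # A[u, v] = sum_m w(m) * B[u, m] * B[v, m]
    np.fill_diagonal(A, 0.0)                    # no self-loops
    return A

def label_propagation(A, Y, alpha=0.6, balance=False):
    """Closed-form propagation F = (I - alpha * S)^-1 Y with S = D^-1/2 A D^-1/2."""
    Y = np.asarray(Y, dtype=float)
    if balance:
        # Assumed imbalance correction: every class starts with the same total mass.
        Y = Y / np.maximum(Y.sum(axis=0), 1.0)
    d = A.sum(axis=1)
    inv_sqrt = np.where(d > 0, 1.0 / np.sqrt(np.maximum(d, 1e-12)), 0.0)
    S = A * inv_sqrt[:, None] * inv_sqrt[None, :]
    F = np.linalg.solve(np.eye(len(A)) - alpha * S, Y)
    return F.argmax(axis=1)

# Toy network: 4 papers (base nodes) and 3 authors (connecting nodes).
# Author 2 is linked to every paper, so it carries no discriminative information.
B = np.array([[1, 0, 1],
              [1, 0, 1],
              [0, 1, 1],
              [0, 1, 1]])

# One-hot seed labels: paper 0 has class 0, paper 3 has class 1, the rest are unlabeled.
Y = np.array([[1, 0],
              [0, 0],
              [0, 0],
              [0, 1]])

A_plain = decompose(B)                          # every shared author counts equally
A_idf = decompose(B, idf_weights(B))            # the ubiquitous author is down-weighted to 0
predictions = label_propagation(A_idf, Y, balance=True)
print(A_plain)
print(A_idf)
print(predictions)
```

In this toy example the unweighted projection links every pair of papers through the ubiquitous author, whereas the weighted projection keeps only the links through the two more specific authors. This is the kind of effect the weighting heuristics are meant to produce, although the heuristics evaluated in the paper may differ from the one sketched here.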

Acknowledgments

This research was supported by the European Commission through the Human Brain Project (Grant number 604102) and three National Research Agency grants: the research programmes Knowledge Technologies (P2-0103) and Artificial intelligence and intelligent systems (P2-0209), and the project Development and applications of new semantic data mining methods in life sciences (J2-5478). Our thanks go to Miha Grčar for previous work on this topic, which inspired the research described in this paper.

Author information

Authors and Affiliations

Jožef Stefan Institute, Jamova 39, 1000, Ljubljana, Slovenia
Jan Kralj & Nada Lavrač
Jožef Stefan International Postgraduate School, Jamova 39, 1000, Ljubljana, Slovenia
Jan Kralj & Nada Lavrač
Faculty of Computer and Information Science, University of Ljubljana, Večna pot 113, 1000, Ljubljana, Slovenia
Marko Robnik-Šikonja

Corresponding author

Correspondence to Jan Kralj.

About this article

Cite this article

Kralj, J., Robnik-Šikonja, M. & Lavrač, N. HINMINE: heterogeneous information network mining with information retrieval heuristics. J Intell Inf Syst 50, 29–61 (2018). https://doi.org/10.1007/s10844-017-0444-9

Received: 01 July 2016
Revised: 08 November 2016
Accepted: 13 January 2017
Published: 28 January 2017
Issue Date: February 2018
DOI: https://doi.org/10.1007/s10844-017-0444-9
