Skip to main content
Log in

A novel classifier for multivariate instance using graph class signatures

  • Research Article
  • Published:
Frontiers of Computer Science Aims and scope Submit manuscript

Abstract

Applications like identifying different customers from their unique buying behaviours, determining ratings of a product given by users based on different sets of features, etc. require classification using class-specific subsets of features. Most of the existing state-of-the-art classifiers for multivariate data use complete feature set for classification regardless of the different class labels. Decision tree classifier can produce class-wise subsets of features. However, none of these classifiers model the relationship between features which may enhance classification accuracy. We call the class-specific subsets of features and the features’ interrelationships as class signatures. In this work, we propose to map the original input space of multivariate data to the feature space characterized by connected graphs as graphs can easily model entities, their attributes, and relationships among attributes. Mostly, entities are modeled using graphs, where graphs occur naturally, for example, chemical compounds. However, graphs do not occur naturally in multivariate data. Thus, extracting class signatures from multivariate data is a challenging task. We propose some feature selection heuristics to obtain class-specific prominent subgraph signatures. We also propose two variants of class signatures based classifier namely: 1) maximum matching signature (gMM), and 2) score and size of matched signatures (gSM). The effectiveness of the proposed approach on real-world and synthetic datasets has been studied and compared with other established classifiers. Experimental results confirm the ascendancy of the proposed class signatures based classifier on most of the datasets.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Similar content being viewed by others

References

  1. Theodoridis S, Koutroumbas K. Pattern Recognition. 4th ed. USA: Academic Press, Inc., 2008

    MATH  Google Scholar 

  2. Esmeir S, Markovitch S. Lookahead-based algorithms for anytime induction of decision trees. In: Proceedings of the 21st ACM International Conference on Machine Learning. 2004, 33

  3. Tan P J, Dowe D L. MML inference of decision graphs with multi-way joins and dynamic attributes. In: Proceedings of Australasian Joint Conference on Artificial Intelligence. 2003, 269–281

  4. Zhu Y, Yu J X, Cheng H, Qin L. Graph classification: a diversified discriminative feature selection approach. In: Proceedings of the 21st ACM International Conference on Information and Knowledge Management. 2012, 205–214

  5. Ranu S, Hoang M, Singh A K. Mining discriminative subgraphs from global-state networks. In: Proceedings of the 19th ACM International Conference on Knowledge Discovery and Data Mining. 2013, 509–517

  6. Dang X H, Singh A K, Bogdanov P, You H, Hsu B. Discriminative subnetworks with regularized spectral learning for global-state network data. In: Proceedings of Joint European Conference on Machine Learning and Knowledge Discovery in Databases. 2014, 290–306

  7. Dang X H, You H, Bogdanov P, Singh A K. Learning predictive substructures with regularization for network data. In: Proceedings of IEEE International Conference on Data Mining. 2015, 81–90

  8. Vert J P. Kernel methods in computational biology. Bussei Kenkyu, 2004, 81(97): 35–90

    Google Scholar 

  9. Mahé P, Ueda N, Akutsu T, Perret J L. Extensions of marginalized graph kernels. In: Proceedings of the 21st ACM International Conference on Machine Learning. 2004, 70

  10. Seeland M, Karwath A, Kramer S. A structural cluster kernel for learning on graphs. In: Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2012, 516–524

  11. Riesen K, Bunke H. Graph classification by means of Lipschitz embedding. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 2009, 39(6): 1472–1483

    Article  Google Scholar 

  12. Bandyopadhyay D, Huan J, Liu J, Prins J, Snoeyink J, Wang W, Tropsha A. Structure-based function inference using protein family-specific fingerprints. Protein Science, 2006, 15(6): 1537–1543

    Article  Google Scholar 

  13. Cheng H, Yan X, Han J, Hsu C W. Discriminative frequent pattern analysis for effective classification. In: Proceedings of the 23rd IEEE International Conference on Data Engineering. 2007, 716–725

  14. Deshpande M, Kuramochi M, Wale N, Karypis G. Frequent substructure-based approaches for classifying chemical compounds. IEEE Transactions on Knowledge and Data Engineering, 2005, 17(8): 1036–1050

    Article  Google Scholar 

  15. Yan X, Han J. gSpan: graph-based substructure pattern mining. In: Proceedings of the IEEE International Conference on Data Mining. 2002, 721–724

  16. Inokuchi A, Washio T, Motoda H. An apriori-based algorithm for mining frequent substructures from graph data. In: Proceedings of European Conference on Principles of Data Mining and Knowledge Discovery. 2000, 13–23

  17. Kuramochi M, Karypis G. Frequent subgraph discovery. In: Proceedings of the IEEE International Conference on Data Mining. 2001, 313–320

  18. Huan J, Wang W, Prins J. Efficient mining of frequent subgraphs in the presence of isomorphism. In: Proceedings of the 3rd IEEE International Conference on Data Mining. 2003, 549–552

  19. Nijssen S, Kok J N. A quickstart in frequent structure mining can make adifference. In: Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2004, 647–652

  20. Zhou K. gBolt: very fast implementation for gSpan algorithm in data mining. GitHub, 2017

  21. Yan X, Cheng H, Han J, Yu P S. Mining significant graph patterns by leap search. In: Proceedings of the ACM SIGMOD International Conference on Management of Data. 2008, 433–444

  22. Saigo H, Krämer N, Tsuda K. Partial least squares regression for graph mining. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2008, 578–586

  23. Thoma M, Cheng H, Gretton A, Han J, Kriegel H P, Smola A, Song L, Yu P S, Yan X, Borgwardt K. Near-optimal supervised feature selection among frequent subgraphs. In: Proceedings of the SIAM International Conference on Data Mining. 2009, 1076–1087

  24. Ranu S, Singh A K. Graphsig: a scalable approach to mining significant subgraphs in large graph databases. In: Proceedings of the 25th IEEE International Conference on Data Engineering. 2009, 844–855

  25. Jin N, Young C, Wang W. Graph classification based on pattern cooccurrence. In: Proceedings of the 18th ACM Conference on Information and Knowledge Management. 2009, 573–582

  26. Jin N, Young C, Wang W. GAIA: graph classification using evolutionary computation. In: Proceedings of the ACM SIGMOD International Conference on Management of Data. 2010, 879–890

  27. Kong X, Yu P S. Semi-supervised feature selection for graph classification. In: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2010, 793–802

  28. Kong X, Philip S Y. gMLC: a multi-label feature selection framework for graph classification. Knowledge and Information Systems, 2012, 31(2): 281–305

    Article  Google Scholar 

  29. Pan S, Zhu X, Zhang C, Philip S Y. Graph stream classification using labeled and unlabeled graphs. In: Proceedings of the 29th IEEE International Conference on Data Engineering. 2013, 398–409

  30. Kong X, Yu P S, Wang X, Ragin A B. Discriminative feature selection for uncertain graph classification. In: Proceedings of the SIAM International Conference on Data Mining. 2013, 82–93

  31. Wu J, Zhu X, Zhang C, Philip S Y. Bag constrained structure pattern mining for multi-graph classification. IEEE Transactions on Knowledge and Data Engineering, 2014, 26(10): 2382–2396

    Article  Google Scholar 

  32. Pan S, Wu J, Zhu X, Zhang C. Joint structure feature exploration and regularization for multi-task graph classification. IEEE Transactions on Knowledge and Data Engineering, 2016, 28(3): 715–728

    Article  Google Scholar 

  33. Pan S, Wu J, Zhu X, Long G, Zhang C. Task sensitive feature exploration and learning for multitask graph classification. IEEE Transactions on Cybernetics, 2017, 47(3): 744–758

    Article  Google Scholar 

  34. Pan S, Wu J, Zhu X, Long G, Zhang C. Finding the best not the most: regularized loss minimization subgraph selection for graph classification. Pattern Recognition, 2015, 48(11): 3783–3796

    Article  Google Scholar 

  35. Saigo H, Nowozin S, Kadowaki T, Kudo T, Tsuda K. gBoost: a mathematical programming approach to graph classification and regression. Machine Learning, 2009, 75(1): 69–89

    Article  Google Scholar 

  36. Pan S, Zhu X. Graph classification with imbalanced class distributions and noise. In: Proceedings of the 23rd International Joint Conference on Artificial Intelligence. 2013, 1586–1592

  37. Kudo T, Maeda E, Matsumoto Y. An application of boosting to graph classification. In: Proceedings of the 17th International Conference on Neural Information Processing Systems. 2004, 729–736

  38. Pan S, Wu J, Zhu X, Zhang C. Graph ensemble boosting for imbalanced noisy graph stream classification. IEEE Transactions on Cybernetics, 2015, 45(5): 954–968

    Article  Google Scholar 

  39. Pan S, Wu J, Zhu X. CogBoost: boosting for fast cost-sensitive graph classification. IEEE Transactions on Knowledge and Data Engineering, 2015, 27(11): 2933–2946

    Article  Google Scholar 

  40. Cook D J, Holder L B. Mining Graph Data. John Wiley & Sons, 2006

  41. Witten I H, Frank E, Hall M A, Pal C J. Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann, 2016

  42. Thrun S B, Bala J W, Bloedorn E, Bratko I, Cestnik B, Cheng J, De Jong K A, Dzeroski S, Fisher D H, Fahlman S E, Hamann R. The monk’s problems: a performance comparison of different learning algorithms. Technical Report of Carnegie Mellon University, 1991

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Parnika Paranjape.

Additional information

Parnika Paranjape received the MTech degree in computer science from Visvesvaraya National Institute of Technology, India, where she is currently pursuing the PhD degree. Her research interests include data mining, graph mining, graph classification, and graph similarity search.

Meera Dhabu received the PhD degree from the Visvesvaraya National Institute of Technology, India. She is currently an assistant professor with the Department of Computer Science and Engineering, Visvesvaraya National Institute of Technology. She has 16 years of academic experience. She has coauthored a number of research articles in various journals, conferences, and book chapters. Her research interests include data mining and machine learning. She is a member of the ACM.

Parag Deshpande received the PhD degree from Nagpur University, India, and the MTech degree from IIT Bombay, India. He is currently a professor with the Department of Computer Science and Engineering, Visvesvaraya National Institute of Technology, India. He has 30 years of academic experience. He has co-authored a number of research articles in various journals, conferences, and book chapters. He is the author of several books, including C and Data Structure, Data Warehousing Using Oracle, and Sql/plsql for Oracle 11g (Wiley). His research interests include databases, data mining, and pattern recognition. He is a member of ISTE and SAE-India. He was a recipient of prestigious awards and honors for his excellence in academics and research.

Electronic supplementary material

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Paranjape, P., Dhabu, M. & Deshpande, P. A novel classifier for multivariate instance using graph class signatures. Front. Comput. Sci. 14, 144307 (2020). https://doi.org/10.1007/s11704-019-8263-5

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1007/s11704-019-8263-5

Keywords

Navigation