Abstract
Applications like identifying different customers from their unique buying behaviours, determining ratings of a product given by users based on different sets of features, etc. require classification using class-specific subsets of features. Most of the existing state-of-the-art classifiers for multivariate data use complete feature set for classification regardless of the different class labels. Decision tree classifier can produce class-wise subsets of features. However, none of these classifiers model the relationship between features which may enhance classification accuracy. We call the class-specific subsets of features and the features’ interrelationships as class signatures. In this work, we propose to map the original input space of multivariate data to the feature space characterized by connected graphs as graphs can easily model entities, their attributes, and relationships among attributes. Mostly, entities are modeled using graphs, where graphs occur naturally, for example, chemical compounds. However, graphs do not occur naturally in multivariate data. Thus, extracting class signatures from multivariate data is a challenging task. We propose some feature selection heuristics to obtain class-specific prominent subgraph signatures. We also propose two variants of class signatures based classifier namely: 1) maximum matching signature (gMM), and 2) score and size of matched signatures (gSM). The effectiveness of the proposed approach on real-world and synthetic datasets has been studied and compared with other established classifiers. Experimental results confirm the ascendancy of the proposed class signatures based classifier on most of the datasets.
Similar content being viewed by others
References
Theodoridis S, Koutroumbas K. Pattern Recognition. 4th ed. USA: Academic Press, Inc., 2008
Esmeir S, Markovitch S. Lookahead-based algorithms for anytime induction of decision trees. In: Proceedings of the 21st ACM International Conference on Machine Learning. 2004, 33
Tan P J, Dowe D L. MML inference of decision graphs with multi-way joins and dynamic attributes. In: Proceedings of Australasian Joint Conference on Artificial Intelligence. 2003, 269–281
Zhu Y, Yu J X, Cheng H, Qin L. Graph classification: a diversified discriminative feature selection approach. In: Proceedings of the 21st ACM International Conference on Information and Knowledge Management. 2012, 205–214
Ranu S, Hoang M, Singh A K. Mining discriminative subgraphs from global-state networks. In: Proceedings of the 19th ACM International Conference on Knowledge Discovery and Data Mining. 2013, 509–517
Dang X H, Singh A K, Bogdanov P, You H, Hsu B. Discriminative subnetworks with regularized spectral learning for global-state network data. In: Proceedings of Joint European Conference on Machine Learning and Knowledge Discovery in Databases. 2014, 290–306
Dang X H, You H, Bogdanov P, Singh A K. Learning predictive substructures with regularization for network data. In: Proceedings of IEEE International Conference on Data Mining. 2015, 81–90
Vert J P. Kernel methods in computational biology. Bussei Kenkyu, 2004, 81(97): 35–90
Mahé P, Ueda N, Akutsu T, Perret J L. Extensions of marginalized graph kernels. In: Proceedings of the 21st ACM International Conference on Machine Learning. 2004, 70
Seeland M, Karwath A, Kramer S. A structural cluster kernel for learning on graphs. In: Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2012, 516–524
Riesen K, Bunke H. Graph classification by means of Lipschitz embedding. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 2009, 39(6): 1472–1483
Bandyopadhyay D, Huan J, Liu J, Prins J, Snoeyink J, Wang W, Tropsha A. Structure-based function inference using protein family-specific fingerprints. Protein Science, 2006, 15(6): 1537–1543
Cheng H, Yan X, Han J, Hsu C W. Discriminative frequent pattern analysis for effective classification. In: Proceedings of the 23rd IEEE International Conference on Data Engineering. 2007, 716–725
Deshpande M, Kuramochi M, Wale N, Karypis G. Frequent substructure-based approaches for classifying chemical compounds. IEEE Transactions on Knowledge and Data Engineering, 2005, 17(8): 1036–1050
Yan X, Han J. gSpan: graph-based substructure pattern mining. In: Proceedings of the IEEE International Conference on Data Mining. 2002, 721–724
Inokuchi A, Washio T, Motoda H. An apriori-based algorithm for mining frequent substructures from graph data. In: Proceedings of European Conference on Principles of Data Mining and Knowledge Discovery. 2000, 13–23
Kuramochi M, Karypis G. Frequent subgraph discovery. In: Proceedings of the IEEE International Conference on Data Mining. 2001, 313–320
Huan J, Wang W, Prins J. Efficient mining of frequent subgraphs in the presence of isomorphism. In: Proceedings of the 3rd IEEE International Conference on Data Mining. 2003, 549–552
Nijssen S, Kok J N. A quickstart in frequent structure mining can make adifference. In: Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2004, 647–652
Zhou K. gBolt: very fast implementation for gSpan algorithm in data mining. GitHub, 2017
Yan X, Cheng H, Han J, Yu P S. Mining significant graph patterns by leap search. In: Proceedings of the ACM SIGMOD International Conference on Management of Data. 2008, 433–444
Saigo H, Krämer N, Tsuda K. Partial least squares regression for graph mining. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2008, 578–586
Thoma M, Cheng H, Gretton A, Han J, Kriegel H P, Smola A, Song L, Yu P S, Yan X, Borgwardt K. Near-optimal supervised feature selection among frequent subgraphs. In: Proceedings of the SIAM International Conference on Data Mining. 2009, 1076–1087
Ranu S, Singh A K. Graphsig: a scalable approach to mining significant subgraphs in large graph databases. In: Proceedings of the 25th IEEE International Conference on Data Engineering. 2009, 844–855
Jin N, Young C, Wang W. Graph classification based on pattern cooccurrence. In: Proceedings of the 18th ACM Conference on Information and Knowledge Management. 2009, 573–582
Jin N, Young C, Wang W. GAIA: graph classification using evolutionary computation. In: Proceedings of the ACM SIGMOD International Conference on Management of Data. 2010, 879–890
Kong X, Yu P S. Semi-supervised feature selection for graph classification. In: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2010, 793–802
Kong X, Philip S Y. gMLC: a multi-label feature selection framework for graph classification. Knowledge and Information Systems, 2012, 31(2): 281–305
Pan S, Zhu X, Zhang C, Philip S Y. Graph stream classification using labeled and unlabeled graphs. In: Proceedings of the 29th IEEE International Conference on Data Engineering. 2013, 398–409
Kong X, Yu P S, Wang X, Ragin A B. Discriminative feature selection for uncertain graph classification. In: Proceedings of the SIAM International Conference on Data Mining. 2013, 82–93
Wu J, Zhu X, Zhang C, Philip S Y. Bag constrained structure pattern mining for multi-graph classification. IEEE Transactions on Knowledge and Data Engineering, 2014, 26(10): 2382–2396
Pan S, Wu J, Zhu X, Zhang C. Joint structure feature exploration and regularization for multi-task graph classification. IEEE Transactions on Knowledge and Data Engineering, 2016, 28(3): 715–728
Pan S, Wu J, Zhu X, Long G, Zhang C. Task sensitive feature exploration and learning for multitask graph classification. IEEE Transactions on Cybernetics, 2017, 47(3): 744–758
Pan S, Wu J, Zhu X, Long G, Zhang C. Finding the best not the most: regularized loss minimization subgraph selection for graph classification. Pattern Recognition, 2015, 48(11): 3783–3796
Saigo H, Nowozin S, Kadowaki T, Kudo T, Tsuda K. gBoost: a mathematical programming approach to graph classification and regression. Machine Learning, 2009, 75(1): 69–89
Pan S, Zhu X. Graph classification with imbalanced class distributions and noise. In: Proceedings of the 23rd International Joint Conference on Artificial Intelligence. 2013, 1586–1592
Kudo T, Maeda E, Matsumoto Y. An application of boosting to graph classification. In: Proceedings of the 17th International Conference on Neural Information Processing Systems. 2004, 729–736
Pan S, Wu J, Zhu X, Zhang C. Graph ensemble boosting for imbalanced noisy graph stream classification. IEEE Transactions on Cybernetics, 2015, 45(5): 954–968
Pan S, Wu J, Zhu X. CogBoost: boosting for fast cost-sensitive graph classification. IEEE Transactions on Knowledge and Data Engineering, 2015, 27(11): 2933–2946
Cook D J, Holder L B. Mining Graph Data. John Wiley & Sons, 2006
Witten I H, Frank E, Hall M A, Pal C J. Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann, 2016
Thrun S B, Bala J W, Bloedorn E, Bratko I, Cestnik B, Cheng J, De Jong K A, Dzeroski S, Fisher D H, Fahlman S E, Hamann R. The monk’s problems: a performance comparison of different learning algorithms. Technical Report of Carnegie Mellon University, 1991
Author information
Authors and Affiliations
Corresponding author
Additional information
Parnika Paranjape received the MTech degree in computer science from Visvesvaraya National Institute of Technology, India, where she is currently pursuing the PhD degree. Her research interests include data mining, graph mining, graph classification, and graph similarity search.
Meera Dhabu received the PhD degree from the Visvesvaraya National Institute of Technology, India. She is currently an assistant professor with the Department of Computer Science and Engineering, Visvesvaraya National Institute of Technology. She has 16 years of academic experience. She has coauthored a number of research articles in various journals, conferences, and book chapters. Her research interests include data mining and machine learning. She is a member of the ACM.
Parag Deshpande received the PhD degree from Nagpur University, India, and the MTech degree from IIT Bombay, India. He is currently a professor with the Department of Computer Science and Engineering, Visvesvaraya National Institute of Technology, India. He has 30 years of academic experience. He has co-authored a number of research articles in various journals, conferences, and book chapters. He is the author of several books, including C and Data Structure, Data Warehousing Using Oracle, and Sql/plsql for Oracle 11g (Wiley). His research interests include databases, data mining, and pattern recognition. He is a member of ISTE and SAE-India. He was a recipient of prestigious awards and honors for his excellence in academics and research.
Electronic supplementary material
Rights and permissions
About this article
Cite this article
Paranjape, P., Dhabu, M. & Deshpande, P. A novel classifier for multivariate instance using graph class signatures. Front. Comput. Sci. 14, 144307 (2020). https://doi.org/10.1007/s11704-019-8263-5
Received:
Accepted:
Published:
DOI: https://doi.org/10.1007/s11704-019-8263-5