Abstract
We present an automated multi-label scheme for linking PubMed citations to Gene Ontology (GO) terms, giving users easy access to publications relevant to biomedical ontological terms, GO terms in particular. We propose a maximum-margin approach derived from the ranking support vector machine (Rank-SVM), called SCRank-SVM. In this scheme, we remove the bias term b and recast the decision boundary and the separating margin to improve the margin of Rank-SVM. Because its optimization constraints are weaker, SCRank-SVM generalizes better and has lower computational complexity. Experiments on our lung cancer data set and six diverse multi-label data sets show that SCRank-SVM is well suited to this problem and outperforms the original Rank-SVM and several other well-established multi-label learning algorithms.
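For orientation, the widely used primal form of the original Rank-SVM (as introduced by Elisseeff and Weston) is sketched below. The notation is an assumption made here for illustration and does not come from the abstract: x_i is a training instance, Y_i its set of relevant labels, \bar{Y}_i the complement, Q the number of labels, m the number of instances, and C the trade-off parameter. The second line shows only what the pairwise constraint looks like once the bias terms are dropped; it is not the authors' full SCRank-SVM formulation, which also recasts the decision boundary and margin.

```latex
% Original Rank-SVM primal (with bias terms b_k)
\begin{aligned}
\min_{w_k,\, b_k,\, \xi_{ikl}} \quad
  & \frac{1}{2}\sum_{k=1}^{Q}\lVert w_k\rVert^{2}
    + C\sum_{i=1}^{m}\frac{1}{\lvert Y_i\rvert\,\lvert\bar{Y}_i\rvert}
      \sum_{(k,l)\in Y_i\times\bar{Y}_i}\xi_{ikl} \\
\text{s.t.} \quad
  & \langle w_k - w_l,\, x_i\rangle + b_k - b_l \;\ge\; 1 - \xi_{ikl},
    \qquad \xi_{ikl}\ge 0,\quad (k,l)\in Y_i\times\bar{Y}_i .
\end{aligned}

% Pairwise constraint with the bias terms removed (illustrative only)
\langle w_k - w_l,\, x_i\rangle \;\ge\; 1 - \xi_{ikl},
\qquad \xi_{ikl}\ge 0 .
```

In the standard Rank-SVM dual, stationarity with respect to each b_k contributes one linear equality constraint per label. Dropping the bias removes those equalities, which is one plausible reading of the "weaker optimization constraints" mentioned above.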
Acknowledgments
The authors wish to thank the anonymous reviewers for their helpful comments and suggestions. The authors also thank Prof. Zhihua Zhou, Mingling Zhang and Jianhua Xu, whose software and data were used in our experiments. This work was supported by NSFC (Grant No. 61202184) and the Natural Science Basic Research Plan in Shaanxi Province of China (No. 2015JQ6240).
Cite this article
Sun, X., Wang, J., Feng, J. et al. Classifying biomedical knowledge in PubMed using multi-label vector machines with weaker optimization constraints. Neural Comput & Applic 28 (Suppl 1), 1233–1243 (2017). https://doi.org/10.1007/s00521-016-2439-9
DOI: https://doi.org/10.1007/s00521-016-2439-9