Classifying biomedical knowledge in PubMed using multi-label vector machines with weaker optimization constraints

Sun, Xia; Wang, Jiarong; Feng, Jun; Chen, Su-Shing; He, Feijuan

doi:10.1007/s00521-016-2439-9

Original Article
Published: 23 June 2016

Volume 28, pages 1233–1243, (2017)
Neural Computing and Applications

Xia Sun^1,2,
Jiarong Wang^1,
Jun Feng^1,
Su-Shing Chen^2,3 &
Feijuan He^4

Abstract

In this paper, we develop an automated multi-label scheme for linking PubMed citations to gene ontology (GO) terms, giving users easy access to relevant publications by biomedical ontological term (in particular, GO terms). We propose a maximum margin approach derived from the ranking support vector machine (Rank-SVM), called SCRank-SVM. In this scheme, we remove the bias term “b” and recast the decision boundary and the separating margin to improve the margin of Rank-SVM. Because its optimization constraints are weaker, SCRank-SVM has better generalization performance and lower computational complexity. Experiments on our lung cancer data set and six diverse multi-label data sets show that SCRank-SVM is well suited to this problem and that it outperforms the original Rank-SVM and several other well-established multi-label learning algorithms.
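
To make the role of the bias term concrete, here is a minimal sketch of a bias-free pairwise ranking loss of the kind that Rank-SVM style methods minimize. It is not the authors' SCRank-SVM, whose exact formulation is not given in this preview. In the Rank-SVM of Elisseeff and Weston, each label k has a linear score <w_k, x> + b_k, and every (relevant, irrelevant) label pair should be separated by a margin of at least 1. The sketch simply drops b_k, so the score is <w_k, x>. The function names, the weight matrix W, and the toy input are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Set


def scores(W: Sequence[Sequence[float]], x: Sequence[float]) -> List[float]:
    """Bias-free linear scores f_k(x) = <w_k, x>, one per label."""
    return [sum(wj * xj for wj, xj in zip(w, x)) for w in W]


def pairwise_hinge(W: Sequence[Sequence[float]],
                   x: Sequence[float],
                   relevant: Set[int]) -> float:
    """Hinge loss averaged over all (relevant, irrelevant) label pairs.

    A pair (k, l) contributes max(0, 1 - (f_k - f_l)): zero when the
    relevant label k outscores the irrelevant label l by at least 1.
    """
    f = scores(W, x)
    irrelevant = [l for l in range(len(W)) if l not in relevant]
    if not relevant or not irrelevant:
        return 0.0  # no pairs to rank
    total = sum(max(0.0, 1.0 - (f[k] - f[l]))
                for k in relevant for l in irrelevant)
    return total / (len(relevant) * len(irrelevant))


# Toy example (assumed values): three labels, two features.
W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
x = [2.0, 0.5]

loss_when_label_0_relevant = pairwise_hinge(W, x, {0})
loss_when_label_1_relevant = pairwise_hinge(W, x, {1})
```

With these toy values, label 0 already outscores both other labels by more than the margin, so the first loss is zero, while treating label 1 as the relevant one is penalized because label 0 scores higher. In a standard SVM, dropping b is known to remove the equality constraint from the dual problem; the abstract's "weaker optimization constraints" presumably refers to an analogous effect, though the preview does not spell this out.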

Notes

Ref. [1] surveyed the top 20 most frequently occurring GO terms in MEDLINE citations.
Restricting the results to the MeSH major topic field for lung cancer, which differs from the queries in Ref. [1], left only 16 top GO terms found in MEDLINE citations.

References

1. Kim H, Chen S (2009) Associative naive Bayes classifier: automated linking of gene ontology to MEDLINE documents. Pattern Recogn 42:1777–1785
2. French L, Pavlidis P (2012) Using text mining to link journal articles to neuroanatomical databases. J Comp Neurol 520(8):1772–1783
3. Tsuruoka Y, Tsujii J, Ananiadou S (2008) FACTA: a text search engine for finding associated biomedical concepts. Bioinformatics 24(21):2559–2560
4. Plake C, Schiemann T, Pankalla M, Hakenberg J, Leser U (2006) AliBaba: PubMed as a graph. Bioinformatics 22(19):2444–2445
5. Doms A, Schroeder M (2005) GoPubMed: exploring PubMed with the gene ontology. Nucleic Acids Res 33:783–786
6. Zhang M-L, Zhou Z-H (2007) ML-kNN: a lazy learning approach to multi-label learning. Pattern Recogn 40:2038–2048
7. Godbole S, Sarawagi S (2004) Discriminative methods for multi-labeled classification. In: Dai H, Srikant R, Zhang C (eds) Lecture notes in artificial intelligence 3056. Springer, Berlin, pp 22–30
8. Zhang Y, Zhou Z-H (2008) Multi-label dimensionality reduction via dependency maximization. In: Proceedings of the 23rd AAAI conference on artificial intelligence, Chicago, IL, pp 1503–1505
9. Zhang M-L, Zhou Z-H (2014) A review on multi-label learning algorithms. IEEE Trans Knowl Data Eng. http://doi.ieeecomputersociety.org/10.1109
10. Xu J (2014) Multi-label core vector machine with a zero label. Pattern Recogn. doi:10.1016/j.patcog.2014.01.012
11. Boutell MR, Luo J, Shen X, Brown CM (2004) Learning multi-label scene classification. Pattern Recogn 37(9):1757–1771
12. Schapire RE, Singer Y (2000) Boostexter: a boosting-based system for text categorization. Mach Learn 39(2/3):135–168
13. Fürnkranz J, Hüllermeier E, Loza Mencía E, Brinker K (2008) Multilabel classification via calibrated label ranking. Mach Learn 73(2):133–153
14. Tsoumakas G, Vlahavas I (2007) Random k-label sets: an ensemble method for multilabel classification. In: Kok JN, Koronacki J, de Mantaras RL, Matwin S, Mladenic D, Skowron A (eds) Lecture notes in artificial intelligence 4701. Springer, Berlin, pp 406–417
15. Zhou Z-H, Zhang M-L (2007) Multi-instance multi-label learning with application to scene classification. In: Schölkopf B, Platt J, Hoffman T (eds) Advances in neural information processing systems 19. MIT Press, Cambridge, pp 1609–1616
16. Clare A, King RD (2001) Knowledge discovery in multi-label phenotype data. In: De Raedt L, Siebes A (eds) Lecture notes in computer science 2168. Springer, Berlin, pp 42–53
17. Elisseeff A, Weston J (2002) A kernel method for multi-labelled classification. In: Dietterich TG, Becker S, Ghahramani Z (eds) Advances in neural information processing systems 14. MIT Press, Cambridge, pp 681–687
18. Zhang M-L, Zhou Z-H (2006) Multilabel neural networks with applications to functional genomics and text categorization. IEEE Trans Knowl Data Eng 18(10):1338–1351
19. Fan R-E, Lin C-J (2007) A study on threshold selection for multi-label classification. National Taiwan University, Tech. Rep.
20. Ioannou M, Sakkas G, Tsoumakas G, Vlahavas I (2010) Obtaining bipartition from score vectors for multi-label classification. In: Proceedings of the 22nd IEEE international conference on tools with artificial intelligence, Arras, France, pp 409–416
21. Boutell MR, Luo J, Shen X, Brown CM (2004) Learning multi-label scene classification. Pattern Recogn 37(9):1757–1771
22. Bi W, Kwok JT (2011) Multi-label classification on tree- and DAG-structured hierarchies. In: Proceedings of the 28th international conference on machine learning, Bellevue, WA, pp 17–24
23. Brinker K (2005) On active learning in multi-label classification. In: Proceedings of the 29th annual conference of the German Classification Society, Magdeburg, Germany, pp 206–213
24. Brinker K, Fürnkranz J, Hüllermeier E (2006) A unified model for multilabel classification and ranking. In: Proceedings of the 17th European conference on artificial intelligence, Riva del Garda, Italy, pp 489–493
25. Quevedo JR, Luaces O, Bahamonde A (2012) Multilabel classifiers with a probabilistic thresholding strategy. Pattern Recogn 45(2):876–883
26. Zhang M-L, Peña JM, Robles V (2009) Feature selection for multi-label naive Bayes classification. Inf Sci 179(19):3218–3229
27. Poggio T, Mukherjee S, Rifkin R, Rakhlin A, Verri A (2001) “b”. A.I. Memo No. 2001-011, CBCL Memo 198, Artificial Intelligence Laboratory, Massachusetts Institute of Technology
28. Chen B, Zhao S, Zhu P, Principe JC (2012) Quantized kernel least mean square algorithm. IEEE Trans Neural Netw Learn Syst 23(1):22–32
29. De Brabanter K, De Brabanter J, Suykens JAK, De Moor B (2010) Optimized fixed-size kernel models for large data sets. Comput Stat Data Anal 54(6):1484–1504
30. Chen B, Zhao S, Zhu P, Principe JC (2013) Quantized kernel recursive least squares algorithm. IEEE Trans Neural Netw Learn Syst 24(9):1484–1491
31. Guo Y, Schuurmans D (2011) Adaptive large margin training for multilabel classification. In: Proceedings of the 25th AAAI conference on artificial intelligence, San Francisco, CA, pp 374–379
32. Jiang A, Wang C, Zhu Y (2008) Calibrated Rank-SVM for multi-label image categorization. In: Proceedings of the international joint conference on neural networks, Hong Kong, pp 1450–1455
33. Ji S, Sun L, Jin R, Ye J (2009) Multi-label multiple kernel learning. In: Koller D, Schuurmans D, Bengio Y, Bottou L (eds) Advances in neural information processing systems 21. MIT Press, Cambridge, pp 777–784
34. Elisseeff A, Weston J (2002) A kernel method for multi-labelled classification. Adv Neural Inf Process Syst 14:681–687
35. Xu J (2013) Fast multi-label core vector machine. Pattern Recogn 46(3):885–898
36. Xu J (2012) An efficient multi-label support vector machine with a zero label. Expert Syst Appl 39(5):4796–4804
37. Huang G-B, Zhou H, Ding X, Zhang R (2012) Extreme learning machine for regression and multiclass classification. IEEE Trans Syst Man Cybern 42(2):513–529
38. Ding X-J, Zhao Y-L (2011) Influence of bias b on generalization ability of SVM for classification. Acta Autom Sin 37(9):1105–1113

Acknowledgments

The authors wish to thank the anonymous reviewers for their helpful comments and suggestions. The authors also thank Prof. Zhihua Zhou, Mingling Zhang and Jianhua Xu, whose software and data were used in our experiments. This work was supported by NSFC (Grant No. 61202184) and the Natural Science Basic Research Plan in Shaanxi Province of China (No. 2015JQ6240).

Author information

Authors and Affiliations

School of Information Science and Technology, Northwest University, Xi’an, 710069, China
Xia Sun, Jiarong Wang & Jun Feng
Systems Biology Lab, University of Florida, Gainesville, FL, 32608, USA
Xia Sun & Su-Shing Chen
Computer Information Science and Engineering, University of Florida, Gainesville, FL, 32608, USA
Su-Shing Chen
Department of Computer Science, Xi’an Jiaotong University City College, Xi’an, 710069, China
Feijuan He

Corresponding author

Correspondence to Xia Sun.

About this article

Cite this article

Sun, X., Wang, J., Feng, J. et al. Classifying biomedical knowledge in PubMed using multi-label vector machines with weaker optimization constraints. Neural Comput & Applic 28 (Suppl 1), 1233–1243 (2017). https://doi.org/10.1007/s00521-016-2439-9

Received: 25 January 2015
Accepted: 14 June 2016
Published: 23 June 2016
Issue Date: December 2017
DOI: https://doi.org/10.1007/s00521-016-2439-9
