Abstract
This paper presents machine learning approaches based on supervised methods applied to the triage of health and biomedical data. We discuss the application of these approaches to three different tasks, and evaluate triage pipelines, as well as data sampling and feature selection methods, as ways to improve performance on each task. The scientific data triage systems are built on a generic, lightweight pipeline that is nonetheless flexible enough to perform triage on different kinds of data. The presented approaches were developed to be integrated into web-based systems, providing real-time feedback to health and biomedical professionals. All systems are publicly available as open-source software.
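As a rough illustration of the kind of pipeline the abstract describes (supervised classification combined with feature selection and data sampling), here is a minimal, dependency-free sketch. It is not the authors' implementation: the whitespace tokenizer, the chi-square feature filter, simple random oversampling (rather than a synthetic technique such as SMOTE), the multinomial Naive Bayes classifier, every function name, and the toy corpus are assumptions chosen only to keep the example short and runnable.

```python
import math
import random
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenization (a deliberately simple assumption)."""
    return text.lower().split()


def chi_square(a, b, c, d):
    """2x2 chi-square statistic for term presence versus class membership."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def select_features(docs, labels, k):
    """Feature selection: keep the k terms most associated with any class."""
    doc_sets = [set(tokenize(doc)) for doc in docs]
    vocab = set().union(*doc_sets)
    scores = {}
    for term in vocab:
        best = 0.0
        for cls in set(labels):
            in_cls = labels.count(cls)
            a = sum(1 for s, lab in zip(doc_sets, labels) if lab == cls and term in s)
            b = sum(1 for s, lab in zip(doc_sets, labels) if lab != cls and term in s)
            best = max(best, chi_square(a, b, in_cls - a, len(labels) - in_cls - b))
        scores[term] = best
    # Highest score first; ties broken alphabetically so the result is deterministic.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


def oversample(docs, labels, seed=0):
    """Random oversampling: duplicate minority-class examples until classes are balanced."""
    rng = random.Random(seed)
    by_label = {}
    for doc, label in zip(docs, labels):
        by_label.setdefault(label, []).append(doc)
    target = max(len(items) for items in by_label.values())
    out_docs, out_labels = [], []
    for label, items in by_label.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        out_docs.extend(items + extra)
        out_labels.extend([label] * target)
    return out_docs, out_labels


class NaiveBayesTriage:
    """Multinomial Naive Bayes with Laplace smoothing over a fixed vocabulary."""

    def fit(self, docs, labels, vocab):
        self.vocab = set(vocab)
        self.classes = sorted(set(labels))
        self.log_prior = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.log_lik = {}
        for c in self.classes:
            counts = Counter(t for doc, lab in zip(docs, labels) if lab == c
                             for t in tokenize(doc) if t in self.vocab)
            total = sum(counts.values()) + len(self.vocab)
            self.log_lik[c] = {t: math.log((counts[t] + 1) / total) for t in self.vocab}
        return self

    def predict(self, doc):
        tokens = [t for t in tokenize(doc) if t in self.vocab]
        scores = {c: self.log_prior[c] + sum(self.log_lik[c][t] for t in tokens)
                  for c in self.classes}
        return max(scores, key=scores.get)


def train_triage(docs, labels, k=20):
    """Pipeline sketch: select features on the original data, rebalance, then fit."""
    vocab = select_features(docs, labels, k)
    docs, labels = oversample(docs, labels)
    return NaiveBayesTriage().fit(docs, labels, vocab)


# Hypothetical toy corpus for a literature-triage task (not data from the paper).
TOY_DOCS = [
    "fungal enzyme lignocellulose degradation assay",
    "enzyme activity of fungal cellulase characterized",
    "hotel booking reviews and travel tips",
    "football match results and league table",
    "recipe for chocolate cake with frosting",
    "stock market closing prices today",
]
TOY_LABELS = ["relevant", "relevant", "irrelevant", "irrelevant", "irrelevant", "irrelevant"]

if __name__ == "__main__":
    model = train_triage(TOY_DOCS, TOY_LABELS)
    print(model.predict("fungal enzyme study"))  # relevant
```

In this sketch, features are selected on the original training data before oversampling, so duplicated minority-class examples do not inflate the chi-square statistics. Swapping in a different sampler, filter, or classifier only changes one function, which mirrors the abstract's point that a generic, lightweight pipeline can be reused across distinct triage tasks.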
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Almeida, H., Queudot, M., Kosseim, L., Meurs, MJ. (2017). Supervised Methods to Support Online Scientific Data Triage. In: Aïmeur, E., Ruhi, U., Weiss, M. (eds) E-Technologies: Embracing the Internet of Things. MCETECH 2017. Lecture Notes in Business Information Processing, vol 289. Springer, Cham. https://doi.org/10.1007/978-3-319-59041-7_13
DOI: https://doi.org/10.1007/978-3-319-59041-7_13
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-59040-0
Online ISBN: 978-3-319-59041-7
eBook Packages: Computer Science, Computer Science (R0)