
Supervised Methods to Support Online Scientific Data Triage

  • Conference paper

Part of the book series: Lecture Notes in Business Information Processing (LNBIP, volume 289)

Abstract

This paper presents supervised machine learning approaches to the triage of health and biomedical data. We discuss their application to three different tasks and evaluate the use of triage pipelines, data sampling, and feature selection methods to improve performance on each task. The triage systems are built on a generic, lightweight pipeline that remains flexible enough to handle distinct kinds of data. The approaches were developed for integration into web-based systems that give real-time feedback to health and biomedical professionals. All systems are publicly available as open-source software.
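
As an illustration of the data sampling step mentioned in the abstract, the following is a minimal sketch of random undersampling, one common way to rebalance a triage dataset in which irrelevant items far outnumber relevant ones. The abstract does not say which sampling method the authors use, so the function `undersample`, its `ratio` parameter, and the toy labels are assumptions made for illustration only.

```python
import random
from collections import Counter


def undersample(docs, labels, majority_label, ratio=1.0, seed=0):
    """Randomly drop majority-class items (hypothetical helper).

    Keeps every item whose label differs from `majority_label` and at most
    `ratio` times that many majority-class items, preserving original order.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    majority = [i for i, y in enumerate(labels) if y == majority_label]
    minority = [i for i, y in enumerate(labels) if y != majority_label]
    keep_n = min(len(majority), int(ratio * len(minority)))
    kept = sorted(minority + rng.sample(majority, keep_n))
    return [docs[i] for i in kept], [labels[i] for i in kept]


# Toy data (assumed): 2 relevant documents against 8 irrelevant ones.
docs = [f"doc{i}" for i in range(10)]
labels = ["relevant"] * 2 + ["irrelevant"] * 8

bal_docs, bal_labels = undersample(docs, labels, "irrelevant")
print(Counter(bal_labels))
```

With `ratio=1.0` the two classes end up the same size. A larger ratio keeps more of the majority class, which trades some balance for more training data.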

Notes

  1. http://www.hivevidence.ca

  2. https://www.ncbi.nlm.nih.gov/mesh

  3. http://clpsych.org/shared-task-2016/

  4. http://nlp.stanford.edu/software/tagger.shtml

  5. http://au.reachout.com/

Author information

Corresponding author

Correspondence to Marie-Jean Meurs.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Almeida, H., Queudot, M., Kosseim, L., Meurs, MJ. (2017). Supervised Methods to Support Online Scientific Data Triage. In: Aïmeur, E., Ruhi, U., Weiss, M. (eds) E-Technologies: Embracing the Internet of Things. MCETECH 2017. Lecture Notes in Business Information Processing, vol 289. Springer, Cham. https://doi.org/10.1007/978-3-319-59041-7_13

  • DOI: https://doi.org/10.1007/978-3-319-59041-7_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-59040-0

  • Online ISBN: 978-3-319-59041-7

  • eBook Packages: Computer Science, Computer Science (R0)
