Adaptation of Algorithms for Medical Information Retrieval for Working on Russian-Language Text Content

Vatian, Aleksandra; Dobrenko, Natalia; Makarenko, Anastasia; Nigmatullin, Niyaz; Vedernikov, Nikolay; Vasilev, Artem; Stankevich, Andrey; Gusarova, Natalia; Shalyto, Anatoly

doi:10.1007/978-3-030-00794-2_11

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11107)

Included in the following conference series:

International Conference on Text, Speech, and Dialogue

Abstract

The paper investigates how various adverse drug reaction (ADR) detection algorithms can be adapted to the Russian-language environment. In general, the ADR detection process consists of four steps: (1) data collection from social media; (2) classification/filtering of ADR-assertive text segments; (3) extraction of ADR mentions from text segments; (4) analysis of the extracted ADR mentions for signal generation. Implementing each step in the Russian-language environment is harder than in the traditional English-language environment, primarily because the necessary databases and specialized language resources are lacking. The complex grammatical structure of the Russian language is an additional obstacle. The authors present several ways of adapting machine learning algorithms to overcome these difficulties. For step 3, an ensemble classifier applied to Russian-language text forums achieved an accuracy of 0.805. For step 4, on Russian-language electronic health records (EHR), an adaptation of pyConTextNLP achieved an F-measure of 0.935, and an adaptation of the ConText algorithm achieved an F-measure of 0.92–0.95. A method for carrying out step 4 in full was developed using cue-based and rule-based approaches; it achieved an F-measure of 67.5%, which is comparable to the baseline.
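To make the cue-based idea behind ConText-style assertion analysis more concrete, the following is a minimal Python sketch, not the authors' implementation and not the pyConTextNLP API. It assumes a tiny hand-written list of Russian negation and termination cues (the lists `NEGATION_CUES` and `TERMINATION_CUES` and all function names are hypothetical) and handles only forward-scoped negation within one sentence. The full ConText algorithm also covers cues that follow the concept and further contextual properties such as temporality and experiencer.

```python
import re

# Hypothetical cue lexicon, for illustration only (not the authors' resources).
# Each cue is a tuple of lowercase tokens; a negation cue opens a scope to its right.
NEGATION_CUES = [("не", "было"), ("отрицает",), ("без",), ("нет",)]
# A termination cue closes any open negation scope.
TERMINATION_CUES = [("но",), ("однако",)]


def tokenize(sentence):
    """Lowercase the sentence and split it into word tokens (Cyrillic included)."""
    return re.findall(r"\w+", sentence.lower())


def match_at(tokens, i, phrase):
    """True if the token sequence `phrase` starts at position i."""
    return tuple(tokens[i:i + len(phrase)]) == phrase


def negated_positions(tokens):
    """Return the set of token positions that fall inside a negation scope."""
    negated = set()
    in_scope = False
    i = 0
    while i < len(tokens):
        cue = next((c for c in NEGATION_CUES if match_at(tokens, i, c)), None)
        if cue:
            in_scope = True          # scope opens after the cue
            i += len(cue)
            continue
        if any(match_at(tokens, i, t) for t in TERMINATION_CUES):
            in_scope = False         # scope ends at the termination cue
        elif in_scope:
            negated.add(i)
        i += 1
    return negated


def classify_mentions(sentence, concepts):
    """Label each concept found in the sentence as 'negated' or 'affirmed'.

    Concepts are matched by surface form only; no lemmatization is done.
    """
    tokens = tokenize(sentence)
    negated = negated_positions(tokens)
    result = {}
    for concept in concepts:
        phrase = tuple(tokenize(concept))
        if not phrase:
            continue
        for i in range(len(tokens)):
            if match_at(tokens, i, phrase):
                result[concept] = "negated" if i in negated else "affirmed"
                break
    return result


if __name__ == "__main__":
    sentence = "Пациент отрицает тошноту, но отмечает головную боль."
    print(classify_mentions(sentence, ["тошноту", "головную боль"]))
    # → {'тошноту': 'negated', 'головную боль': 'affirmed'}
```

Because the sketch matches surface forms, each inflected variant of a concept or cue would need its own entry. This is one concrete way in which the grammatical complexity of Russian mentioned above complicates the adaptation of lexicon-driven algorithms.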

Acknowledgment

This work was financially supported by the Government of the Russian Federation, “Grant 08-08”. This work was also financially supported by the Ministry of Education and Science of the Russian Federation, Agreement #14.578.21.0196 (03/10/2016), Unique Identification RFMEFI57816X0196.

Author information

Authors and Affiliations

ITMO University, 49 Kronverkskiy prosp., 197101, Saint-Petersburg, Russia
Aleksandra Vatian, Natalia Dobrenko, Anastasia Makarenko, Niyaz Nigmatullin, Nikolay Vedernikov, Artem Vasilev, Andrey Stankevich, Natalia Gusarova & Anatoly Shalyto

Corresponding author

Correspondence to Aleksandra Vatian.

Editor information

Editors and Affiliations

Faculty of Informatics, Masaryk University, Brno, Czech Republic
Petr Sojka, Aleš Horák, Ivan Kopeček & Karel Pala

About this paper

Cite this paper

Vatian, A. et al. (2018). Adaptation of Algorithms for Medical Information Retrieval for Working on Russian-Language Text Content. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech, and Dialogue. TSD 2018. Lecture Notes in Computer Science, vol 11107. Springer, Cham. https://doi.org/10.1007/978-3-030-00794-2_11

DOI: https://doi.org/10.1007/978-3-030-00794-2_11
Published: 08 September 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-00793-5
Online ISBN: 978-3-030-00794-2
eBook Packages: Computer Science, Computer Science (R0)
