Machine Learning Architectures for Scalable and Reliable Subject Indexing

Toepfer, Martin

doi:10.1007/978-3-319-67008-9_61

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10450)

Included in the following conference series:

International Conference on Theory and Practice of Digital Libraries

Abstract

Digital libraries seek automatic subject indexing as a scalable source of high-quality semantic document representations. The task is complex, however, and many issues remain unsolved. For instance, certain concepts are not detected accurately, and confidence estimates are often unreliable. Accurate quality estimates are nevertheless crucial in practice, for example to filter results and ensure the highest standards before subsequent use. The proposed thesis studies applications of machine learning to automatic subject indexing, a task that faces considerable challenges such as class imbalance, concept drift, and zero-shot learning. Special attention is paid to architecture design and automatic quality estimation, with experiments on scholarly publications in economics and business studies. First results indicate the importance of knowledge transfer between concepts and point to the value of so-called fusion approaches that carefully combine lexical and associative subsystems. This extended abstract summarizes the main topic and status of the thesis and provides an outlook on future directions.
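
The idea of fusing a lexical and an associative subsystem and then filtering by confidence can be illustrated with a minimal sketch. This is not the architecture studied in the thesis: the toy thesaurus, the function names, the weighted-average fusion rule, and the threshold are all assumptions chosen for illustration, and the associative scores are a hand-written stand-in for the output of a trained classifier.

```python
from typing import Dict

# Hypothetical toy thesaurus: concept id -> lowercase labels.
THESAURUS = {
    "inflation": ["inflation", "price increase"],
    "labour_market": ["labour market", "employment"],
    "monetary_policy": ["monetary policy"],
}


def lexical_scores(text: str) -> Dict[str, float]:
    """Assign 1.0 to every concept whose label occurs verbatim in the text."""
    lowered = text.lower()
    return {
        concept: 1.0
        for concept, labels in THESAURUS.items()
        if any(label in lowered for label in labels)
    }


def fuse(lexical: Dict[str, float],
         associative: Dict[str, float],
         weight: float = 0.5) -> Dict[str, float]:
    """Weighted average over the union of concepts; a missing score counts as 0."""
    concepts = set(lexical) | set(associative)
    return {
        c: weight * lexical.get(c, 0.0) + (1.0 - weight) * associative.get(c, 0.0)
        for c in concepts
    }


def filter_confident(scores: Dict[str, float], threshold: float) -> Dict[str, float]:
    """Keep only concepts whose fused score reaches the threshold."""
    return {c: s for c, s in scores.items() if s >= threshold}


text = "Rising inflation puts pressure on the labour market."
# Stand-in for a trained associative model (assumed values, not real results).
associative = {"monetary_policy": 0.9, "inflation": 0.6}

fused = fuse(lexical_scores(text), associative)
accepted = filter_confident(fused, threshold=0.5)
print(sorted(accepted))
```

In this toy setting, a concept proposed by only one subsystem needs a high score from that subsystem to pass the filter, while agreement between both subsystems raises the fused score. A real system would also need calibrated confidence estimates for such a threshold to be meaningful.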

Notes

1. www.nlm.nih.gov/mesh/
2. id.loc.gov/authorities/subjects.html
3. www.dnb.de/gnd
4. www.zbw.eu/stw
5. www.tpdl.eu/tpdl2017/
6. For instance, due to experiments at the ZBW and correspondence with the German National Library at a recent workshop on “Computer-assisted Subject Cataloguing”, 2017 in Stuttgart, Germany.


Author information

Authors and Affiliations

ZBW – Leibniz Information Centre for Economics, Düsternbrooker Weg 120, 24105, Kiel, Germany
Martin Toepfer

Corresponding author

Correspondence to Martin Toepfer.

Editor information

Editors and Affiliations

Faculteit der Geesteswetenschappen, Universiteit van Amsterdam, Amsterdam, The Netherlands
Jaap Kamps
Library & Information Center, University of Patras, Patras, Greece
Giannis Tsakonas
Aristotle University of Thessaloniki, Thessaloniki, Greece
Yannis Manolopoulos
Civil Engineering, University of Thrace, Kimmeria, Greece
Lazaros Iliadis
Informatics, Ionian University, Kerkyra, Greece
Ioannis Karydis

About this paper

Cite this paper

Toepfer, M. (2017). Machine Learning Architectures for Scalable and Reliable Subject Indexing. In: Kamps, J., Tsakonas, G., Manolopoulos, Y., Iliadis, L., Karydis, I. (eds) Research and Advanced Technology for Digital Libraries. TPDL 2017. Lecture Notes in Computer Science, vol 10450. Springer, Cham. https://doi.org/10.1007/978-3-319-67008-9_61

DOI: https://doi.org/10.1007/978-3-319-67008-9_61
Published: 02 September 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-67007-2
Online ISBN: 978-3-319-67008-9
eBook Packages: Computer Science, Computer Science (R0)
