Automated Indexing with Restricted Random Walks on Large Document Sets

  • Conference paper
Research and Advanced Technology for Digital Libraries (ECDL 2004)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3232)

Abstract

We propose a method based on restricted random walk clustering as a (semi-)automated complement to the tedious, error-prone and expensive task of manual indexing in a scientific library. In the first stage, we cluster a set of (partially) indexed documents using restricted random walks on usage histories to find groups of similar documents. In the second stage, we derive candidate keywords for documents without indexing information from the frequencies of keywords assigned to the other documents in their respective cluster.
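The second stage can be illustrated with a minimal Python sketch. It assumes that a cluster is already available as a list of document IDs and that the existing indexing is a mapping from document ID to keywords. The function name `suggest_keywords`, the `top_n` cutoff and the use of relative frequency as a score are illustrative assumptions, not details taken from the paper.

```python
from collections import Counter


def suggest_keywords(cluster, keywords_by_doc, target, top_n=3):
    """Rank candidate keywords for `target` by how often they are
    assigned to the other documents in the same cluster.

    Hypothetical helper: the scoring (share of indexed cluster
    members carrying the keyword) is an assumption for illustration.
    """
    counts = Counter()
    indexed_docs = 0
    for doc in cluster:
        if doc == target:
            continue
        keywords = set(keywords_by_doc.get(doc, ()))
        if not keywords:
            continue  # unindexed neighbours contribute nothing
        indexed_docs += 1
        counts.update(keywords)
    if indexed_docs == 0:
        return []
    # Sort by descending count, then alphabetically for a stable order.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(kw, n / indexed_docs) for kw, n in ranked[:top_n]]


if __name__ == "__main__":
    cluster = ["d1", "d2", "d3", "d4"]
    keywords_by_doc = {
        "d1": ["clustering", "random walks"],
        "d2": ["clustering"],
        "d3": ["indexing", "clustering"],
    }
    print(suggest_keywords(cluster, keywords_by_doc, "d4", top_n=2))
```

In a semi-automated setting, a ranked list like this would be shown to a librarian as suggestions rather than assigned directly.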

Because of the clustering algorithm used, the proposed method remains efficient with millions of documents and can be deployed on standard PC hardware.
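The abstract does not spell out the walk itself. As a rough sketch, assume the usual formulation of a restricted random walk on a similarity graph: the walk starts at a document, and every step must follow an edge whose weight is strictly higher than that of the previous step, so the walk ends once no such edge is left. The graph layout (a dict of co-usage counts), the function name and the uniform choice among admissible neighbours are all assumptions made for illustration.

```python
import random


def restricted_random_walk(graph, start, rng):
    """Perform one restricted random walk and return the visited path.

    `graph` maps each document to {neighbour: co_usage_count}.
    Assumption for this sketch: a step is admissible only if its
    edge weight is strictly greater than the previous step's weight.
    """
    path = [start]
    current = start
    last_weight = float("-inf")  # the first step is unrestricted
    while True:
        candidates = sorted(
            (neighbour, weight)
            for neighbour, weight in graph.get(current, {}).items()
            if weight > last_weight
        )
        if not candidates:
            return path  # no admissible edge left: the walk ends
        current, last_weight = rng.choice(candidates)
        path.append(current)


if __name__ == "__main__":
    # Toy symmetric co-usage graph (made-up numbers).
    graph = {
        "a": {"b": 1, "c": 2},
        "b": {"a": 1, "c": 3},
        "c": {"a": 2, "b": 3, "d": 5},
        "d": {"c": 5},
    }
    print(restricted_random_walk(graph, "a", random.Random(0)))
```

Since the weights along a walk strictly increase, each walk is finite and only inspects the neighbours of the documents it visits, which gives some intuition for why such a scheme can stay tractable on large collections.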

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Franke, M., Geyer-Schulz, A. (2004). Automated Indexing with Restricted Random Walks on Large Document Sets. In: Heery, R., Lyon, L. (eds) Research and Advanced Technology for Digital Libraries. ECDL 2004. Lecture Notes in Computer Science, vol 3232. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30230-8_22

  • DOI: https://doi.org/10.1007/978-3-540-30230-8_22

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-23013-7

  • Online ISBN: 978-3-540-30230-8

  • eBook Packages: Springer Book Archive
