
A Software Architecture for Effective Document Identifier Reassignment

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3643)

Abstract

This work presents a software solution for enhancing inverted file compression through the reassignment of document identifiers. We present several techniques recently proposed in Information Retrieval forums to address this problem. We give further details on how the reassignment can be performed efficiently by applying a dimensionality reduction to the original inverted file, and on the evaluation results obtained with this technique. The paper focuses on the software architecture and the design practices adopted for this particular task. We show that using design patterns and reusing software components leads to better research applications for Information Retrieval.
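Why identifier reassignment affects compression can be illustrated with a toy index. Inverted files commonly store each posting list as the gaps between consecutive document identifiers (d-gaps) and encode those gaps with a variable-length code, so giving similar documents nearby identifiers tends to produce smaller gaps and fewer bits. The Python sketch below is illustrative only and rests on assumptions not taken from the paper: the Elias gamma code, the Jaccard-based greedy ordering, the six-document collection and every class and function name (`ReassignmentStrategy`, `GreedySimilarityOrder`, `index_size_bits`, and so on) are hypothetical choices made for brevity. It does not reproduce the dimensionality-reduction method the abstract describes. The abstract `ReassignmentStrategy` class hints at how a Strategy-style design lets reassignment algorithms be swapped without touching the compression code.

```python
"""Toy illustration: document identifier reassignment and d-gap compression."""
from abc import ABC, abstractmethod
from typing import Dict, List, Sequence, Set

Doc = Set[str]  # hypothetical representation: a document is its set of terms


def gamma_bits(n: int) -> int:
    """Length in bits of the Elias gamma code for a positive integer n."""
    if n < 1:
        raise ValueError("gamma codes are defined only for n >= 1")
    return 2 * (n.bit_length() - 1) + 1


def build_postings(docs: Sequence[Doc], order: Sequence[int]) -> Dict[str, List[int]]:
    """Build posting lists in which docs[order[i]] receives the identifier i + 1."""
    postings: Dict[str, List[int]] = {}
    for new_id, old_id in enumerate(order, start=1):
        for term in docs[old_id]:
            postings.setdefault(term, []).append(new_id)  # ids arrive in increasing order
    return postings


def index_size_bits(postings: Dict[str, List[int]]) -> int:
    """Total size of all posting lists when every d-gap is gamma coded."""
    total = 0
    for ids in postings.values():
        prev = 0
        for doc_id in ids:
            total += gamma_bits(doc_id - prev)  # the d-gap is always >= 1
            prev = doc_id
    return total


class ReassignmentStrategy(ABC):
    """Strategy interface: each subclass returns a permutation of document indices."""

    @abstractmethod
    def order(self, docs: Sequence[Doc]) -> List[int]:
        ...


class IdentityOrder(ReassignmentStrategy):
    """Baseline: keep the original (e.g. crawl) order."""

    def order(self, docs: Sequence[Doc]) -> List[int]:
        return list(range(len(docs)))


class GreedySimilarityOrder(ReassignmentStrategy):
    """Greedy nearest-neighbour walk: always move to the most similar unvisited document."""

    @staticmethod
    def _jaccard(a: Doc, b: Doc) -> float:
        union = a | b
        return len(a & b) / len(union) if union else 0.0

    def order(self, docs: Sequence[Doc]) -> List[int]:
        if not docs:
            return []
        result = [0]
        remaining = set(range(1, len(docs)))
        while remaining:
            last = docs[result[-1]]
            # Ties go to the smallest original index, so the output is deterministic.
            nxt = max(remaining, key=lambda j: (self._jaccard(last, docs[j]), -j))
            result.append(nxt)
            remaining.remove(nxt)
        return result


# Hypothetical collection whose two topics are interleaved in the original order.
DOCS: List[Doc] = [
    {"cat", "dog", "pet"},
    {"stock", "bond", "market"},
    {"cat", "pet", "vet"},
    {"stock", "market", "trade"},
    {"dog", "pet", "vet"},
    {"bond", "trade", "market"},
]

if __name__ == "__main__":
    for strategy in (IdentityOrder(), GreedySimilarityOrder()):
        perm = strategy.order(DOCS)
        bits = index_size_bits(build_postings(DOCS, perm))
        print(type(strategy).__name__, perm, bits)
```

On this toy collection the original order costs 54 bits, while the greedy order `[0, 2, 4, 1, 3, 5]` groups the two topics and costs 40 bits. The greedy walk compares documents pairwise, so its cost grows quadratically with collection size. That scaling problem is the kind of cost the abstract's dimensionality-reduction step is meant to keep affordable.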




Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Blanco, R., Barreiro, Á. (2005). A Software Architecture for Effective Document Identifier Reassignment. In: Moreno Díaz, R., Pichler, F., Quesada Arencibia, A. (eds) Computer Aided Systems Theory – EUROCAST 2005. EUROCAST 2005. Lecture Notes in Computer Science, vol 3643. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11556985_34


  • DOI: https://doi.org/10.1007/11556985_34

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-29002-5

  • Online ISBN: 978-3-540-31829-3

  • eBook Packages: Computer Science (R0)
