Document Versioning Using Feature Space Distances

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8835)

Abstract

The automated analysis of documents is an important task given the rapid increase in the availability of digital texts. In an earlier publication, we presented a framework in which the edit distances between documents were used to reconstruct the version history of a set of documents. However, one problem we encountered was the high computational cost of calculating these edit distances. In addition, the number of document comparisons that need to be made scales quadratically with the number of documents. In this paper we propose a simple approximation that retains many of the benefits of the method but greatly reduces the time required to calculate these distances. To test the utility of this approximation, the accuracy of the results obtained with it is compared to the original results.
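
The two costs named in the abstract can be illustrated with a minimal Python sketch. The Levenshtein routine is the standard dynamic-programming edit distance; the character n-gram feature vectors and the Euclidean distance between them are only an assumed stand-in for a "feature space distance", chosen for illustration, and are not taken from the paper. The names `levenshtein`, `ngram_counts`, `feature_distance`, and `num_pairs` are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def levenshtein(a: str, b: str) -> int:
    """Standard edit distance; time grows with len(a) * len(b)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def ngram_counts(text: str, n: int = 3) -> Counter:
    """Assumed feature map: counts of overlapping character n-grams."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def feature_distance(u: Counter, v: Counter) -> float:
    """Euclidean distance between two sparse count vectors."""
    keys = u.keys() | v.keys()
    return sqrt(sum((u[k] - v[k]) ** 2 for k in keys))


def num_pairs(n_docs: int) -> int:
    """Number of unordered document pairs, which grows quadratically."""
    return n_docs * (n_docs - 1) // 2


if __name__ == "__main__":
    docs = ["the quick brown fox", "the quick brown dog",
            "a quick brown dog", "a slow brown dog"]
    # Each feature vector is built once per document, then reused per pair.
    feats = [ngram_counts(d) for d in docs]
    for i, j in combinations(range(len(docs)), 2):
        print(i, j, levenshtein(docs[i], docs[j]),
              round(feature_distance(feats[i], feats[j]), 3))
    print("pairs:", num_pairs(len(docs)))
```

In this sketch the per-pair work on feature vectors depends on the number of distinct n-grams rather than on the product of the document lengths, which is one plausible way such an approximation could save time. The number of pairs itself is unchanged.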

References

  1. Woon, W.L., Wong, K.-S.: String alignment for automated document versioning. Knowledge and Information Systems (2008)

  2. Levenshtein, V.I.: Binary codes capable of correcting deletions, insertions, and reversals. Soviet Physics Doklady 10(8), 707–710 (1966)

  3. Soukoreff, R.W., MacKenzie, I.S.: Measuring errors in text entry tasks: an application of the Levenshtein string distance statistic. In: CHI 2001 Extended Abstracts on Human Factors in Computing Systems, pp. 319–320. ACM Press, New York (2001)

  4. Lodhi, H., Shawe-Taylor, J., Cristianini, N., Watkins, C.J.C.H.: Text classification using string kernels. In: Advances in Neural Information Processing Systems (NIPS), pp. 563–569 (2000)

  5. Lodhi, H., Saunders, C., Shawe-Taylor, J., Cristianini, N., Watkins, C.: Text classification using string kernels. J. Mach. Learn. Res. 2, 419–444 (2002)

  6. Cancedda, N., Gaussier, E., Goutte, C., Renders, J.M.: Word sequence kernels. J. Mach. Learn. Res. 3, 1059–1082 (2003)

  7. Cristianini, N., Shawe-Taylor, J.: An Introduction to Support Vector Machines. Cambridge University Press, Cambridge (2000)

  8. Mercer, J.: Functions of positive and negative type, and their connection with the theory of integral equations. Philosophical Transactions of the Royal Society of London. Series A, Containing Papers of a Mathematical or Physical Character 209, 415–446 (1909)

  9. Aradhye, H., Dorai, C.: New kernels for analyzing multimodal data in multimedia using kernel machines. In: Proceedings of 2002 IEEE International Conference on Multimedia and Expo, ICME 2002, vol. 2, pp. 37–40 (2002)

  10. Lan, M., Tan, C.-L., Low, H.-B., Sung, S.-Y.: A comprehensive comparative study on term weighting schemes for text categorization with support vector machines. In: WWW 2005: Special Interest Tracks and Posters of the 14th International Conference on World Wide Web, pp. 1032–1033. ACM Press, New York (2005)

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Woon, W.L., Wong, KS.D., Aung, Z., Svetinovic, D. (2014). Document Versioning Using Feature Space Distances. In: Loo, C.K., Yap, K.S., Wong, K.W., Teoh, A., Huang, K. (eds) Neural Information Processing. ICONIP 2014. Lecture Notes in Computer Science, vol 8835. Springer, Cham. https://doi.org/10.1007/978-3-319-12640-1_59

  • DOI: https://doi.org/10.1007/978-3-319-12640-1_59

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-12639-5

  • Online ISBN: 978-3-319-12640-1

  • eBook Packages: Computer Science, Computer Science (R0)
