Czech Text Segmentation Using Voting Experts and Its Comparison with Menzerath-Altmann law

  • Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 245)

Abstract

The word alphabet is connected to many problems in information retrieval. Information retrieval algorithms usually do not process the input data as a sequence of bytes; they work with larger pieces of the data, such as words or, more generally, chunks. This is the main motivation of the paper: how can the input data be split into smaller chunks when its structure is not known a priori? To do this, we use the Voting Experts algorithm, which is often used to process time series data, audio signals, etc. Our intention is to use the Voting Experts algorithm in the future for segmentation of discrete data such as DNA or proteins. For test purposes, we use Czech and English text as a test bed for the segmentation algorithm. We use the Menzerath-Altmann law to compare the segmentation results.

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kocyan, T., Martinovič, J., Dvorský, J., Snášel, V. (2011). Czech Text Segmentation Using Voting Experts and Its Comparison with Menzerath-Altmann law. In: Chaki, N., Cortesi, A. (eds) Computer Information Systems – Analysis and Technologies. Communications in Computer and Information Science, vol 245. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-27245-5_20

  • DOI: https://doi.org/10.1007/978-3-642-27245-5_20

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-27244-8

  • Online ISBN: 978-3-642-27245-5

  • eBook Packages: Computer Science, Computer Science (R0)
