Abstract
The word alphabet is connected to many problems in information retrieval. Information retrieval algorithms usually do not process the input data as a sequence of bytes; they operate on larger pieces of the data, such as words or, more generally, chunks. This is the main motivation of the paper: how can the input data be split into smaller chunks when its structure is not known a priori? To do this, we use the Voting Experts algorithm. Voting Experts is often used to process time series data, audio signals, etc. Our intention is to use the Voting Experts algorithm in the future for the segmentation of discrete data such as DNA or proteins. For test purposes, we use Czech and English text as a test bed for the segmentation algorithm. We use the Menzerath-Altmann law to compare the segmentation results.
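To make the segmentation idea concrete, the following is a minimal sketch of a two-expert voting scheme in the spirit of Voting Experts. It is not the authors' implementation. The function name `voting_experts`, the helper `_standardize`, the parameters `window` and `threshold`, and the exact voting and cut rules are all assumptions chosen for illustration. One expert votes for the split whose two parts are most frequent; the other votes for the split after the prefix whose following symbol is hardest to predict.

```python
import math
from collections import Counter, defaultdict


def _standardize(values):
    """Map each key to its z-score within the given dict of raw values."""
    mean = sum(values.values()) / len(values)
    var = sum((v - mean) ** 2 for v in values.values()) / len(values)
    std = math.sqrt(var) or 1.0  # avoid division by zero when all values agree
    return {k: (v - mean) / std for k, v in values.items()}


def voting_experts(text, window=3, threshold=2):
    """Segment `text` with a simplified, illustrative two-expert voting scheme."""
    if window < 2:
        raise ValueError("window must be at least 2")
    if len(text) <= window:
        return [text] if text else []

    # Count every n-gram of length 1..window and the symbols that follow it.
    counts = Counter()
    followers = defaultdict(Counter)
    for n in range(1, window + 1):
        for i in range(len(text) - n + 1):
            gram = text[i:i + n]
            counts[gram] += 1
            if i + n < len(text):
                followers[gram][text[i + n]] += 1

    # Standardize frequency and boundary entropy separately for each length,
    # so that short and long n-grams become comparable.
    freq_z, entropy_z = {}, {}
    for n in range(1, window + 1):
        grams = [g for g in counts if len(g) == n]
        freq_z.update(_standardize({g: float(counts[g]) for g in grams}))
        raw_entropy = {}
        for g in grams:
            total = sum(followers[g].values())
            raw_entropy[g] = -sum(
                (c / total) * math.log2(c / total) for c in followers[g].values()
            ) if total else 0.0
        entropy_z.update(_standardize(raw_entropy))

    # votes[k] counts the support for a cut immediately before text[k].
    votes = [0] * (len(text) + 1)
    for start in range(len(text) - window + 1):
        chunk = text[start:start + window]
        # Frequency expert: prefer the split whose two parts are most frequent.
        best_freq = max(
            range(1, window),
            key=lambda k: freq_z[chunk[:k]] + freq_z[chunk[k:]],
        )
        # Entropy expert: prefer the split after the least predictable prefix.
        best_entropy = max(
            range(1, window + 1), key=lambda k: entropy_z[chunk[:k]]
        )
        votes[start + best_freq] += 1
        votes[start + best_entropy] += 1

    # Cut where the vote count is a local maximum above the threshold.
    cuts = [
        k for k in range(1, len(text))
        if votes[k] > threshold
        and votes[k] > votes[k - 1]
        and votes[k] >= votes[k + 1]
    ]
    bounds = [0] + cuts + [len(text)]
    return [text[a:b] for a, b in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    sample = "thecatsatonthematthedogsatonthemat" * 3
    print(voting_experts(sample, window=4, threshold=3))
```

The segments always concatenate back to the input, so the sketch can be applied to any symbol sequence (text with spaces removed, for instance). Segmentation quality depends strongly on the assumed `window` and `threshold` values and on the amount of data.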
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kocyan, T., Martinovič, J., Dvorský, J., Snášel, V. (2011). Czech Text Segmentation Using Voting Experts and Its Comparison with Menzerath-Altmann law. In: Chaki, N., Cortesi, A. (eds) Computer Information Systems – Analysis and Technologies. Communications in Computer and Information Science, vol 245. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-27245-5_20
DOI: https://doi.org/10.1007/978-3-642-27245-5_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-27244-8
Online ISBN: 978-3-642-27245-5
eBook Packages: Computer Science, Computer Science (R0)