Abstract
As ASCII art can be noise for natural language processing, ASCII art extraction methods can be used to remove it from text. The run-length encoding (RLE) based ASCII art extraction method proposed in our earlier papers uses the compression ratio achieved by RLE to recognize ASCII art. ASCII art tends to be compressed to a small size by RLE, whereas ordinary text does not, because the same character tends to occur consecutively in ASCII art but not in ordinary text. Small ASCII art, however, is not compressed as well as large ASCII art. In this paper, to cope with small ASCII art, we add a new text attribute to the RLE-based method: the number of occurrences, in a text, of n-grams taken from ASCII art. Our experimental results show that the new attribute improves the F-measure. However, it also introduces language dependency into the RLE-based method, whereas ASCII art extraction methods should ideally be language-independent.
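The two text attributes described in the abstract can be illustrated with a minimal sketch. It assumes a character-level RLE in which each run costs two symbols (the character and its count), so the compression ratio is 2 × runs / characters. It also assumes the n-gram attribute counts character trigrams in a text that belong to a set of n-grams collected from ASCII art. The function names, the two-symbols-per-run cost, the trigram size and the example n-gram set are all hypothetical. The paper's actual encoding, whitespace handling, n-gram size and classifier input may differ.

```python
from itertools import groupby


def rle_encode(text: str) -> list[tuple[str, int]]:
    """Run-length encode text into (character, run length) pairs."""
    return [(ch, len(list(group))) for ch, group in groupby(text)]


def rle_compression_ratio(text: str) -> float:
    """Encoded size divided by original size (hypothetical cost model).

    Assumption: each run costs two symbols (the character and its count),
    so text with no repeated adjacent characters gets a ratio of 2.0,
    while long runs push the ratio towards 0.
    """
    if not text:
        return 0.0
    return 2 * len(rle_encode(text)) / len(text)


def char_ngrams(text: str, n: int) -> list[str]:
    """All overlapping character n-grams of text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def ascii_art_ngram_count(text: str, art_ngrams: set[str], n: int = 3) -> int:
    """Number of n-gram occurrences in text that belong to art_ngrams.

    Hypothetical attribute: art_ngrams would be collected from ASCII art
    in training data, which is where language dependency can creep in.
    """
    return sum(1 for gram in char_ngrams(text, n) if gram in art_ngrams)


if __name__ == "__main__":
    large_art = "=====\n|   |\n====="
    small_art = "(^_^)"
    prose = "This is an ordinary sentence."
    # Hypothetical n-gram set, standing in for one learned from ASCII art.
    art_ngrams = {"(^_", "^_^", "_^)"}

    # Large art: 7 runs over 17 characters, ratio about 0.82.
    print(rle_compression_ratio(large_art))
    # Small art and prose both have no runs, so both get 2.0.
    print(rle_compression_ratio(small_art), rle_compression_ratio(prose))
    # The n-gram attribute still separates them: 3 versus 0.
    print(ascii_art_ngram_count(small_art, art_ngrams),
          ascii_art_ngram_count(prose, art_ngrams))
```

In this toy example, the small emoticon and the prose sentence receive the same RLE ratio. The n-gram count is what separates them, which mirrors the motivation for the new attribute. Because the n-gram set has to be learned from text in some language or character set, the same sketch also shows where the language dependency noted in the abstract comes from.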
References
Hiroki, T., Minoru, M.: ASCII Art Pattern Recognition using SVM based on Morphological Analysis. Technical report of IEICE. PRMU 104(670), 25–30 (2005). http://ci.nii.ac.jp/naid/110003275719/
Nakazawa, M., Matsumoto, K., Yanagihara, T., Ikeda, K., Takishima, Y., Hoashi, K.: Proposal and its evaluation of ASCII-art extraction. In: Proceedings of the 2nd Forum on Data Engineering and Information Management (DEIM2010), pp. C9–C4 (2010)
Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers Inc., San Francisco (1993)
Suzuki, T.: A comparison of whitespace normalization methods in a text art extraction method with run length encoding. In: Deng, H., Miao, D., Lei, J., Wang, F.L. (eds.) AICI 2011, Part III. LNCS, vol. 7004, pp. 135–142. Springer, Heidelberg (2011). http://dx.doi.org/10.1007/978-3-642-23896-3_16
Suzuki, T.: Comparison of two ASCII art extraction methods: a run-length encoding based method and a byte pattern based method. In: Proceedings of the 6th IASTED International Conference on Computational Intelligence. ACTA Press (2015)
Suzuki, T., Hayashi, K.: A language-independent text art extraction method. In: Proceedings of the 2nd International Conference on the Applications of Digital Information and Web Technologies, pp. 462–467. IEEE Computer Society (2009)
Suzuki, T., Hayashi, K.: Text data compression ratio as a text attribute for a language-independent text art extraction method. In: Proceedings of the 3rd International Conference on the Applications of Digital Information and Web Technologies (2010)
The University of Waikato: Weka 3 - Data Mining with Open Source Machine Learning Software in Java. http://www.cs.waikato.ac.nz/ml/weka/ (retrieved on December 14, 2008)
Witten, I.H., Frank, E.: Data Mining: Practical Machine Learning Tools and Techniques, 2nd edn. Morgan Kaufmann (2005)
Xu, X., Zhang, L., Wong, T.T.: Structure-based ASCII Art. ACM Transactions on Graphics (SIGGRAPH 2010 issue) 29(4), 52:1–52:9 (2010)
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Suzuki, T. (2015). Introduction of N-gram into a Run-Length Encoding Based ASCII Art Extraction Method. In: Daniel, F., Diaz, O. (eds) Current Trends in Web Engineering. ICWE 2015. Lecture Notes in Computer Science, vol. 9396. Springer, Cham. https://doi.org/10.1007/978-3-319-24800-4_3
DOI: https://doi.org/10.1007/978-3-319-24800-4_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-24799-1
Online ISBN: 978-3-319-24800-4
eBook Packages: Computer Science, Computer Science (R0)