Text Similarity Computing Based on Standard Deviation

Liu, Tao; Guo, Jun

doi:10.1007/11538059_48

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3644)

Abstract

Automatic text categorization is the task of assigning free-text documents to one or more predefined categories based on their content. The classical method for computing text similarity is to calculate the cosine of the angle between the vectors that represent the texts. To improve categorization performance, this paper proposes a new algorithm that computes text similarity based on standard deviation. Experiments on Chinese text documents show the validity and feasibility of the standard deviation-based algorithm.
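
For orientation, the classical cosine measure mentioned in the abstract can be sketched as follows. This is a minimal illustration only, assuming raw term-frequency vectors over pre-tokenized text; the function names `term_vector` and `cosine_similarity` are hypothetical, and the sketch does not reproduce the paper's standard deviation-based algorithm, which is not described on this page.

```python
import math
from collections import Counter


def term_vector(tokens):
    """Build a sparse term-frequency vector (assumed representation)."""
    return Counter(tokens)


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-frequency vectors."""
    # Only terms present in both vectors contribute to the dot product.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    # An empty document has no direction; treat its similarity as 0.
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


doc1 = term_vector(["text", "similarity", "cosine"])
doc2 = term_vector(["text", "similarity", "deviation"])
print(cosine_similarity(doc1, doc2))
```

Under these assumptions, two documents with identical term counts score 1 (up to floating-point rounding) and documents with no shared terms score 0; real systems typically weight terms (for example with TF-IDF) before taking the cosine.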

Author information

Authors and Affiliations

School of Information Engineering, Beijing University of Posts and Telecommunications, Beijing, 100876, China
Tao Liu & Jun Guo

Editor information

Editors and Affiliations

Intelligent Computing Lab, Institute of Intelligent Machines, Chinese Academy of Sciences, China
De-Shuang Huang
School of Computer & Information Technology, Beijing Jiaotong University, 100044, Beijing, P.R. China
Xiao-Ping Zhang
School of Electrical and Electronic Engineering, Nanyang Technological University, P.O. Box, Singapore
Guang-Bin Huang

About this paper

Cite this paper

Liu, T., Guo, J. (2005). Text Similarity Computing Based on Standard Deviation. In: Huang, DS., Zhang, XP., Huang, GB. (eds) Advances in Intelligent Computing. ICIC 2005. Lecture Notes in Computer Science, vol 3644. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11538059_48

DOI: https://doi.org/10.1007/11538059_48
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28226-6
Online ISBN: 978-3-540-31902-3
eBook Packages: Computer Science, Computer Science (R0)
