Abstract
Our research focuses on text document mining with an application to engineering diagnostics. In the automotive industry, a problem description is often the first input to a diagnostic process, which maps the description to a diagnostic category such as engine, transmission, electrical, or brake. This mapping is currently done manually by mechanics, who rely largely on memory and experience; this often leads to lengthy repair processes, less accurate diagnostics, and unnecessary part replacement. This paper presents our research in applying text mining technology to automatically map problem descriptions to the correct diagnostic categories. We report results from a study of several important issues in text document classification, including term weighting schemes, latent semantic analysis (LSA), and similarity functions. We also present a text document categorization system that has been tested on a large test data set collected from auto dealers. The system's performance on this data is very satisfactory.
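As a rough illustration of the kind of pipeline the abstract describes, the following minimal sketch weights terms with TF-IDF and assigns a problem description to the category whose summed term vector is most similar under cosine similarity. It is not the system presented in the paper: the training sentences, the function names, the particular TF-IDF variant, and the nearest-centroid decision rule are all assumptions made for illustration, and the LSA step is omitted to keep the example dependency-free.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data: (problem description, diagnostic category).
TRAINING = [
    ("engine stalls at idle and misfires", "engine"),
    ("engine overheats and loses power", "engine"),
    ("transmission slips when shifting gears", "transmission"),
    ("hard shifting and gears grinding", "transmission"),
    ("battery drains overnight and lights dim", "electrical"),
    ("brake pedal soft and brakes squeal", "brake"),
]


def tokenize(text):
    """Lowercase and split on whitespace (no stemming or stop-word removal)."""
    return text.lower().split()


def build_idf(token_lists):
    """Inverse document frequency, log(N / df) + 1, for every training term."""
    n_docs = len(token_lists)
    doc_freq = Counter()
    for tokens in token_lists:
        doc_freq.update(set(tokens))
    return {term: math.log(n_docs / df) + 1.0 for term, df in doc_freq.items()}


def tfidf(tokens, idf):
    """Sparse TF-IDF vector as a dict; terms unseen in training are dropped."""
    counts = Counter(t for t in tokens if t in idf)
    return {term: count * idf[term] for term, count in counts.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def train(examples):
    """Return the IDF table and one summed TF-IDF vector per category."""
    token_lists = [tokenize(text) for text, _ in examples]
    idf = build_idf(token_lists)
    centroids = defaultdict(lambda: defaultdict(float))
    for tokens, (_, category) in zip(token_lists, examples):
        for term, weight in tfidf(tokens, idf).items():
            centroids[category][term] += weight
    return idf, centroids


def classify(text, idf, centroids):
    """Return the most similar category, or None if nothing matches."""
    query = tfidf(tokenize(text), idf)
    best_category, best_score = None, 0.0
    for category, centroid in centroids.items():
        score = cosine(query, centroid)
        if score > best_score:
            best_category, best_score = category, score
    return best_category


idf, centroids = train(TRAINING)
print(classify("engine misfires and stalls", idf, centroids))
```

Because cosine similarity ignores vector length, summing the vectors of a category gives the same ranking as averaging them. Swapping in a different term weighting scheme or similarity function only requires replacing `tfidf` or `cosine`, which is the kind of comparison the abstract refers to.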
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Huang, L., Murphey, Y.L. (2006). Text Mining with Application to Engineering Diagnostics. In: Ali, M., Dapoigny, R. (eds) Advances in Applied Artificial Intelligence. IEA/AIE 2006. Lecture Notes in Computer Science, vol 4031. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11779568_138
DOI: https://doi.org/10.1007/11779568_138
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-35453-6
Online ISBN: 978-3-540-35454-3
eBook Packages: Computer Science (R0)