Abstract
This paper describes the development of an automated industry and occupation coding system for Korean Census records. The system converts natural language responses on survey questionnaires into the corresponding numeric codes defined in the standard code book from the Census Bureau. We employ a kNN (k-Nearest Neighbors)-based document classification method, together with information retrieval techniques for indexing and for weighting index terms. To cope with inconsistency in how respondents describe the same industry or occupation, we use nouns and phrases acquired from past census data. From these data, we estimate which nouns or phrases are frequently used to describe a given code. The experimental results show that past census data play an important role in increasing code classification accuracy.
This work was supported by the Korea Ministry of Science & Technology (M10413000008-04N1300-00811).
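As a rough illustration of the approach summarized in the abstract, the sketch below classifies a short free-text response with a kNN vote over weighted index terms. It is not the authors' implementation. The abstract does not name the weighting scheme, so TF-IDF weighting, cosine similarity, and similarity-weighted voting are assumptions here. The function names, the English tokens, and the codes `A01`, `C30`, and `P85` are hypothetical, and the labeled examples merely stand in for coded responses from past census data. A real system for Korean text would also need morphological analysis to extract nouns and phrases before this step.

```python
import math
from collections import Counter, defaultdict


def weight(tokens, idf):
    """Turn a token list into a unit-length TF-IDF vector (dict term -> weight)."""
    tf = Counter(tokens)
    vec = {t: c * idf[t] for t, c in tf.items() if t in idf}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def build_index(docs):
    """Compute IDF over the training documents and weight each document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log(n / d) + 1.0 for t, d in df.items()}
    return [weight(doc, idf) for doc in docs], idf


def cosine(a, b):
    """Cosine similarity of two unit-length sparse vectors."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def knn_classify(tokens, vectors, codes, idf, k=3):
    """Return the code with the largest summed similarity among the k nearest
    training documents, or None if the query shares no term with them."""
    q = weight(tokens, idf)
    nearest = sorted(
        ((cosine(q, v), c) for v, c in zip(vectors, codes)), reverse=True
    )[:k]
    votes = defaultdict(float)
    for sim, code in nearest:
        votes[code] += sim
    if not votes or max(votes.values()) == 0:
        return None
    return max(votes, key=votes.get)


# Hypothetical coded responses standing in for past census data.
train = [
    (["rice", "farming"], "A01"),
    (["vegetable", "farming"], "A01"),
    (["car", "parts", "manufacturing"], "C30"),
    (["car", "assembly"], "C30"),
    (["elementary", "school", "teaching"], "P85"),
]
docs = [tokens for tokens, _ in train]
codes = [code for _, code in train]
vectors, idf = build_index(docs)

print(knn_classify(["car", "manufacturing"], vectors, codes, idf))
print(knn_classify(["farming"], vectors, codes, idf))
```

Returning `None` when no index term matches is one possible design choice: it lets such responses be routed to manual coding instead of being assigned an arbitrary code.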
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lim, H.S., Kim, H. (2004). Automated Classification of Industry and Occupation Codes Using Document Classification Method. In: Pal, N.R., Kasabov, N., Mudi, R.K., Pal, S., Parui, S.K. (eds) Neural Information Processing. ICONIP 2004. Lecture Notes in Computer Science, vol 3316. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30499-9_127
DOI: https://doi.org/10.1007/978-3-540-30499-9_127
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23931-4
Online ISBN: 978-3-540-30499-9
eBook Packages: Springer Book Archive