Abstract
Classifying electronic documents as strategic or non-strategic technology is difficult. A document retrieval system has been developed to find similar cases efficiently, but its performance needs to improve as more documents accumulate in the system. In this paper, we apply a feature selection method based on the chi-square statistic in an attempt to improve the system's performance.
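As an illustration of chi-square feature selection in general, and not the authors' implementation, the following minimal sketch scores each term by the chi-square statistic of its 2x2 term/class contingency table and keeps the highest-scoring terms. The function names `chi_square` and `select_terms`, the toy documents, and the class labels are all assumptions made for this example.

```python
from collections import Counter


def chi_square(a, b, c, d):
    """Chi-square statistic for a 2x2 term/class contingency table.

    a: documents in the class that contain the term
    b: documents outside the class that contain the term
    c: documents in the class that lack the term
    d: documents outside the class that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        # A degenerate table carries no evidence of association.
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def select_terms(docs, labels, positive, k):
    """Rank terms by chi-square against one class and keep the top k."""
    n_pos = sum(1 for y in labels if y == positive)
    n_neg = len(labels) - n_pos
    df_pos, df_neg = Counter(), Counter()
    for tokens, y in zip(docs, labels):
        # Count document frequency, so each term counts once per document.
        target = df_pos if y == positive else df_neg
        target.update(set(tokens))
    scores = {}
    for term in set(df_pos) | set(df_neg):
        a, b = df_pos[term], df_neg[term]
        scores[term] = chi_square(a, b, n_pos - a, n_neg - b)
    # Sort by descending score, breaking ties alphabetically.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k], scores


# Hypothetical toy corpus: two documents per class.
docs = [
    ["reactor", "coolant", "pump"],
    ["reactor", "vessel", "design"],
    ["meeting", "schedule", "pump"],
    ["travel", "schedule", "budget"],
]
labels = ["strategic", "strategic", "non-strategic", "non-strategic"]

top_terms, scores = select_terms(docs, labels, "strategic", k=2)
print(top_terms)
```

In this toy corpus, a term such as "pump" that appears equally often in both classes scores zero, while terms confined to one class score highest. A retrieval system could then index only the retained terms.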
This work was supported by the Nuclear Safety Research Program through the Korea Foundation of Nuclear Safety (KOFONS), with financial resources granted by the Nuclear Safety and Security Commission (NSSC), Republic of Korea (No. 1305014).
Notes
1. Reactor Coolant System, Reactor Vessel, Safety Injection and Shutdown Cooling System, Steam Generator, In-core Instrument, Control Element Assembly, Fuel Assembly, Chemical and Volume Control System, Control System, Component Cooling Water System, Turbine and Generator System, Feedwater System, Steam System, Condenser System, Air System, Diesel Generator System, Reactor Containment Building, Radwaste System, Instrument System, Plant Protection System, Drain System, Electric Power System, Fuel Handling and Transfer System, Other Systems.
Copyright information
© 2018 Springer International Publishing AG
About this paper
Cite this paper
Tae, Jw., Yoon, Sh., Shin, Dh. (2018). Feature Selection for Document Retrieval in the Export Control Domain. In: Bi, Y., Kapoor, S., Bhatia, R. (eds) Proceedings of SAI Intelligent Systems Conference (IntelliSys) 2016. IntelliSys 2016. Lecture Notes in Networks and Systems, vol 16. Springer, Cham. https://doi.org/10.1007/978-3-319-56991-8_76
DOI: https://doi.org/10.1007/978-3-319-56991-8_76
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-56990-1
Online ISBN: 978-3-319-56991-8
eBook Packages: Engineering, Engineering (R0)